WO2021219054A1 - Transverse federated learning system optimization method, apparatus and device, and readable storage medium - Google Patents


Info

Publication number
WO2021219054A1
Authority
WO
WIPO (PCT)
Prior art keywords
federated learning
participating
learning system
horizontal
system optimization
Prior art date
Application number
PCT/CN2021/090825
Other languages
French (fr)
Chinese (zh)
Inventor
程勇 (CHENG Yong)
刘洋 (LIU Yang)
陈天健 (CHEN Tianjian)
Original Assignee
深圳前海微众银行股份有限公司 (Shenzhen Qianhai WeBank Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司 (Shenzhen Qianhai WeBank Co., Ltd.)
Publication of WO2021219054A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5044 - Allocation of resources to service a request, the resource being a machine, considering hardware capabilities
    • G06F 9/5094 - Allocation of resources where the allocation takes into account power or heat criteria
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method, device, equipment and readable storage medium for optimizing a horizontal federated learning system.
  • Federated learning is increasingly used in various fields to solve the problem of "data islands".
  • At the same time, the efficiency requirements on the federated learning process are becoming higher and higher in many fields, for example in real-time data processing.
  • However, the technical resources of each data party's equipment differ. Some devices lack computing resources, cannot support efficient federated learning well, and may even lower the overall efficiency of federated learning. If these devices are excluded from federated learning in order to prevent them from affecting the overall efficiency, the contribution of the data held by these devices cannot be used. Therefore, how to make use of the contribution of each data party's data in federated learning without affecting the overall efficiency of federated learning has become an urgent problem to be solved.
  • The main purpose of this application is to provide a horizontal federated learning system optimization method, apparatus, device and readable storage medium, aiming to solve the problem of how to make use of the contribution of each data party's data in federated learning without affecting the overall efficiency of federated learning.
  • To achieve the above purpose, the present application provides a method for optimizing a horizontal federated learning system, applied to a coordination device participating in horizontal federated learning, the method including the following steps:
  • acquiring the device resource information of each participating device participating in the horizontal federated learning, and configuring, according to the device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device, the calculation task parameters including an estimated processing time step and/or an estimated processing batch size;
  • sending the calculation task parameters to the corresponding participating devices, so that each participating device executes the federated learning task according to its respective calculation task parameters.
  • In addition, the present application provides an optimization apparatus for a horizontal federated learning system, deployed in a coordination device participating in horizontal federated learning, the apparatus including:
  • an acquisition module, used to acquire the device resource information of each participating device participating in the horizontal federated learning;
  • a configuration module, used to configure, according to the device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device, the calculation task parameters including the estimated processing time step and/or the estimated processing batch size;
  • a sending module, configured to send the calculation task parameters to the corresponding participating devices, so that each participating device executes the federated learning task according to its respective calculation task parameters.
  • In addition, the present application provides a horizontal federated learning system optimization device, which includes a memory, a processor, and a horizontal federated learning system optimization program stored in the memory and executable on the processor, where the horizontal federated learning system optimization program, when executed by the processor, implements the steps of the above-mentioned horizontal federated learning system optimization method.
  • In addition, this application also proposes a computer-readable storage medium storing a horizontal federated learning system optimization program, which, when executed by a processor, implements the steps of the horizontal federated learning system optimization method described above.
  • In this application, the coordination device obtains the device resource information of each participating device; calculation task parameters are configured for each participating device according to its device resource information, the calculation task parameters including the estimated processing time step and/or the estimated processing batch size;
  • the calculation task parameters of each participating device are sent to the corresponding participating device, so that each participating device performs the federated learning task according to its calculation task parameters.
  • By configuring the calculation task parameters for each participating device through the coordination device, the amount of computation each participating device has to process locally is coordinated.
  • Since the calculation task parameters include the estimated processing time step and/or the estimated processing batch size, configuring the estimated processing time step and/or the estimated processing batch size for the participating devices coordinates the number of computing tasks each participating device processes locally; and because different calculation task parameters are configured for each participating device based on its device resource information, the differences in the device resources of the participating devices are taken into account.
  • Participating devices with richer device resources are allocated more computing tasks, and participating devices with fewer device resources are allocated fewer computing tasks, so that participating devices with fewer resources can also complete their local model parameter updates quickly and participating devices with more computing resources do not need to spend time waiting, which improves the overall efficiency of horizontal federated learning across the participating devices.
  • At the same time, the contribution of each participating device's local data to model training can still be used, including the contribution of participating devices with fewer resources, which increases the stability of the model, thereby optimizing both the efficiency of the horizontal federated learning system and the model performance.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution of an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for optimizing a horizontal federated learning system according to this application;
  • Fig. 3 is a functional schematic block diagram of a preferred embodiment of an optimization device for a horizontal federated learning system according to this application.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.
  • optimization device of the horizontal federated learning system in the embodiment of the present application may be a smart phone, a personal computer, a server, and other devices, which are not specifically limited here.
  • the horizontal federated learning system optimization device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 can be a high-speed RAM memory or a stable memory (non-volatile memory), such as disk storage.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • The device structure shown in FIG. 1 does not constitute a limitation on the optimization device of the horizontal federated learning system, which may include more or fewer components than shown in the figure, combine certain components, or use a different component arrangement.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a horizontal federated learning system optimization program.
  • the operating system is a program that manages and controls the hardware and software resources of the device, and supports the operation of the optimization program of the horizontal federated learning system and other software or programs.
  • the horizontal federated learning system optimization device is a coordination device that participates in horizontal federated learning
  • the user interface 1003 is mainly used for data communication with the client
  • the network interface 1004 is mainly used to establish a communication connection with the participating devices participating in the horizontal federated learning;
  • the processor 1001 can be used to call the horizontal federated learning system optimization program stored in the memory 1005, and perform the following operations:
  • acquiring the device resource information of each participating device participating in the horizontal federated learning, and configuring, according to the device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device, the calculation task parameters including the estimated processing time step and/or the estimated processing batch size;
  • the calculation task parameters are correspondingly sent to each of the participating devices, so that each of the participating devices executes the federated learning task according to the respective calculation task parameters.
  • Further, the step of configuring, according to the device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device includes:
  • classifying each participating device according to the device resource information to determine the resource category to which each participating device belongs, and configuring, according to the resource category to which each participating device belongs, the calculation task parameters in the federated learning model training process corresponding to each participating device.
  • Further, the step of configuring the calculation task parameters in the federated learning model training process corresponding to each participating device according to the resource category to which each participating device belongs includes:
  • determining the candidate task parameters corresponding to each participating device according to the resource category to which it belongs, determining the expected processing duration corresponding to each participating device based on the candidate task parameters, detecting whether the expected processing durations satisfy a preset duration consistency condition, and, if the condition is satisfied, correspondingly using the candidate task parameters of each participating device as its calculation task parameters.
  • the processor 1001 may be used to call the horizontal federated learning system optimization program stored in the memory 1005, and also perform the following operations:
  • configuring a time step selection strategy corresponding to each participating device according to the estimated processing time step corresponding to that participating device, and sending the time step selection strategy to the corresponding participating devices, so that each participating device selects sequence selection data from its own sequence data according to its time step selection strategy and performs the federated learning task according to the sequence selection data, where the time step of the sequence selection data is less than or equal to the estimated processing time step of that participating device.
  • Further, the processor 1001 may be used to call the horizontal federated learning system optimization program stored in the memory 1005, and also perform the following operations:
  • configuring the learning rate corresponding to each participating device according to the estimated processing batch size corresponding to each participating device;
  • the learning rate is correspondingly sent to each participating device, so that each participating device executes a federated learning task according to the respective learning rate and the estimated processing batch size received from the coordination device.
  • the step of correspondingly sending the calculation task parameters to each of the participating devices so that each of the participating devices can execute a federated learning task according to the respective calculation task parameters includes:
  • sending the calculation task parameters to the corresponding participating devices, and sending the estimated duration of the current round of global model update to each participating device, so that when each participating device performs local model training based on the calculation task parameters, it adjusts the number of local model training iterations according to the estimated duration.
  • Further, the step of obtaining the device resource information of each participating device participating in horizontal federated learning includes: receiving the device resource information sent by each participating device participating in the horizontal federated learning, where the device resource information includes at least one or more of power resource information, computing resource information, and communication resource information.
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for optimizing a horizontal federated learning system according to this application. It should be noted that although the logical sequence is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than here.
  • the method for optimizing the horizontal federated learning system is applied to the coordination device participating in the horizontal federated learning.
  • the coordination device and each participating device participating in the horizontal federated learning may be devices such as smart phones, personal computers, and servers.
  • the optimization method of the horizontal federated learning system includes:
  • Step S10 Obtain device resource information of each participating device participating in horizontal federated learning
  • the coordination device and each participating device can establish a communication connection in advance through inquiry handshake authentication and identity authentication, and determine the model to be trained for this federated learning, such as a neural network model or other machine learning models. It may be that each participating device locally constructs the model to be trained with the same or similar structure, or the coordination device constructs the model to be trained and then sends it to each participating device. Each participating device locally has training data for training the model to be trained.
  • Model update refers to updating the model parameters of the model to be trained, such as the connection weight value between neurons in the neural network. Finally, a model that meets the quality requirements is obtained.
  • each participating device uses their local training data to perform local training on the local model to be trained to obtain local model parameter updates.
  • The local model parameter update can be the gradient information used to update the model parameters or the locally updated model parameters. Each participating device sends its local model parameter update to the coordination device; the coordination device merges the local model parameter updates, for example by weighted averaging, to obtain the global model parameter update and sends it to each participating device; each participating device then uses the global model parameter update to update the model parameters of its local model to be trained, thereby completing one global model update. After each global model update, the model parameters of the model to be trained are synchronized across the participating devices.
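  • As a rough illustration of the global model update flow described above, the following minimal sketch (an assumption-laden example, not taken from this application) shows one round in which hypothetical participants train a toy linear model locally and a coordinator merges their parameter updates by a weighted average; all function names and the sample-count weighting are illustrative.
```python
# Minimal sketch of one global model update: local training, upload, weighted
# merge by the coordinator, broadcast back. Not the patent's implementation.
import numpy as np

def local_update(global_params, local_data, lr=0.01):
    """Placeholder local training step: one gradient step of a toy linear model.
    A real participant would run one or more epochs on its own data."""
    x, y = local_data
    grad = x.T @ (x @ global_params - y) / len(x)   # gradient of squared error
    return global_params - lr * grad

def merge_updates(param_list, sample_counts):
    """Coordinator-side merge: average of local parameters weighted by each
    participant's number of training samples."""
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, param_list))

# --- one global model update over three hypothetical participants ---
rng = np.random.default_rng(0)
global_params = np.zeros(5)
participants = [(rng.normal(size=(n, 5)), rng.normal(size=n)) for n in (100, 300, 50)]

local_params = [local_update(global_params, data) for data in participants]
global_params = merge_updates(local_params, [len(x) for x, _ in participants])
print(global_params)
```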
  • the coordinating device can obtain the device resource information of each participating device.
  • the device resource information can be resource information related to computing efficiency and communication efficiency in the participating device, for example, computing resource information, power resource information, and communication resource information.
  • The computing resources can be represented by the number of CPUs and GPUs owned by the participating device, the power resources by the length of time the participating device can continue to work, and the communication resources by the communication rate of the participating device. The coordination device may send a device resource query request to each participating device, and each participating device uploads its current device resource information to the coordination device after receiving the request.
  • Step S20 Configure calculation task parameters in the training process of the federated learning model corresponding to each participating device according to the device resource information, and the calculation task parameters include the estimated processing time step and/or the estimated processing batch size;
  • the coordinating device can obtain the device resource information of each participating device before the first global model update, so as to configure the calculation task parameters in the subsequent global model update for each participating device; or, after entering a certain or every global model update Previously, the device resource information of each participating device was obtained to configure the calculation task parameters in the current global model update for each participating device. That is, the coordination device can obtain the device resource information of each participating device and configure the computing task parameters for each participating device to participate in the training of the federated learning model during the training process of the federated learning model.
  • the calculation task parameters include the estimated processing time step and/or the estimated processing batch size, that is, the calculation task parameters include the estimated processing time step, or the estimated processing batch size, or the estimated processing time step and the estimated processing batch size.
  • the time step refers to the number of time steps.
  • the time step is a concept in the recurrent neural network (RNN) model and applies to sequence data.
  • the estimated processing time step refers to the number of time steps that the participating equipment is expected to process.
  • the mini-batch size refers to the size of the data batch used for model update each time, and the expected processing batch size refers to the data batch size expected to be used when the participating devices perform local model updates.
  • After obtaining the device resource information of each participating device, the coordination device configures the calculation task parameters for each participating device according to its device resource information. Specifically, the richer the computing resources of a participating device, the higher its computing efficiency; the richer its power resources, the longer it can continue to participate in federated learning; and the richer its communication resources (larger communication bandwidth, shorter communication delay), the higher its data transmission efficiency. Therefore, for the same computing task, a device with richer device resources takes less time.
  • The principle of configuring the calculation task parameters is to configure more computing tasks for participating devices with richer device resources and fewer computing tasks for participating devices with fewer device resources, so as to make the processing time of each participating device as close as possible.
  • The amount of computation can be quantified by the estimated processing time step or the estimated processing batch size: the larger the estimated processing time step or the estimated processing batch size, the more data the participating device has to process, that is, the more computing tasks it is assigned.
  • Although the amount of gradient information or model parameter information a participating device uploads does not change with the batch size or time step, abundant communication resources mean that the participating device transmits data quickly, the time needed to upload data is small, and more time can be spent on local model training. Therefore, the coordination device can allocate more computing tasks, that is, a larger estimated processing time step or estimated processing batch size, to participating devices with richer communication resources.
  • the coordination device can configure the calculation task parameters according to the device resource information of the participating devices.
  • the corresponding relationship between the device resource information and the computing task parameters can be preset; the device resource information is divided into several segments according to the value, and each segment represents a different degree of resource richness.
  • For example, when the device resource information is the number of CPUs, the CPU count can be divided into several segments, where a segment with a larger CPU count represents richer device resources; the calculation task parameters corresponding to each segment are set in advance, forming the correspondence relationship.
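  • For illustration, the following is a minimal sketch of such a preset correspondence, assuming CPU-count segments and parameter values invented for the example (the application does not specify concrete values).
```python
# Preset correspondence between a device's CPU-count segment and its
# calculation task parameters. Segment boundaries and values are illustrative.

def configure_task_params(cpu_count):
    """Map a participant's CPU count to (estimated processing time step,
    estimated processing batch size) via a preset lookup table."""
    correspondence = [
        # (min CPUs inclusive, max CPUs exclusive, time step, batch size)
        (0,  2,   8,  32),
        (2,  4,  16,  64),
        (4,  8,  24, 128),
        (8, float("inf"), 32, 256),
    ]
    for lo, hi, time_step, batch_size in correspondence:
        if lo <= cpu_count < hi:
            return {"estimated_time_step": time_step,
                    "estimated_batch_size": batch_size}
    raise ValueError("invalid CPU count")

# Richer devices (more CPUs) are assigned more computing work per round.
print(configure_task_params(1))   # {'estimated_time_step': 8, 'estimated_batch_size': 32}
print(configure_task_params(6))   # {'estimated_time_step': 24, 'estimated_batch_size': 128}
```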
  • Step S30 Send the calculation task parameters to each of the participating devices correspondingly, so that each of the participating devices can execute the federated learning task according to its respective calculation task parameters.
  • The coordination device correspondingly sends the calculation task parameters of each participating device to that participating device. After the participating device receives the calculation task parameters, it uses them to execute the federated learning task. Specifically, when the coordination device sends calculation task parameters to be used in all subsequent global model updates, the participating device participates in each subsequent global model update based on those calculation task parameters, thereby completing the federated learning task; when the coordination device sends calculation task parameters to be used in one particular global model update, the participating device participates in that global model update based on those calculation task parameters.
  • When the model to be trained is a recurrent neural network and the calculation task parameter is the estimated processing time step, the participating device selects, from the local sequence data used for model training, data of at most the estimated processing time step (the selected data is called sequence selection data). Specifically, for each piece of sequence data composed of multiple time steps of data, the participating device selects part of the time step data from the sequence data as the sequence selection data, and the time step length of this selected data is less than or equal to the estimated processing time step; when the time step of the sequence data itself is already less than or equal to the estimated processing time step, no selection is performed and the sequence data is used directly as the sequence selection data. For example, for a piece of sequence data with 32 time steps of data and an estimated processing time step of 15, the participating device can select 15 time steps of data from the sequence data as the sequence selection data. There are many ways to select, for example picking at intervals or picking 15 time steps at random.
  • Different selection methods can be set according to the specific model application scenario; in addition, the selection methods used in different global model parameter updates can differ, so as to ensure that most of the local data can be used to participate in model training. Based on the sequence selection data, whose time step is less than or equal to the estimated processing time step, the participating device performs one or more rounds of local model training on the model to be trained to obtain the local model parameter update.
  • The local model training process is the same as the general training process of a recurrent neural network and will not be described in detail here.
  • The local model parameter update is sent to the coordination device; the coordination device merges the local model parameter updates of the participating devices to obtain the global model parameter update and sends it to each participating device. After a participating device receives the global model parameter update, it uses the global model parameter update to update the model parameters of its local model to be trained.
  • When the calculation task parameter is the estimated processing batch size, the participating device can divide its multiple pieces of local training data into multiple batches, where the size of each batch, that is, the number of pieces of training data it contains, is less than or equal to the estimated processing batch size received from the coordination device. For example, if the estimated processing batch size received by the participating device is 100 and the participating device has 1000 pieces of training data locally,
  • the participating device can divide the local training data into 10 batches. After dividing the local training data into batches according to the estimated processing batch size, the participating device uses one batch of data to update the local model in the process of participating in one global model update.
  • When the model to be trained is a recurrent neural network and the calculation task parameters are the estimated processing time step and the estimated processing batch size, the participating device combines the operations of the above two cases: it performs time step selection on each piece of training data, divides the resulting sequence selection data into batches, and uses each batch of sequence selection data to participate in the global model update.
  • The input data of a recurrent neural network can be of variable length, that is, the time step of the input data can differ, which allows each participating device to perform local model training based on a different estimated processing time step.
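  • The participant-side processing just described can be sketched as follows; this is a simplified illustration under the above assumptions, and the helper names select_time_steps and make_batches are hypothetical.
```python
# Sketch of participant-side preprocessing: truncate each sequence to the
# estimated processing time step and split samples into batches no larger
# than the estimated processing batch size. Illustrative only.
import random

def select_time_steps(sequence, est_time_step, mode="interval"):
    """Return at most est_time_step time-step entries from one sequence.
    A sequence that is already short enough is used as-is."""
    if len(sequence) <= est_time_step:
        return list(sequence)
    if mode == "interval":                       # pick entries at roughly even spacing
        stride = len(sequence) / est_time_step
        return [sequence[int(i * stride)] for i in range(est_time_step)]
    idx = sorted(random.sample(range(len(sequence)), est_time_step))
    return [sequence[i] for i in idx]            # random selection, order preserved

def make_batches(samples, est_batch_size):
    """Split local training samples into batches of at most est_batch_size."""
    return [samples[i:i + est_batch_size]
            for i in range(0, len(samples), est_batch_size)]

# Example: a sequence of 32 time steps truncated to 15, and 1000 samples in batches of 100.
seq = list(range(32))
print(len(select_time_steps(seq, 15)))          # 15
samples = [select_time_steps(seq, 15) for _ in range(1000)]
print(len(make_batches(samples, 100)))          # 10 batches
```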
  • In this embodiment, the coordination device obtains the device resource information of each participating device; calculation task parameters are configured for each participating device according to its device resource information, the calculation task parameters including the estimated processing time step and/or the estimated processing batch size;
  • the calculation task parameters of each participating device are sent to the corresponding participating device, so that each participating device performs the federated learning task according to its calculation task parameters.
  • By configuring the calculation task parameters for each participating device through the coordination device, the amount of computation each participating device has to process locally is coordinated.
  • Since the calculation task parameters include the estimated processing time step and/or the estimated processing batch size, configuring these parameters for the participating devices coordinates the number of computing tasks each participating device processes locally; and because different calculation task parameters are configured for each participating device based on its device resource information, the differences in the device resources of the participating devices are taken into account. Participating devices with richer device resources are allocated more computing tasks, and participating devices with fewer device resources are allocated fewer computing tasks, so that participating devices with fewer resources can also complete their local model parameter updates quickly and participating devices with more computing resources do not need to spend time waiting, which improves the overall efficiency of horizontal federated learning. At the same time, the contribution of each participating device's local data to model training can still be used, including the contribution of participating devices with fewer resources, which increases the stability of the model, thereby optimizing both the efficiency of the horizontal federated learning system and the model performance.
  • the step S20 includes:
  • Step S201 Classify each of the participating devices according to each of the device resource information, and determine the resource category to which each of the participating devices belongs;
  • the coordination device classifies each participating device according to the device resource information of each participating device, and determines the respective resource category of each participating device.
  • Specifically, the coordination device can sort the resource information values of the devices.
  • The number of categories to be divided can be set in advance, and the interval formed by the minimum and maximum values in the sorted list can then be divided equally into that preset number of segments.
  • Each divided segment is one category, and the category to which each participating device belongs is determined by which segment the value of its resource information falls into.
  • the device resource information includes computing resource information
  • the computing resource information is represented by the number of CPUs of participating devices
  • the number of CPUs of each participating device may be sorted. It is understandable that, compared with presetting each resource category in advance, determining the resource categories according to the devices' actual resource information in this embodiment makes the division of resource categories better match the actual resource situation of the participating devices, and allows it to adapt to the fact that the resource situation of the participating devices is not static during the federated learning process.
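  • A minimal sketch of this equal-width division into resource categories is given below; the function name and the example CPU counts are illustrative assumptions.
```python
# Equal-width binning of the participants' resource values into a preset
# number of categories, based on the observed [min, max] interval.

def assign_resource_categories(resource_values, num_categories):
    """Return a category index (1 = poorest, num_categories = richest)
    for every participant, based on equal-width bins over [min, max]."""
    lo, hi = min(resource_values), max(resource_values)
    width = (hi - lo) / num_categories or 1.0        # avoid zero width if all values equal
    categories = []
    for v in resource_values:
        cat = int((v - lo) / width) + 1              # which equal-width segment v falls into
        categories.append(min(cat, num_categories))  # clamp the maximum value into the top bin
    return categories

# Example: CPU counts of five participating devices divided into 4 categories.
cpu_counts = [1, 2, 3, 6, 8]
print(assign_resource_categories(cpu_counts, 4))     # [1, 1, 2, 3, 4]
```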
  • the equipment resource information includes data of multiple types of equipment resources
  • the data of various types of equipment resources can be normalized, so that the data of various types of resources can be calculated and compared.
  • the normalization method can be a commonly used normalization method, which will not be described in detail here.
  • device resources include computing resources, power resources, and communication resources
  • For example, when the device resources include computing resources, power resources, and communication resources, by normalizing the computing resource data, the power resource data, and the communication resource data, the three types of resource data can be brought to a common scale so that they can be calculated and compared together.
  • After the normalized data of the various device resources of a participating device are obtained, weight values for the various types of resources can be set in advance according to the degree to which each resource type influences the local computing efficiency of the participating device,
  • and the normalized data of the various device resources are then weighted and averaged to obtain a value that evaluates the overall resource richness of the participating device; based on the value calculated for each participating device, the coordination device then performs the sorting, division, and classification operations described above.
  • In this way, the complex device resource information is quantified, making it more convenient and accurate to configure the calculation task parameters for each participating device; and through the quantification of the complex device resource information, the participating devices can be divided into resource categories, so that computing tasks can be configured for the participating devices more quickly.
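  • The following minimal sketch illustrates one way such normalization and weighting could look, assuming min-max normalization and illustrative weight values (the application does not prescribe a specific normalization method or weights).
```python
# Reduce heterogeneous resource data (computing, power, communication) to one
# richness score per participant via normalization plus a weighted average.

def min_max_normalize(values):
    """Scale a list of raw resource values into [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0
    return [(v - lo) / span for v in values]

def resource_scores(devices, weights):
    """Weighted average of the normalized resource types per participant."""
    keys = list(weights)
    normalized = {k: min_max_normalize([d[k] for d in devices]) for k in keys}
    return [sum(weights[k] * normalized[k][i] for k in keys) for i in range(len(devices))]

# Example: three participants reporting CPU count, remaining work time (hours)
# and communication rate (Mbps); computing resources weighted most heavily here.
devices = [
    {"cpus": 1, "power_hours": 5,  "comm_mbps": 20},
    {"cpus": 4, "power_hours": 12, "comm_mbps": 100},
    {"cpus": 8, "power_hours": 8,  "comm_mbps": 50},
]
weights = {"cpus": 0.5, "power_hours": 0.2, "comm_mbps": 0.3}
print(resource_scores(devices, weights))   # one overall richness value per device
```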
  • Step S202 According to the resource category to which each participating device belongs, respectively configure the calculation task parameters in the training process of the federated learning model corresponding to each participating device.
  • After determining the resource category to which each participating device belongs, the coordination device configures the calculation task parameters corresponding to each participating device according to its resource category.
  • Specifically, a maximum estimated processing time step can be set in advance, and the resource categories can be numbered from low to high resource richness: 1, 2, 3, and so on; the estimated processing time step corresponding to each category is then derived from the maximum estimated processing time step. Concretely, the maximum estimated processing time step is divided by the number of resource categories to obtain a minimum time step, and the number of each resource category is multiplied by the minimum time step to obtain the estimated processing time step corresponding to that resource category.
  • For example, if the maximum estimated processing time step is 32 and the number of resource categories is 4, then, from low to high resource richness, the estimated processing time steps corresponding to the four resource categories are 8, 16, 24, and 32.
  • Similarly, a maximum estimated processing batch size can be set in advance, and the estimated processing batch size corresponding to each category is then calculated from the maximum estimated processing batch size.
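  • This per-category calculation can be sketched as follows; the function name is hypothetical and the second example value is invented for illustration.
```python
# Divide the maximum estimated processing time step (or batch size) by the
# number of resource categories, then multiply by each category's index.

def params_per_category(max_value, num_categories):
    """Return the estimated processing time step (or batch size) assigned to
    resource categories numbered 1..num_categories (1 = poorest resources)."""
    minimum = max_value // num_categories
    return {cat: minimum * cat for cat in range(1, num_categories + 1)}

# The example from the description: maximum time step 32 and 4 resource categories.
print(params_per_category(32, 4))    # {1: 8, 2: 16, 3: 24, 4: 32}
# The same rule can be applied to a maximum estimated processing batch size.
print(params_per_category(256, 4))   # {1: 64, 2: 128, 3: 192, 4: 256}
```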
  • step S202 includes:
  • Step S2021 Determine the candidate task parameters corresponding to each participating device according to the resource category to which each participating device belongs;
  • the coordination device may determine the candidate task parameters corresponding to each participating device according to the resource category to which each participating device belongs.
  • Specifically, the calculation task parameters corresponding to each resource category can be calculated in a manner similar to the way described above of calculating the estimated processing time step for each resource category from the maximum estimated processing time step; then, according to the resource category to which each participating device belongs, the calculation task parameter corresponding to that participating device is determined and is first used as its candidate task parameter.
  • Step S2022 Determine the expected processing duration corresponding to each of the participating devices based on the candidate task parameters, and detect whether each of the expected processing durations meets a preset duration consistency condition;
  • Specifically, the coordination device can determine the expected processing duration corresponding to each participating device based on its candidate task parameters, that is, determine the time each participating device needs to execute the federated learning task according to its candidate task parameters, which may specifically be the time required for local model training and uploading the model parameter update when the participating device participates in the next global model update according to the candidate task parameters.
  • the coordination device may estimate the time required for each participating device to perform local model training and upload model parameter updates according to the candidate task parameters based on the device resource information of each participating device.
  • Specifically, the unit time required for a unit of resources to process a unit time step or a unit batch size can be set in advance based on tests or experience; the expected processing duration of a participating device can then be calculated from this unit time, the actual resources of the participating device, and the estimated processing time step or estimated processing batch size in its candidate task parameters. The whole process is similar in principle to multiplying a unit cost by a quantity to obtain a total.
  • For example, if it is known in advance from experience that a participating device with 1 CPU takes time x to train the local model on data of 1 time step and time y to upload a model parameter update, then for a participating device that has 3 CPUs and needs to process 10 time steps, the expected processing duration is (10x + 10y)/3.
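  • A minimal sketch of this "unit cost times quantity" estimate, with illustrative unit times x and y, is shown below; it simply reproduces the formula from the example above.
```python
# Estimate a participant's processing duration from per-time-step unit costs,
# the assigned number of time steps, and the device's CPU count.

def estimated_processing_duration(num_cpus, time_steps, x, y):
    """Estimate how long a participant needs for local training plus upload,
    scaling the single-CPU, single-time-step costs by the assigned workload."""
    return time_steps * (x + y) / num_cpus

# The worked example from the description: 3 CPUs, 10 time steps -> (10x + 10y) / 3.
x, y = 2.0, 0.5                      # illustrative unit times (e.g. seconds)
print(estimated_processing_duration(3, 10, x, y))   # (10*2.0 + 10*0.5) / 3 = 8.33...
```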
  • After the coordination device calculates the expected processing duration corresponding to each participating device, it can detect whether the expected processing durations of the participating devices meet the preset duration consistency condition.
  • the preset duration consistency condition can be preset.
  • the preset duration consistency condition can be set such that the difference between the maximum value and the minimum value of each estimated processing duration needs to be less than a set threshold.
  • When the coordination device detects that the difference between the maximum value and the minimum value of the expected processing durations is less than the threshold, it means that the expected processing durations of the participating devices meet the preset duration consistency condition, that is, the expected processing durations of the participating devices are approximately the same.
  • When the coordination device detects that the difference between the maximum value and the minimum value of the expected processing durations is not less than the threshold, it means that the expected processing durations of the participating devices do not meet the preset duration consistency condition, that is, the expected processing durations of the participating devices differ greatly.
  • Step S2023 If the preset duration consistency condition is satisfied among the expected processing durations, the candidate task parameters of each participating device are correspondingly used as the calculation task parameters of that participating device.
  • That is, if the coordination device detects that the expected processing durations meet the preset duration consistency condition, the candidate task parameters of each participating device can be correspondingly used as the final calculation task parameters of that participating device.
  • If the coordination device detects that the expected processing durations do not meet the preset duration consistency condition, it means that the expected processing durations of the participating devices differ greatly. In this case, if each participating device executed the subsequent federated learning task based on its candidate task parameters, there would be large differences in processing time, and some participating devices would have to wait for others. For example, during a global model update, the time some participating devices take to update the local model and upload the model parameter update is relatively short while other participating devices take a long time, so the coordination device and the faster participating devices must wait until the slower participating devices have uploaded their model parameter updates before the coordination device can merge them
  • into the global model parameter update and complete the global model update. Therefore, when the coordination device detects that the expected processing durations do not meet the preset duration consistency condition, it can make adjustments on the basis of the candidate task parameters. For example, it can reduce the candidate task parameters of the participating device with the largest expected processing duration; the reduction can be to reduce the estimated processing time step, or the estimated processing batch size, or both. After adjusting the candidate task parameters, the coordination device again estimates the expected processing duration of each participating device based on the adjusted candidate task parameters and detects whether the expected processing durations meet the preset duration consistency condition, and so on, until it detects that the preset duration consistency condition is met.
  • By configuring each participating device with calculation task parameters that make the expected processing durations of the participating devices meet the preset duration consistency condition, the expected processing durations of the participating devices are made as consistent as possible, so that no participating device needs to spend a long time waiting for the others, which improves the overall efficiency of horizontal federated learning.
  • At the same time, the contributions brought by the data of all participating devices can be used: even participating devices with poor device resources can contribute the training data they own.
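  • The duration-consistency check and the adjustment loop described above can be sketched as follows; the reduction step (decreasing the slowest device's estimated processing time step by one) and all names are illustrative assumptions rather than the application's prescribed procedure.
```python
# While the gap between the longest and shortest expected processing durations
# exceeds a threshold, reduce the candidate task parameter of the slowest
# participant and re-estimate.

def estimate_duration(device, candidate_time_step, x=2.0, y=0.5):
    """Re-use the unit-cost estimate: time grows with the assigned time steps
    and shrinks with the number of CPUs."""
    return candidate_time_step * (x + y) / device["cpus"]

def balance_candidates(devices, candidates, threshold, min_time_step=1):
    """Shrink the candidate time step of the slowest device until the expected
    processing durations satisfy the preset duration-consistency condition."""
    while True:
        durations = [estimate_duration(d, c) for d, c in zip(devices, candidates)]
        if max(durations) - min(durations) < threshold:
            return candidates                     # condition met: candidates become final
        slowest = durations.index(max(durations))
        if candidates[slowest] <= min_time_step:
            return candidates                     # nothing left to reduce
        candidates[slowest] -= 1                  # reduce the slowest device's workload

devices = [{"cpus": 1}, {"cpus": 2}, {"cpus": 4}]
candidates = [16, 16, 32]                         # initial per-category time steps
print(balance_candidates(devices, candidates, threshold=5.0))   # [9, 16, 32]
```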
  • the method further includes:
  • Step S40 Configure a time step selection strategy corresponding to each participating device according to the estimated processing time step corresponding to each participating device;
  • In this embodiment, when the model to be trained is a recurrent neural network model, the coordination device can configure the time step selection strategy of each participating device according to the estimated processing time step corresponding to that participating device.
  • The recurrent neural network (RNN) model can be an ordinary RNN, a deep RNN, an LSTM (Long Short-Term Memory), a GRU (Gated Recurrent Unit), an IndRNN (Independently Recurrent Neural Network), and so on.
  • the time step selection strategy is the strategy for selecting part of the time step data from the sequence data.
  • The principle of configuration is that the positions, within the sequence data, of the time step data selected by the different participating devices are as complementary as possible.
  • For example, if the estimated processing time steps of participating device A and participating device B are both 15, the time step selection strategy that the coordination device configures for participating device A can be to select the time step data at odd positions in the sequence data, and the time step selection strategy configured for participating device B can be to select the time step data at even positions in the sequence data.
  • the sequence data used by each participating device during local model training can be distributed differently in time steps.
  • the model to be trained can learn features from more different sequence data, thereby improving the generalization ability of the model, that is, it can have better predictive ability for various forms of new samples.
  • Step S50 the time step selection strategy is correspondingly sent to each of the participating devices, so that each of the participating devices selects from their respective sequence data according to their respective time step selection strategies to obtain sequence selection data, and according to all The sequence selection data performs a federated learning task, wherein the time step of the sequence selection data is less than or equal to the estimated processing time step of each of the participating devices.
  • The coordination device correspondingly sends the time step selection strategy of each participating device to that participating device. It should be noted that the coordination device may send the time step selection strategy and the estimated processing time step of a participating device together or separately. After receiving the estimated processing time step and the time step selection strategy, the participating device selects sequence selection data from its local sequence data according to the time step selection strategy, and the time step of the sequence selection data is less than or equal to the estimated processing time step.
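  • A minimal sketch of such complementary selection strategies is shown below; encoding the strategies as "odd" and "even" index selection is an illustrative assumption.
```python
# One participant keeps odd-positioned time steps, another keeps even-positioned
# ones, so that together their selected data cover more of each sequence.

def select_by_strategy(sequence, est_time_step, strategy):
    """Apply a time step selection strategy, never exceeding est_time_step."""
    if len(sequence) <= est_time_step:
        return list(sequence)                     # short sequences are used as-is
    if strategy == "odd":
        picked = sequence[0::2]                   # time steps 1, 3, 5, ... (1-based odd)
    elif strategy == "even":
        picked = sequence[1::2]                   # time steps 2, 4, 6, ... (1-based even)
    else:
        raise ValueError("unknown strategy")
    return picked[:est_time_step]                 # trim to the estimated time step

# Devices A and B both have an estimated processing time step of 15 for a
# 32-time-step sequence; their selections are complementary in position.
seq = list(range(1, 33))
print(select_by_strategy(seq, 15, "odd"))         # [1, 3, 5, ..., 29]
print(select_by_strategy(seq, 15, "even"))        # [2, 4, 6, ..., 30]
```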
  • Further, when the calculation task parameter includes the estimated processing batch size, after step S30 the method further includes:
  • Step S60 Configure the learning rate corresponding to each participating device according to the estimated processing batch size corresponding to each participating device;
  • the coordination device may configure the learning rate corresponding to each participating device according to the estimated processing batch size corresponding to each participating device.
  • the learning rate is a hyperparameter in the model training process.
  • For example, the coordination device can set a reference learning rate: if the estimated processing batch size of a participating device is smaller than the batch size corresponding to the reference learning rate, the participating device is configured with a learning rate smaller than the reference learning rate; if the estimated processing batch size of the participating device is larger than the batch size corresponding to the reference learning rate, the participating device is configured with a learning rate larger than the reference learning rate.
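  • As an illustration, the sketch below assumes a simple linear scaling around a reference (batch size, learning rate) pair; the application only requires that smaller estimated processing batch sizes receive smaller learning rates and larger ones receive larger learning rates, so the concrete rule and values here are assumptions.
```python
# Scale a reference learning rate in proportion to the assigned batch size.

def configure_learning_rate(est_batch_size, ref_batch_size=64, ref_lr=0.01):
    """Return a learning rate proportional to the estimated processing batch size."""
    return ref_lr * est_batch_size / ref_batch_size

# Participants assigned smaller batches train with smaller learning rates.
for bs in (32, 64, 128, 256):
    print(bs, round(configure_learning_rate(bs), 4))   # 0.005, 0.01, 0.02, 0.04
```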
  • Step S70 The learning rate is correspondingly sent to each of the participating devices, so that each of the participating devices executes the federated learning task according to the respective learning rate and the estimated processing batch size received from the coordinating device .
  • After configuring the learning rate for each participating device, the coordination device sends the learning rate of each participating device to that participating device. It should be noted that the coordination device may send the learning rate and the estimated processing batch size of a participating device together or separately. After receiving the learning rate and the estimated processing batch size, the participating device executes the federated learning task according to the learning rate and the estimated processing batch size; for example, in local model training, data batches of the estimated processing batch size are used for model training, and the learning rate is used when updating the model parameters.
  • In this embodiment, the coordination device configures the learning rate for each participating device based on its estimated processing batch size, so that the coordination device can control the model convergence speed of the participating devices as a whole; by setting different learning rates for participating devices with different estimated processing batch sizes, the model convergence speeds of the participating devices tend to be consistent, so that the model to be trained converges better during the federated learning process.
  • It should be noted that the above configuration of the time step selection strategy and of the learning rate by the coordination device can also be implemented in combination. Specifically, the coordination device configures the time step selection strategy of each participating device according to its estimated processing time step and configures the learning rate of each participating device according to its estimated processing batch size; the coordination device then sends the time step selection strategy, the learning rate, and the calculation task parameters of each participating device to the corresponding participating device; the participating device selects sequence selection data from its local sequence data according to the time step selection strategy, uses sequence selection data of the estimated processing batch size for local model training, and uses the received learning rate to update the model during the training process.
  • Further, the step S30 includes:
  • Step S203 Send the calculation task parameters to each of the participating devices correspondingly, and send the estimated duration of the current round of global model update to each participating device, so that when each participating device performs local model training according to the calculation task parameters, it adjusts the number of local model training iterations according to the estimated duration.
  • the coordinating device sends the calculation task parameters of each participating device to each participating device, and at the same time sends the estimated duration of the current round of global model update to each participating device.
  • the estimated duration of this round of global model update may be determined according to the estimated processing duration of each participating device, for example, the maximum of the estimated processing duration of each participating device is taken as the estimated duration of this round of global model update.
  • Specifically, after completing one round of local model training, the participating device can calculate the time (duration 1) spent on that round, and then judge whether the result of subtracting duration 1 from the estimated duration is greater than duration 1; if it is greater, the participating device performs another round of local model training, that is, it increases the number of local model training rounds. It then calculates the time (duration 2) spent on that round, and judges whether the result of subtracting duration 1 and duration 2 from the estimated duration is greater than duration 2; if it is greater, it performs yet another round of local model training, and so on, until it detects that the remaining time is less than the time spent on the most recent round of local model training, at which point it stops local training and uploads the local model parameter update obtained from local model training to the coordination device. That is, whenever the participating device judges from the estimated duration that the remaining time is enough for one more round of local model training, it adds another round.
  • In this embodiment, the coordination device sends the estimated duration of the current round of global model update to each participating device, so that a participating device whose actual local model training is faster can increase its number of local model training rounds, avoiding wasted time spent waiting for other participating devices.
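  • The timing logic described above can be sketched as follows; train_one_round is a placeholder for one round of local model training, and the concrete durations are illustrative.
```python
# Keep adding local training rounds while the time remaining in the round
# (given the coordinator's estimated duration) still exceeds the cost of the
# previous round.
import time

def train_within_estimated_duration(estimated_duration, train_one_round):
    """Run local training rounds until the remaining time is smaller than the
    duration of the most recent round, then stop and return the round count."""
    elapsed = 0.0
    rounds = 0
    while True:
        start = time.monotonic()
        train_one_round()                          # one pass of local model training
        last = time.monotonic() - start
        elapsed += last
        rounds += 1
        if estimated_duration - elapsed <= last:   # not enough time left for another round
            return rounds

# Example with a dummy training step that takes roughly 0.1 s.
print(train_within_estimated_duration(0.5, lambda: time.sleep(0.1)))   # about 4 rounds
```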
  • step S10 includes:
  • Step S101 Receive device resource information sent by each participating device participating in horizontal federated learning, where the device resource information includes at least one or more of power resource information, computing resource information, and communication resource information.
  • each participating device actively uploads its own device resource information to the coordinating device.
  • the coordination device receives the device resource information uploaded by each participating device.
  • the device resource information may at least include one or more of power resource information, computing resource information, and communication resource information.
  • computing resources can be represented by the number of CPUs and GPUs owned by the participating device
  • power resources can be represented by the time the participant can continue to work
  • communication resources can be represented by the communication rate of the participating device.
  • In an application scenario, each participating device may be a remote sensing satellite holding different sequential image data; the remote sensing satellites use their respective image data to perform horizontal federated learning and jointly train an RNN to complete a meteorological prediction task.
  • the coordination device can be one of the remote sensing satellites or a base station located on the ground.
  • the coordination equipment obtains the equipment resource information of each remote sensing satellite, and then configures the calculation task parameters for each remote sensing satellite according to the equipment resource information of each remote sensing satellite.
  • the calculation task parameters include the estimated processing time step and/or the estimated processing batch size;
  • each remote sensing satellite's calculation task parameters are correspondingly sent to it, so that each remote sensing satellite performs the federated learning task according to its calculation task parameters to complete the training of the RNN.
  • After the training is completed, each remote sensing satellite can input its recently captured sequential remote sensing image data into the RNN, and the RNN predicts the upcoming weather conditions.
  • In this scenario, the coordination device coordinates the computing tasks of the remote sensing satellites according to their device resource information, so that remote sensing satellites with rich computing resources do not need to spend time waiting during the training process, which improves the overall efficiency of the horizontal federated learning performed by the remote sensing satellites and speeds up the deployment of the weather-forecasting RNN.
  • At the same time, the contribution of each remote sensing satellite's data to model training is used, which increases the stability of the model and makes the weather predictions obtained through the RNN more reliable.
  • an embodiment of the present application also proposes a horizontal federated learning system optimization device, which is deployed on coordination equipment participating in horizontal federated learning.
  • the device includes:
  • the obtaining module 10 is used to obtain the device resource information of each participating device participating in the horizontal federated learning
  • the configuration module 20 is configured to configure the calculation task parameters in the training process of the federated learning model corresponding to each of the participating devices according to the device resource information, and the calculation task parameters include the estimated processing time step and/or the estimated processing batch size;
  • the sending module 30 is configured to correspondingly send the computing task parameters to each of the participating devices, so that each of the participating devices can execute a federated learning task according to the respective computing task parameters.
  • the configuration module 20 includes:
  • the classification unit is configured to classify each of the participating devices according to each of the device resource information, and determine the resource category to which each of the participating devices belongs;
  • the configuration unit is used to configure the calculation task parameters in the training process of the federated learning model corresponding to each participating device according to the resource category to which each participating device belongs.
  • the configuration unit includes:
  • the first determining subunit is configured to determine the candidate task parameters corresponding to each participating device according to the resource category to which each participating device belongs;
  • the detection subunit is configured to respectively determine the expected processing duration corresponding to each of the participating devices based on the candidate task parameters, and to detect whether each of the expected processing durations meets a preset duration consistency condition;
  • the second determining subunit is configured to, if the expected processing durations satisfy the preset duration consistency condition, correspondingly use the candidate task parameters of each participating device as the calculation task parameters of that participating device.
  • the configuration module 20 is further configured to configure, according to the estimated processing time step corresponding to each of the participating devices, the time step selection strategy corresponding to each of the participating devices;
  • the sending module 30 is also configured to correspondingly send the time step selection strategies to each of the participating devices, so that each participating device selects sequence selection data from its own sequence data according to its time step selection strategy and performs the federated learning task based on the sequence selection data, wherein the time step of the sequence selection data is less than or equal to the estimated processing time step of that participating device.
  • the configuration module 20 is further configured to configure the respective learning rate of each participating device according to the estimated processing batch size corresponding to each participating device;
  • the sending module 30 is also configured to correspondingly send the learning rates to each of the participating devices, so that each participating device performs the federated learning task according to its learning rate and the estimated processing batch size received from the coordination device.
  • the sending module 30 is also configured to correspondingly send the calculation task parameters to each participating device, and to send the estimated duration of the current round of global model update to each participating device, so that when a participating device performs local model training according to the calculation task parameters, it adjusts the number of local training iterations according to the estimated duration.
  • the acquisition module 10 is also configured to receive device resource information sent by each participating device participating in horizontal federated learning, where the device resource information includes at least one or more of power resource information, computing resource information, and communication resource information.
  • the specific implementation of the horizontal federated learning system optimization device of this application is basically the same as the embodiments of the above-mentioned horizontal federated learning system optimization method, and will not be repeated here.
  • an embodiment of the present application also proposes a computer-readable storage medium; the storage medium stores a horizontal federated learning system optimization program, and when the program is executed by a processor, the steps of the horizontal federated learning system optimization method described above are implemented.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed are a transverse federated learning system optimization method, apparatus and device, and a readable storage medium. The method comprises: acquiring device resource information of each participation device that participates in transverse federated learning; respectively configuring, according to the device resource information, a calculation task parameter during a federated learning model training process corresponding to each participation device, wherein the calculation task parameter comprises a predicted processing time step and/or a predicted processing batch size; and correspondingly sending the calculation task parameter to each participation device, such that each participation device executes a federated learning task according to the respective calculation task parameter.

Description

Horizontal federated learning system optimization method, device, equipment and readable storage medium
This application claims priority to the Chinese patent application filed on April 29, 2020 with application number 202010359198.8, titled "Horizontal federated learning system optimization method, device, equipment and readable storage medium", the entire content of which is hereby incorporated by reference.
Technical field
This application relates to the field of artificial intelligence, and in particular to a method, device, equipment and readable storage medium for optimizing a horizontal federated learning system.
Background
With the gradual development of federated learning technology, federated learning is increasingly used in various fields to solve the problem of "data islands". As federated learning technology matures, the efficiency requirements on the federated learning process are getting higher in many fields, for example in real-time data processing. However, in an actual federated learning process, the technical resources of the devices of the different data parties differ: some devices are short of computing resources, cannot support efficient federated learning well, and may even lower the overall efficiency of federated learning; yet if those devices are excluded from federated learning so that they do not affect its overall efficiency, the contribution of the data on those devices cannot be exploited. Therefore, how to make use of the contribution of each data party's data in federated learning without affecting the overall efficiency of federated learning has become an urgent problem to be solved.
Technical problem
The main purpose of this application is to provide a horizontal federated learning system optimization method, device, equipment and readable storage medium, aiming to solve the problem of how to make use of the contribution of each data party's data in federated learning without affecting the overall efficiency of federated learning.
Technical solutions
To achieve the above objective, this application provides a horizontal federated learning system optimization method, which is applied to a coordination device participating in horizontal federated learning. The horizontal federated learning system optimization method includes the following steps:
obtaining device resource information of each participating device participating in horizontal federated learning;
configuring, according to each piece of device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device, where the calculation task parameters include the estimated processing time step and/or the estimated processing batch size;
correspondingly sending the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to its calculation task parameters.
To achieve the above objective, this application provides a horizontal federated learning system optimization device, which is deployed on a coordination device participating in horizontal federated learning. The device includes:
an acquisition module, used to obtain the device resource information of each participating device participating in horizontal federated learning;
a configuration module, used to configure, according to each piece of device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device, where the calculation task parameters include the estimated processing time step and/or the estimated processing batch size;
a sending module, used to correspondingly send the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to its calculation task parameters.
To achieve the above objective, this application further provides horizontal federated learning system optimization equipment, which includes: a memory, a processor, and a horizontal federated learning system optimization program stored in the memory and runnable on the processor, where the horizontal federated learning system optimization program, when executed by the processor, implements the steps of the horizontal federated learning system optimization method described above.
In addition, to achieve the above objective, this application further proposes a computer-readable storage medium storing a horizontal federated learning system optimization program, where the horizontal federated learning system optimization program, when executed by a processor, implements the steps of the horizontal federated learning system optimization method described above.
Beneficial effects
In this application, the coordination device obtains the device resource information of each participating device; configures calculation task parameters for each participating device according to its device resource information, where the calculation task parameters include the estimated processing time step and/or the estimated processing batch size; and correspondingly sends the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to them. By configuring calculation task parameters for each participating device, the coordination device coordinates how much computation each participating device has to process locally, namely by assigning the time step it is expected to process and/or the batch size it is expected to process. Because the calculation task parameters are configured according to the device resource information of each participating device, the differences in device resources among the participating devices are taken into account: participating devices with richer resources are assigned more computation and participating devices with scarcer resources are assigned less, so that resource-constrained participating devices can also complete their local model parameter updates quickly without the resource-rich devices having to spend time waiting. This improves the overall efficiency of horizontal federated learning across the participating devices and, at the same time, makes use of the contribution of each participating device's local data to model training, including the contribution of resource-constrained devices, which further increases the stability of the model, thereby optimizing the efficiency of the horizontal federated learning system and the model performance.
Description of the drawings
FIG. 1 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present application;
FIG. 2 is a schematic flowchart of the first embodiment of the horizontal federated learning system optimization method of this application;
FIG. 3 is a functional module diagram of a preferred embodiment of the horizontal federated learning system optimization device of this application.
The realization of the purpose, functional characteristics, and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiments of the present invention
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit the present application.
As shown in FIG. 1, FIG. 1 is a schematic diagram of the equipment structure of the hardware operating environment involved in the solutions of the embodiments of the present application.
It should be noted that the horizontal federated learning system optimization equipment in the embodiments of the present application may be a smart phone, a personal computer, a server, or another device, which is not specifically limited here.
As shown in FIG. 1, the horizontal federated learning system optimization equipment may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art can understand that the equipment structure shown in FIG. 1 does not constitute a limitation on the horizontal federated learning system optimization equipment, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a horizontal federated learning system optimization program. The operating system is a program that manages and controls the hardware and software resources of the equipment and supports the operation of the horizontal federated learning system optimization program and other software or programs.
When the horizontal federated learning system optimization equipment is the coordination device participating in horizontal federated learning, in the equipment shown in FIG. 1, the user interface 1003 is mainly used for data communication with a client; the network interface 1004 is mainly used to establish communication connections with the participating devices of horizontal federated learning; and the processor 1001 may be used to call the horizontal federated learning system optimization program stored in the memory 1005 and perform the following operations:
obtaining device resource information of each participating device participating in horizontal federated learning;
configuring, according to each piece of device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device, where the calculation task parameters include the estimated processing time step and/or the estimated processing batch size;
correspondingly sending the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to its calculation task parameters.
Further, the step of configuring, according to each piece of device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device includes:
classifying each participating device according to each piece of device resource information, and determining the resource category to which each participating device belongs;
configuring, according to the resource category to which each participating device belongs, the calculation task parameters in the federated learning model training process corresponding to each participating device.
Further, the step of configuring, according to the resource category to which each participating device belongs, the calculation task parameters in the federated learning model training process corresponding to each participating device includes:
determining, according to the resource category to which each participating device belongs, the candidate task parameters corresponding to each participating device;
determining, based on the candidate task parameters, the expected processing duration corresponding to each participating device, and detecting whether the expected processing durations satisfy a preset duration consistency condition;
if the expected processing durations satisfy the preset duration consistency condition, correspondingly using the candidate task parameters of each participating device as the calculation task parameters of that participating device.
Further, when the model to be trained in horizontal federated learning is a recurrent neural network model and the calculation task parameters include the estimated processing time step, after the step of correspondingly sending the calculation task parameters to each participating device, the processor 1001 may be used to call the horizontal federated learning system optimization program stored in the memory 1005 and further perform the following operations:
configuring, according to the estimated processing time step corresponding to each participating device, the time step selection strategy corresponding to each participating device;
correspondingly sending the time step selection strategies to each participating device, so that each participating device selects sequence selection data from its own sequence data according to its time step selection strategy and performs the federated learning task based on the sequence selection data, where the time step of the sequence selection data is less than or equal to the estimated processing time step of that participating device.
Further, when the calculation task parameters include the estimated processing batch size, after the step of correspondingly sending the calculation task parameters to each participating device, the processor 1001 may be used to call the horizontal federated learning system optimization program stored in the memory 1005 and further perform the following operations:
configuring, according to the estimated processing batch size corresponding to each participating device, the learning rate corresponding to each participating device;
correspondingly sending the learning rates to each participating device, so that each participating device performs the federated learning task according to its learning rate and the estimated processing batch size received from the coordination device.
Further, the step of correspondingly sending the calculation task parameters to each participating device so that each participating device executes the federated learning task according to its calculation task parameters includes:
correspondingly sending the calculation task parameters to each participating device, and sending the estimated duration of the current round of global model update to each participating device, so that when a participating device performs local model training according to the calculation task parameters, it adjusts the number of local training iterations according to the estimated duration.
Further, the step of obtaining device resource information of each participating device participating in horizontal federated learning includes:
receiving device resource information sent by each participating device participating in horizontal federated learning, where the device resource information includes at least one or more of power resource information, computing resource information, and communication resource information.
Based on the above structure, various embodiments of the horizontal federated learning system optimization method are proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the horizontal federated learning system optimization method of this application. It should be noted that although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one shown here.
In this embodiment, the horizontal federated learning system optimization method is applied to a coordination device participating in horizontal federated learning; the coordination device and the participating devices of horizontal federated learning may be devices such as smart phones, personal computers, and servers. In this embodiment, the horizontal federated learning system optimization method includes:
Step S10: obtaining device resource information of each participating device participating in horizontal federated learning;
In this embodiment, the coordination device and each participating device may establish a communication connection in advance through handshake and identity authentication, and determine the model to be trained in this federated learning, which may be, for example, a neural network model or another machine learning model. Each participating device may locally construct the model to be trained with the same or a similar structure, or the coordination device may construct the model to be trained and send it to each participating device. Each participating device locally holds training data for training the model to be trained.
在横向联邦学***均,得到全局模型参数更新,并发送给各个参与设备;各个参与设备采用全局模型参数更新来更新本地的待训练模型的模型参数,即对本地的待训练模型进行模型更新,至此完成一次全局模型更新。每一次全局模型更新后,各个参与设备本地的待训练模型的模型参数是同步的。In horizontal federated learning, the coordination device and participating devices cooperate with each other to perform multiple global model updates to the training model. Model update refers to updating the model parameters of the model to be trained, such as the connection weight value between neurons in the neural network. Finally, a model that meets the quality requirements is obtained. In a global model update, each participating device uses their local training data to perform local training on the local model to be trained to obtain local model parameter updates. The local model parameter updates can be gradient information used to update model parameters or local Updated model parameters; each participating device sends its own local model parameter update to the coordination device; the coordination device merges each local model parameter update, such as weighted average, to obtain the global model parameter update, and send it to each participating device; Each participating device uses the global model parameter update to update the model parameters of the local model to be trained, that is, to update the local model to be trained, and so far complete a global model update. After each global model update, the model parameters of the model to be trained locally on each participating device are synchronized.
During the federated learning process, the coordination device can obtain the device resource information of each participating device. The device resource information may be resource information of a participating device that is related to computing efficiency and communication efficiency, for example computing resource information, power resource information, and communication resource information. Computing resources can be represented by the number of CPUs and GPUs owned by the participating device, power resources by the time the participating device can continue to work, and communication resources by the communication rate of the participating device. The coordination device may send a device resource query request to each participating device, and each participating device, upon receiving the request, uploads its current device resource information to the coordination device.
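The resource report exchanged here can be a small structured message; the sketch below is a hypothetical illustration of the three kinds of resource information mentioned above, and the field names are assumptions rather than terms of this application.

```python
from dataclasses import dataclass

@dataclass
class DeviceResourceInfo:
    device_id: str
    cpu_count: int                 # computing resources, e.g. number of CPUs
    gpu_count: int                 # computing resources, e.g. number of GPUs
    remaining_work_hours: float    # power resources: how long the device can keep working
    comm_rate_mbps: float          # communication resources: available communication rate

# Each participant would report such a record in response to the coordinator's query, e.g.
# DeviceResourceInfo("satellite-3", cpu_count=4, gpu_count=1,
#                    remaining_work_hours=6.5, comm_rate_mbps=20.0)
```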
Step S20: configuring, according to each piece of device resource information, the calculation task parameters in the federated learning model training process corresponding to each participating device, where the calculation task parameters include the estimated processing time step and/or the estimated processing batch size;
The coordination device may obtain the device resource information of each participating device before the first global model update, so as to configure for each participating device the calculation task parameters used in the subsequent global model updates; or it may, before entering a particular global model update or each global model update, obtain the device resource information of each participating device to configure the calculation task parameters for that global model update. That is, by obtaining the device resource information of each participating device, the coordination device can configure, during the federated learning model training process, the calculation task parameters with which each participating device takes part in the training.
The calculation task parameters include the estimated processing time step and/or the estimated processing batch size, that is, they include the estimated processing time step, or the estimated processing batch size, or both. The time step refers to the number of time steps; the time step is a concept from recurrent neural network models and applies to sequence data, and the estimated processing time step is the number of time steps the participating device is expected to process. The batch size (mini-batch size) is the size of the data batch used for each model update, and the estimated processing batch size is the batch size the participating device is expected to use when performing a local model update.
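For concreteness, in a typical RNN training setup the two parameters correspond to two dimensions of the input tensor; the following fragment is only an assumed illustration of that correspondence.

```python
import numpy as np

batch_size = 64    # mini-batch size: samples consumed per local model update
time_steps = 16    # time steps: length of each input sequence fed to the RNN
features = 8       # per-time-step feature dimension (e.g. bands of a remote sensing image)

# One mini-batch of sequence data for an RNN has shape (batch_size, time_steps, features)
mini_batch = np.zeros((batch_size, time_steps, features))
```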
After obtaining the device resource information of each participating device, the coordination device configures the calculation task parameters for each participating device according to that information. Specifically, the richer a participating device's computing resources, the higher its computing efficiency; the richer its power resources, the longer it can continue to participate in federated learning; and the richer its communication resources (large communication bandwidth, short communication delay), the more efficiently it transmits data. Thus, for the same computing task, a device with richer resources spends less time. The principle of configuring the calculation task parameters is to assign more computation to participating devices with richer resources and less computation to participating devices with fewer resources, so that the processing times of the participating devices are as close to each other as possible. The amount of computation can be quantified by the estimated processing time step or the estimated processing batch size: the more time steps or the larger the batch size, the more data the participating device has to process, and therefore the more computation it is assigned. It should be noted that, since what a participating device uploads is gradient information or model parameter information, the amount of uploaded data does not change with the batch size or the time step; if a participating device's communication resources are abundant, its data transmission is fast and it needs less time for uploading, so it can spend more time on local model training, and the coordination device can therefore assign it more computation, that is, configure a larger estimated processing time step or estimated processing batch size.
There are many ways for the coordination device to configure the calculation task parameters according to the device resource information of the participating devices. For example, a correspondence between device resource information and calculation task parameters may be preset: the device resource information is divided by value into several segments, each segment representing a different degree of resource richness. For example, when the device resource information is the number of CPUs, the CPU count can be divided into several segments, where segments with more CPUs indicate richer resources; the calculation task parameters corresponding to each segment, that is, the correspondence, are preset, and segments representing richer resources correspond to calculation task parameters that define more computation. The coordination device determines which segment the device resource information of a participating device falls into and configures the calculation task parameters of that segment for the participating device.
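A hedged sketch of such a segment-based lookup is given below; the segment boundaries and the parameter values attached to them are invented examples of a preset correspondence, not values specified by this application.

```python
def configure_task_params(cpu_count):
    """Map a participant's CPU count to preset computing task parameters.
    Segments representing richer resources get larger time steps / batch sizes."""
    segments = [                      # (minimum CPUs inclusive, task parameters)
        (8, {"time_steps": 32, "batch_size": 128}),
        (4, {"time_steps": 24, "batch_size": 96}),
        (2, {"time_steps": 16, "batch_size": 64}),
        (0, {"time_steps": 8,  "batch_size": 32}),
    ]
    for min_cpus, params in segments:
        if cpu_count >= min_cpus:
            return params

print(configure_task_params(6))   # falls into the 4-CPU segment: 24 time steps, batch size 96
```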
Step S30: correspondingly sending the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to its calculation task parameters.
The coordination device correspondingly sends the calculation task parameters of each participating device to that device. After receiving its calculation task parameters, a participating device uses them to execute the federated learning task. Specifically, when the coordination device sends calculation task parameters to be used in every subsequent global model update, the participating device takes part in each subsequent global model update based on those parameters, thereby completing the federated learning task; when the coordination device sends calculation task parameters to be used in a single global model update, the participating device takes part in that global model update based on those parameters.
Take a participating device taking part in one global model update based on its calculation task parameters as an example:
When the model to be trained is a recurrent neural network and the calculation task parameter is the estimated processing time step, the participating device selects, from the local sequence data used for model training, data of that estimated processing time step (the selected data is called sequence selection data). Specifically, for each piece of sequence data consisting of multiple time steps, the participating device selects a part of the time step data as sequence selection data, and the time step of this part may be less than or equal to the estimated processing time step; when the time step of the sequence data is already less than or equal to the estimated processing time step, no selection is performed and the sequence data is used directly as the sequence selection data. For example, for a piece of sequence data with 32 time steps and an estimated processing time step of 15, the participating device may select 15 time steps from the sequence as the sequence selection data. There are many selection methods, for example selecting every other step or selecting 15 steps at random, and different selection methods can be set depending on the specific model application scenario; in addition, the selection method used in different rounds of global model updates may differ, so that most of the local data can be used for model training. Based on the pieces of sequence selection data, whose time steps are all less than or equal to the estimated processing time step, the participating device performs one or more rounds of local model training on the model to be trained to obtain a local model parameter update; the local training process is the same as the general training process of a recurrent neural network and is not described in detail here. The local model parameter update is sent to the coordination device, which fuses the local model parameter updates of the participating devices to obtain a global model parameter update and sends it to each participating device; after receiving the global model parameter update, the participating device uses it to update the model parameters of its local model to be trained.
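The two selection methods mentioned above, selecting at fixed intervals and selecting at random, might be implemented as follows; the function name and the 32-step/15-step figures simply mirror the example in the text and are otherwise assumptions.

```python
import numpy as np

def select_time_steps(sequence, target_steps, strategy="uniform", rng=None):
    """Select at most `target_steps` time steps from a sequence of shape (T, features)."""
    T = len(sequence)
    if T <= target_steps:                 # short sequences are used as-is
        return sequence
    if strategy == "uniform":             # evenly spaced indices, e.g. roughly every other step
        idx = np.linspace(0, T - 1, target_steps).astype(int)
    else:                                 # random selection, kept in time order
        rng = rng or np.random.default_rng()
        idx = np.sort(rng.choice(T, size=target_steps, replace=False))
    return sequence[idx]

# a 32-step sequence reduced to the 15 steps the device is expected to process
reduced = select_time_steps(np.zeros((32, 8)), target_steps=15)
print(reduced.shape)   # (15, 8)
```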
When the model to be trained is some neural network model or another machine learning model and the calculation task parameter is the estimated processing batch size, the participating device can divide its local training data into multiple batches, where the size of each batch, that is, the number of training samples it contains, is less than or equal to the estimated processing batch size received from the coordination device. For example, if the estimated processing batch size received by a participating device is 100 and the device has 1000 local training samples, it can divide the local training data into 10 batches. After dividing the local training data into batches according to the estimated processing batch size, the participating device uses one batch of data for the local model update each time it takes part in a global model update.
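Splitting the local training data into batches no larger than the configured size is straightforward; a minimal sketch, using the 1000-sample/100-per-batch example above:

```python
def split_into_batches(samples, expected_batch_size):
    """Divide local training data into batches no larger than the configured batch size."""
    return [samples[i:i + expected_batch_size]
            for i in range(0, len(samples), expected_batch_size)]

batches = split_into_batches(list(range(1000)), expected_batch_size=100)
assert len(batches) == 10      # matches the 1000-sample / 100-per-batch example
```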
When the model to be trained is a recurrent neural network and the calculation task parameters are the estimated processing time step and the estimated processing batch size, the participating device combines the operations of the two cases above: it performs time step selection on every piece of training data, divides the resulting pieces of sequence selection data into batches, and uses one batch of sequence selection data at a time to take part in a global model update.
It can be understood that, since the neural network nodes corresponding to different time steps in a recurrent neural network share weights, the input data of a recurrent neural network can be of variable length, that is, the time steps of the input data can differ, which makes it possible for the participating devices to perform local model training based on different estimated processing time steps.
In this embodiment, the coordination device obtains the device resource information of each participating device, configures calculation task parameters for each participating device according to that information, where the calculation task parameters include the estimated processing time step and/or the estimated processing batch size, and correspondingly sends the calculation task parameters to each participating device, so that each participating device executes the federated learning task according to them. By configuring calculation task parameters for each participating device, the coordination device coordinates how much computation each participating device processes locally, namely by assigning the time step and/or the batch size it is expected to process. Because the calculation task parameters are configured according to the device resource information of each participating device, the differences in device resources among the participating devices are taken into account: devices with richer resources are assigned more computation and devices with scarcer resources are assigned less, so that resource-constrained participating devices can also complete their local model parameter updates quickly without the resource-rich devices having to spend time waiting. This improves the overall efficiency of horizontal federated learning across the participating devices and, at the same time, makes use of the contribution of each participating device's local data to model training, including the contribution of resource-constrained devices, which further increases the stability of the model, thereby optimizing the efficiency of the horizontal federated learning system and the model performance.
Further, based on the first embodiment above, a second embodiment of the horizontal federated learning system optimization method of this application is proposed. In this embodiment, step S20 includes:
Step S201: classifying each participating device according to each piece of device resource information, and determining the resource category to which each participating device belongs;
In this embodiment, a feasible way for the coordination device to configure the calculation task parameters according to the device resource information of the participating devices is proposed. Specifically, the coordination device classifies the participating devices according to their device resource information and determines the resource category to which each participating device belongs. The coordination device can arrange the device resource information values in order, preset the number of categories, and then divide the interval formed by the minimum and maximum values in the ordering into that number of equal-width segments; each segment is then one category, and the category of a participating device is determined by which segment its resource value falls into. For example, when the device resource information includes computing resource information represented by the number of CPUs of a participating device, the CPU counts of the participating devices can be arranged in this way. It can be understood that, compared with presetting the resource categories, determining the resource categories from the actual device resource information makes the division of categories better match the actual resources of the participating devices, and adapts to the fact that the resources of the participating devices are not static during the federated learning process.
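One possible implementation of this equal-width classification is sketched below; it is illustrative only, and the example CPU counts are assumptions.

```python
def classify_by_resource(values, num_categories):
    """Assign each participant a resource category 1..num_categories by dividing the
    [min, max] range of the reported resource values into equal-width segments."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_categories or 1            # guard against all-equal values
    categories = []
    for v in values:
        cat = int((v - lo) / width) + 1
        categories.append(min(cat, num_categories))    # the maximum falls in the top segment
    return categories

# e.g. CPU counts [1, 2, 4, 8] split into 4 categories -> [1, 1, 2, 4]
print(classify_by_resource([1, 2, 4, 8], num_categories=4))
```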
When the device resource information includes data of multiple types of device resources, the data of the different resource types can be normalized so that they can be computed with and compared against each other. The normalization can use common normalization methods, which are not described in detail here. For example, when the device resources include computing resources, power resources, and communication resources, normalizing their data makes the three types comparable. For the device resource information of each participating device, the normalized data of its various resource types is obtained; weight values for the resource types can be preset according to how strongly each type affects the local computing efficiency of the participating device; and the normalized data of the resource types is then combined by weighted averaging into a single value that evaluates the overall resource richness of the participating device. Based on the computed value of each participating device, the coordination device then performs the sorting, segmentation, and classification operations described above. By normalizing the device resource information of each participating device and computing an overall resource richness value, the complex device resource information is quantified, so that the calculation task parameters can be configured for the participating devices more conveniently and accurately; quantifying the complex device resource information also makes it possible to divide the participating devices into resource categories and thus to configure computing tasks for them more quickly.
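The normalization and weighted averaging described here might look as follows; the min-max normalization and the particular weights are assumptions, and in practice the weights would be preset according to how strongly each resource type affects local computing efficiency.

```python
import numpy as np

def resource_richness_scores(resource_matrix, weights):
    """resource_matrix: one row per participant, one column per resource type
    (e.g. CPU count, remaining battery hours, communication rate).
    Returns a single overall richness score per participant in [0, 1]."""
    m = np.asarray(resource_matrix, dtype=float)
    col_min, col_max = m.min(axis=0), m.max(axis=0)
    normalized = (m - col_min) / np.where(col_max > col_min, col_max - col_min, 1)
    return normalized @ np.asarray(weights)   # weighted average across resource types

scores = resource_richness_scores(
    [[8, 10.0, 50.0],      # a well-resourced participant
     [2,  3.0, 10.0]],     # a constrained participant
    weights=[0.5, 0.2, 0.3])
print(scores)              # [1.0, 0.0] for this two-participant example
```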
Step S202: configuring, according to the resource category to which each participating device belongs, the calculation task parameters in the federated learning model training process corresponding to each participating device.
After determining the resource category to which each participating device belongs, the coordination device configures the calculation task parameters corresponding to each participating device according to its resource category. Specifically, a maximum estimated processing time step can be preset, and the resource categories can be numbered 1, 2, 3, ... in order of resource richness from low to high; the estimated processing time step corresponding to each category is then computed from the maximum estimated processing time step, for example by dividing the maximum estimated processing time step by the number of resource categories to obtain a minimum time step and multiplying the number of each resource category by that minimum time step. For example, if the maximum estimated processing time step is 32 and there are 4 resource categories, then, from the lowest to the highest resource richness, the estimated processing time steps of the 4 categories are 8, 16, 24, and 32. Similarly, a maximum estimated processing batch size can be preset, and the estimated processing batch size corresponding to each category can be computed from it.
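The arithmetic of this example (maximum time step 32, four categories yielding 8, 16, 24 and 32) generalizes directly; a small sketch under those same assumptions:

```python
def time_step_for_category(category, num_categories, max_time_steps):
    """Estimated processing time step for a resource category numbered 1..num_categories
    (a higher number means richer resources)."""
    min_step = max_time_steps // num_categories   # minimum time step
    return category * min_step

# maximum step 32, 4 categories -> categories 1..4 get 8, 16, 24, 32 time steps
print([time_step_for_category(c, 4, 32) for c in (1, 2, 3, 4)])  # [8, 16, 24, 32]
```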
Further, step S202 includes:
Step S2021: determining, according to the resource category to which each participating device belongs, the candidate task parameters corresponding to each participating device;
Further, the coordination device may determine the candidate task parameters corresponding to each participating device according to the resource category to which it belongs. Specifically, the calculation task parameters corresponding to each resource category can be computed in a way similar to the computation, described above, of the estimated processing time step of each resource category from the maximum estimated processing time step; the calculation task parameters corresponding to each participating device are then determined according to its resource category, and these calculation task parameters are first taken as the candidate task parameters.
Step S2022: determining, based on the candidate task parameters, the expected processing duration corresponding to each participating device, and detecting whether the expected processing durations satisfy a preset duration consistency condition;
The coordination device can determine, based on the candidate task parameters corresponding to each participating device, the expected processing duration corresponding to that device, that is, the time each participating device needs to execute the federated learning task according to its candidate task parameters; specifically, this can be the time a participating device needs for local model training and for uploading its model parameter update when it takes part in the next global model update according to the candidate task parameters.
Specifically, the coordinating device can estimate, from the device resource information of each participating device, the time that device needs to perform local model training and upload its model parameter update under the candidate task parameters. A unit time for one unit of resource to process one time step (or one unit of batch size) can be set in advance based on tests or experience; the estimated processing duration of a participating device can then be computed from this unit time, the resources the device actually has, and the estimated processing time step or estimated processing batch size in its candidate task parameters, following the principle that the unit cost multiplied by the quantity gives the total. For example, suppose it is set empirically in advance that a participating device with 1 CPU needs a duration x for local model training and a duration y for uploading the model parameter update per time step of data; then, for a participating device with 3 CPUs that has to process 10 time steps, the estimated processing duration is (10x+10y)/3.
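A minimal sketch of this unit-cost estimate follows; the per-time-step training and upload costs and the function name are illustrative assumptions, not values given in the application:

```python
def estimated_processing_duration(num_cpus: int, time_steps: int,
                                  train_cost: float, upload_cost: float) -> float:
    """Estimate processing duration as (per-step cost * number of steps) / resources,
    where train_cost and upload_cost are the empirical durations for one CPU to
    train on, and to upload the update for, one time step of data."""
    return time_steps * (train_cost + upload_cost) / num_cpus

# Example from the description: 3 CPUs and 10 time steps give (10x + 10y) / 3
x, y = 2.0, 0.5                                  # assumed per-step costs in seconds
print(estimated_processing_duration(3, 10, x, y))
```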
After computing the estimated processing duration of each participating device, the coordinating device can detect whether the estimated processing durations satisfy a preset duration consistency condition. The preset duration consistency condition can be set in advance according to the principle of keeping the estimated processing durations of the participating devices as consistent as possible. For example, the condition can be that the difference between the maximum and the minimum of the estimated processing durations must be smaller than a set threshold. When the coordinating device detects that the difference between the maximum and the minimum estimated processing duration is smaller than the threshold, the estimated processing durations of the participating devices satisfy the preset duration consistency condition, that is, they are roughly the same. When the coordinating device detects that this difference is not smaller than the threshold, the estimated processing durations do not satisfy the preset duration consistency condition, that is, the estimated processing durations of the participating devices differ considerably.
Step S2023: if the estimated processing durations satisfy the preset duration consistency condition, taking the candidate task parameters of each participating device as the computing task parameters of that participating device.
If the coordinating device detects that the estimated processing durations satisfy the preset duration consistency condition, the candidate task parameters of each participating device can be taken as the final computing task parameters of that device.
If the coordinating device detects that the estimated processing durations do not satisfy the preset duration consistency condition, the estimated processing durations of the participating devices differ considerably. In that case, if each participating device executed the subsequent federated learning task with its corresponding candidate task parameters, the processing times would differ greatly and some participating devices might have to wait for others. For example, during one global model update, some participating devices finish local model training and model parameter upload in a relatively short time while others take much longer, so the coordinating device and the faster participating devices have to wait until the slower participating devices have uploaded their model parameter updates before the coordinating device can aggregate them into the global model parameter update and complete that round of global model update. Therefore, when the coordinating device detects that the estimated processing durations do not satisfy the preset duration consistency condition, it can adjust the candidate task parameters. For example, the candidate task parameters of the participating device with the largest estimated processing duration can be reduced: the estimated processing time step can be reduced, the estimated processing batch size can be reduced, or both can be reduced. After the candidate task parameters have been adjusted, the coordinating device re-estimates the estimated processing duration of each participating device based on the adjusted candidate task parameters and again checks whether the durations satisfy the preset duration consistency condition, repeating this loop until the preset duration consistency condition is detected to be satisfied.
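The adjust-and-recheck loop described above could look roughly like the following sketch; the reduction step, the threshold, the per-CPU-time-step cost and the helper names are assumptions for illustration only:

```python
def balance_task_parameters(devices: dict, threshold: float, estimate_fn) -> dict:
    """Repeatedly shrink the candidate time step of the slowest device until the
    spread of estimated processing durations falls below the threshold.

    devices: name -> {"num_cpus": int, "time_steps": int}
    estimate_fn: returns the estimated processing duration for one device dict.
    """
    for _ in range(1000):                        # safety bound on adjustment rounds
        durations = {name: estimate_fn(d) for name, d in devices.items()}
        if max(durations.values()) - min(durations.values()) < threshold:
            break                                # duration consistency condition met
        slowest = max(durations, key=durations.get)
        # reduce the estimated processing time step of the slowest device
        devices[slowest]["time_steps"] = max(1, devices[slowest]["time_steps"] - 1)
    return devices

# Example usage with a simple unit-cost estimate (2.5 s per CPU-time-step, assumed)
est = lambda d: d["time_steps"] * 2.5 / d["num_cpus"]
print(balance_task_parameters(
    {"A": {"num_cpus": 4, "time_steps": 32}, "B": {"num_cpus": 1, "time_steps": 16}},
    threshold=5.0, estimate_fn=est))
```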
In this embodiment, by configuring for each participating device computing task parameters that make the estimated processing durations of the participating devices satisfy the preset duration consistency condition, the estimated processing durations of the participating devices are kept as consistent as possible. The participating devices therefore do not need to wait long for one another, or even wait at all, so that even a participating device with poor device resources can keep up with the participating devices with rich device resources. Every participating device can thus take part in the horizontal federated learning, which improves the overall efficiency of horizontal federated learning while still exploiting the contribution of the data of every participating device; even a participating device with poor device resources can contribute the training data it owns.
Further, based on the first and second embodiments above, a third embodiment of the horizontal federated learning system optimization method of this application is proposed. In this embodiment, after step S30, the method further includes:
Step S40: configuring, according to the estimated processing time step corresponding to each participating device, a time step selection strategy corresponding to each participating device;
Further, when the model to be trained in the horizontal federated learning is a recurrent neural network model and the computing task parameters include the estimated processing time step, the coordinating device can configure the time step selection strategy of each participating device according to the estimated processing time step corresponding to that device. The recurrent neural network (RNN) model involved in the embodiments of this application can be an ordinary RNN, or a deep RNN, an LSTM (Long Short-Term Memory) network, a GRU (Gated Recurrent Unit), an IndRNN (Independently Recurrent Neural Network), and so on.
A time step selection strategy is a strategy for selecting part of the time step data from sequence data. The coordinating device can configure the time step selection strategies of the participating devices in many ways, the underlying principle being that the positions, within the sequence data, of the time step data selected by the different participating devices should be as complementary as possible. For example, if the estimated processing time steps of participating device A and participating device B are both 15, the time step selection strategy the coordinating device configures for participating device A can be to select the time step data at odd indices in the sequence data, and the strategy configured for participating device B can be to select the time step data at even indices. By configuring different, or even complementary, time step selection strategies for the participating devices, the sequence data used by the different participating devices for local model training is distributed differently over the time steps, so that the model to be trained learns features from more diverse sequence data. This improves the generalization ability of the model, that is, its predictive ability on new samples of various forms.
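As an illustration of complementary time step selection strategies, the following sketch selects odd- or even-indexed time steps from one sequence sample and truncates to the estimated processing time step; the function name and the stride-based encoding of the strategy are assumptions, not part of the application:

```python
def select_time_steps(sequence, offset: int, max_steps: int):
    """Select every second time step starting at `offset` (0 selects even indices,
    1 selects odd indices), keeping at most `max_steps` time steps."""
    return list(sequence)[offset::2][:max_steps]

sequence = list(range(40))                    # one sequence sample with 40 time steps
data_a = select_time_steps(sequence, 1, 15)   # device A: odd indices, at most 15 steps
data_b = select_time_steps(sequence, 0, 15)   # device B: even indices, at most 15 steps
print(data_a[:5], data_b[:5])                 # complementary positions in the sequence
```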
Step S50: sending the time step selection strategies correspondingly to the participating devices, so that each participating device selects sequence selection data from its own sequence data according to its own time step selection strategy and executes the federated learning task according to the sequence selection data, wherein the time step of the sequence selection data is less than or equal to the estimated processing time step of that participating device.
The coordinating device sends the time step selection strategy of each participating device to that device. It should be noted that the coordinating device can send the time step selection strategy of each participating device together with the estimated processing time step, or send them separately. After receiving the estimated processing time step and the time step selection strategy, the participating device selects sequence selection data from its local sequence data according to the time step selection strategy, the time step of the sequence selection data being less than or equal to the estimated processing time step.
Further, in an embodiment, when the computing task parameters include the estimated processing batch size, after step S30 the method further includes:
Step S60: configuring, according to the estimated processing batch size corresponding to each participating device, a learning rate corresponding to each participating device;
Further, when the computing task parameters include the estimated processing batch size, the coordinating device can configure the learning rate of each participating device according to the estimated processing batch size corresponding to that device. The learning rate is a hyperparameter of the model training process, and the coordinating device can configure the learning rates of the participating devices in many ways; the underlying principle can be that the estimated processing batch size is proportional to the learning rate. For example, the coordinating device can set a reference learning rate; if the estimated processing batch size of a participating device is smaller than the batch size corresponding to the reference learning rate, it configures for that device a learning rate smaller than the reference learning rate, and if the estimated processing batch size is larger than the batch size corresponding to the reference learning rate, it configures a learning rate larger than the reference learning rate.
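One common way to realize a batch-size-proportional learning rate is linear scaling relative to a reference pair; the following sketch assumes that rule and illustrative reference values, neither of which is prescribed by the application:

```python
def scaled_learning_rate(batch_size: int, base_lr: float = 0.01,
                         base_batch_size: int = 32) -> float:
    """Scale the learning rate linearly with the estimated processing batch size,
    relative to an assumed reference pair (base_lr at base_batch_size)."""
    return base_lr * batch_size / base_batch_size

# Smaller batches get smaller learning rates, larger batches larger ones.
for bs in (16, 32, 64):
    print(bs, scaled_learning_rate(bs))       # 0.005, 0.01, 0.02
```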
Step S70: sending the learning rates correspondingly to the participating devices, so that each participating device executes the federated learning task according to its own learning rate and the estimated processing batch size received from the coordinating device.
After configuring the learning rate for each participating device, the coordinating device sends each device its learning rate. It should be noted that the coordinating device can send the learning rate of each participating device together with the estimated processing batch size, or send them separately. After receiving the learning rate and the estimated processing batch size, the participating device executes the federated learning task according to them; for example, during local model training it trains the model on data batches of the estimated processing batch size, and it uses the received learning rate when updating the model parameters.
In this embodiment, the coordinating device configures the learning rate of each participating device based on its estimated processing batch size, so that the coordinating device can control the model convergence speed of the participating devices as a whole. By setting different learning rates for the participating devices, their model convergence speeds tend toward consistency, so that the model to be trained can converge well during the federated learning process.
It should be noted that, when the computing task parameters include both the estimated processing time step and the estimated processing batch size, the above schemes in which the coordinating device configures the time step selection strategies and the learning rates of the participating devices can also be implemented in combination. Specifically, the coordinating device configures the time step selection strategy of each participating device according to its estimated processing time step and the learning rate of each participating device according to its estimated processing batch size; the coordinating device sends each participating device its time step selection strategy, learning rate and computing task parameters; the participating device selects sequence selection data from its local sequence data according to the time step selection strategy, performs local model training on batches of the estimated processing batch size drawn from the sequence selection data, and uses the received learning rate for model updates during training.
Further, step S20 includes:
Step S203: sending the computing task parameters correspondingly to the participating devices, and sending the estimated duration of the current round of global model update to the participating devices, so that each participating device, when performing local model training according to the computing task parameters, adjusts the number of local model training passes according to the estimated duration.
The coordinating device sends each participating device its computing task parameters and, at the same time, sends the estimated duration of the current round of global model update to the participating devices. The estimated duration of the current round of global model update can be determined from the estimated processing durations of the participating devices, for example by taking the maximum of the estimated processing durations as the estimated duration of the round. After receiving the computing task parameters and the estimated duration, the participating device performs local model training according to the computing task parameters and can adjust the number of local training passes according to the estimated duration. Specifically, after completing one pass of local model training, the participating device can measure the time that pass took, duration 1, and then check whether the estimated duration minus duration 1 is still greater than duration 1; if it is, it performs another pass of local model training, that is, it increases the number of local training passes by one. It then measures the time of that pass, duration 2, and checks whether the estimated duration minus duration 1 and duration 2 is still greater than duration 2; if it is, it performs another pass, and so on, until it detects that the remaining time is less than the time taken by the most recent local training pass, at which point it stops local training and uploads the local model parameter update obtained from local training to the coordinating device. In other words, whenever the participating device judges from the estimated duration that the remaining time is still enough for one more pass of local model training, it adds one more pass.
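A participating device's time-budgeted training loop might be sketched as follows; `train_one_pass` is a placeholder for the device's local training step and is assumed, not defined in the application:

```python
import time

def train_within_budget(train_one_pass, estimated_duration: float) -> int:
    """Run local training passes while the remaining time budget is still larger
    than the duration of the most recent pass; return the number of passes."""
    remaining = estimated_duration
    passes = 0
    while True:
        start = time.monotonic()
        train_one_pass()                      # one pass of local model training
        elapsed = time.monotonic() - start
        passes += 1
        remaining -= elapsed
        if remaining <= elapsed:              # not enough time left for another pass
            return passes

# Example with a dummy training pass of about 0.1 s and a 0.35 s budget
print(train_within_budget(lambda: time.sleep(0.1), 0.35))   # roughly 3 passes
```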
In this embodiment, by having the coordinating device send the estimated duration of the current round of global model update to the participating devices, a participating device whose local model training is actually fast can increase its number of local model training passes and thus avoid wasting time waiting for the other participating devices.
Further, step S10 includes:
Step S101: receiving device resource information sent by each participating device participating in the horizontal federated learning, wherein the device resource information includes at least one or more of power resource information, computing resource information and communication resource information.
Further, each participating device may actively upload its own device resource information to the coordinating device, and the coordinating device receives the device resource information uploaded by each participating device. The device resource information can include at least one or more of power resource information, computing resource information and communication resource information. Specifically, computing resources can be represented by the number of CPUs and GPUs the participating device has, power resources by the time the participating device can continue to operate, and communication resources by the communication rate of the participating device.
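For illustration, such a device resource report could be represented by a small structure like the one below; the field names and the JSON encoding are assumptions for this sketch, not a format specified by the application:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class DeviceResourceInfo:
    device_id: str
    num_cpus: int            # computing resources: CPU count
    num_gpus: int            # computing resources: GPU count
    battery_minutes: float   # power resources: remaining operating time
    link_rate_mbps: float    # communication resources: communication rate

# A participating device could serialize and upload its report like this:
report = DeviceResourceInfo("device-A", num_cpus=4, num_gpus=1,
                            battery_minutes=180.0, link_rate_mbps=50.0)
print(json.dumps(asdict(report)))
```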
Further, in an embodiment, the participating devices can be remote sensing satellites that hold different sequences of image data, and the remote sensing satellites use their respective image data for horizontal federated learning to train an RNN for a weather prediction task. The coordinating device can be one of the remote sensing satellites or a base station on the ground. The coordinating device obtains the device resource information of each remote sensing satellite and then configures computing task parameters for each remote sensing satellite according to that information, the computing task parameters including the estimated processing time step and/or the estimated processing batch size; it sends each remote sensing satellite its computing task parameters so that each satellite executes the federated learning task according to them and the training of the RNN is completed. After the trained RNN is obtained, each remote sensing satellite can feed its most recently captured sequence of remote sensing image data into the RNN and predict the upcoming weather conditions. Because, during RNN training, the coordinating device coordinates the computing tasks of the remote sensing satellites according to their device resource information, satellites with rich computing resources do not have to spend time waiting during training, which improves the overall efficiency of the horizontal federated learning among the satellites and speeds up the deployment of the weather prediction RNN. Moreover, the training process can exploit the contribution of every satellite's data to model training, including the contribution of satellites with scarcer resources, which further improves the stability of the model and makes the weather predictions obtained from the RNN more trustworthy.
In addition, an embodiment of this application further proposes a horizontal federated learning system optimization apparatus, the apparatus being deployed on a coordinating device participating in horizontal federated learning. Referring to FIG. 3, the apparatus includes:
an acquisition module 10, configured to acquire device resource information of each participating device participating in the horizontal federated learning;
a configuration module 20, configured to configure, according to the device resource information, computing task parameters of the federated learning model training process corresponding to each participating device, the computing task parameters including an estimated processing time step and/or an estimated processing batch size;
a sending module 30, configured to send the computing task parameters correspondingly to the participating devices, so that each participating device executes the federated learning task according to its own computing task parameters.
Further, the configuration module 20 includes:
a classification unit, configured to classify the participating devices according to the device resource information and determine the resource category to which each participating device belongs;
a configuration unit, configured to configure, according to the resource category to which each participating device belongs, the computing task parameters of the federated learning model training process corresponding to that participating device.
Further, the configuration unit includes:
a first determining subunit, configured to determine, according to the resource category to which each participating device belongs, candidate task parameters corresponding to that participating device;
a detection subunit, configured to determine, based on the candidate task parameters, an estimated processing duration corresponding to each participating device and detect whether the estimated processing durations satisfy a preset duration consistency condition;
a second determining subunit, configured to take, if the estimated processing durations satisfy the preset duration consistency condition, the candidate task parameters of each participating device as the computing task parameters of that participating device.
Further, when the model to be trained in the horizontal federated learning is a recurrent neural network model and the computing task parameters include the estimated processing time step, the configuration module 20 is further configured to configure, according to the estimated processing time step corresponding to each participating device, a time step selection strategy corresponding to that participating device;
the sending module 30 is further configured to send the time step selection strategies correspondingly to the participating devices, so that each participating device selects sequence selection data from its own sequence data according to its own time step selection strategy and executes the federated learning task according to the sequence selection data, wherein the time step of the sequence selection data is less than or equal to the estimated processing time step of that participating device.
Further, when the computing task parameters include the estimated processing batch size, the configuration module 20 is further configured to configure, according to the estimated processing batch size corresponding to each participating device, a learning rate corresponding to that participating device;
the sending module 30 is further configured to send the learning rates correspondingly to the participating devices, so that each participating device executes the federated learning task according to its own learning rate and the estimated processing batch size received from the coordinating device.
Further, the sending module 30 is further configured to send the computing task parameters correspondingly to the participating devices and send the estimated duration of the current round of global model update to the participating devices, so that each participating device, when performing local model training according to the computing task parameters, adjusts the number of local model training passes according to the estimated duration.
Further, the acquisition module 10 is further configured to receive device resource information sent by each participating device participating in the horizontal federated learning, wherein the device resource information includes at least one or more of power resource information, computing resource information and communication resource information.
The extended details of the specific implementations of the horizontal federated learning system optimization apparatus of this application are basically the same as those of the embodiments of the horizontal federated learning system optimization method described above and are not repeated here.
In addition, an embodiment of this application further proposes a computer-readable storage medium on which a horizontal federated learning system optimization program is stored; when the horizontal federated learning system optimization program is executed by a processor, the steps of the horizontal federated learning system optimization method described above are implemented.
For the embodiments of the horizontal federated learning system optimization device and of the computer-readable storage medium of this application, reference can be made to the embodiments of the horizontal federated learning system optimization method of this application, which are not repeated here.
It should be noted that, in this document, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes the element.
The serial numbers of the above embodiments of this application are for description only and do not represent the superiority or inferiority of the embodiments.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not therefore limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the contents of the description and the drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. A horizontal federated learning system optimization method, wherein the method is applied to a coordinating device participating in horizontal federated learning, and the horizontal federated learning system optimization method comprises the following steps:
    acquiring device resource information of each participating device participating in the horizontal federated learning;
    configuring, according to the device resource information, computing task parameters of a federated learning model training process corresponding to each participating device, the computing task parameters comprising an estimated processing time step and/or an estimated processing batch size;
    sending the computing task parameters correspondingly to the participating devices, so that each participating device executes a federated learning task according to its own computing task parameters.
  2. The horizontal federated learning system optimization method according to claim 1, wherein the step of configuring, according to the device resource information, the computing task parameters of the federated learning model training process corresponding to each participating device comprises:
    classifying the participating devices according to the device resource information and determining the resource category to which each participating device belongs;
    configuring, according to the resource category to which each participating device belongs, the computing task parameters of the federated learning model training process corresponding to that participating device.
  3. The horizontal federated learning system optimization method according to claim 2, wherein the step of configuring, according to the resource category to which each participating device belongs, the computing task parameters of the federated learning model training process corresponding to that participating device comprises:
    determining, according to the resource category to which each participating device belongs, candidate task parameters corresponding to that participating device;
    determining, based on the candidate task parameters, an estimated processing duration corresponding to each participating device, and detecting whether the estimated processing durations satisfy a preset duration consistency condition;
    if the estimated processing durations satisfy the preset duration consistency condition, taking the candidate task parameters of each participating device as the computing task parameters of that participating device.
  4. The horizontal federated learning system optimization method according to claim 1, wherein, when the model to be trained in the horizontal federated learning is a recurrent neural network model and the computing task parameters comprise the estimated processing time step, after the step of sending the computing task parameters correspondingly to the participating devices, the method further comprises:
    configuring, according to the estimated processing time step corresponding to each participating device, a time step selection strategy corresponding to that participating device;
    sending the time step selection strategies correspondingly to the participating devices, so that each participating device selects sequence selection data from its own sequence data according to its own time step selection strategy and executes the federated learning task according to the sequence selection data, wherein the time step of the sequence selection data is less than or equal to the estimated processing time step of that participating device.
  5. The horizontal federated learning system optimization method according to claim 1, wherein, when the computing task parameters comprise the estimated processing batch size, after the step of sending the computing task parameters correspondingly to the participating devices, the method further comprises:
    configuring, according to the estimated processing batch size corresponding to each participating device, a learning rate corresponding to that participating device;
    sending the learning rates correspondingly to the participating devices, so that each participating device executes the federated learning task according to its own learning rate and the estimated processing batch size received from the coordinating device.
  6. The horizontal federated learning system optimization method according to claim 1, wherein the step of sending the computing task parameters correspondingly to the participating devices, so that each participating device executes the federated learning task according to its own computing task parameters, comprises:
    sending the computing task parameters correspondingly to the participating devices, and sending the estimated duration of the current round of global model update to the participating devices, so that each participating device, when performing local model training according to the computing task parameters, adjusts the number of local model training passes according to the estimated duration.
  7. The horizontal federated learning system optimization method according to any one of claims 1 to 6, wherein the step of acquiring the device resource information of each participating device participating in the horizontal federated learning comprises:
    receiving device resource information sent by each participating device participating in the horizontal federated learning, wherein the device resource information comprises at least one or more of power resource information, computing resource information and communication resource information.
  8. A horizontal federated learning system optimization apparatus, wherein the apparatus is deployed on a coordinating device participating in horizontal federated learning, and the apparatus comprises:
    an acquisition module, configured to acquire device resource information of each participating device participating in the horizontal federated learning;
    a configuration module, configured to configure, according to the device resource information, computing task parameters of a federated learning model training process corresponding to each participating device, the computing task parameters comprising an estimated processing time step and/or an estimated processing batch size;
    a sending module, configured to send the computing task parameters correspondingly to the participating devices, so that each participating device executes a federated learning task according to its own computing task parameters.
  9. A horizontal federated learning system optimization device, wherein the horizontal federated learning system optimization device comprises: a memory, a processor, and a horizontal federated learning system optimization program stored on the memory and executable on the processor, wherein the horizontal federated learning system optimization program, when executed by the processor, implements the steps of the horizontal federated learning system optimization method according to claim 1.
  10. A horizontal federated learning system optimization device, wherein the horizontal federated learning system optimization device comprises: a memory, a processor, and a horizontal federated learning system optimization program stored on the memory and executable on the processor, wherein the horizontal federated learning system optimization program, when executed by the processor, implements the steps of the horizontal federated learning system optimization method according to claim 2.
  11. A horizontal federated learning system optimization device, wherein the horizontal federated learning system optimization device comprises: a memory, a processor, and a horizontal federated learning system optimization program stored on the memory and executable on the processor, wherein the horizontal federated learning system optimization program, when executed by the processor, implements the steps of the horizontal federated learning system optimization method according to claim 3.
  12. A horizontal federated learning system optimization device, wherein the horizontal federated learning system optimization device comprises: a memory, a processor, and a horizontal federated learning system optimization program stored on the memory and executable on the processor, wherein the horizontal federated learning system optimization program, when executed by the processor, implements the steps of the horizontal federated learning system optimization method according to claim 4.
  13. A horizontal federated learning system optimization device, wherein the horizontal federated learning system optimization device comprises: a memory, a processor, and a horizontal federated learning system optimization program stored on the memory and executable on the processor, wherein the horizontal federated learning system optimization program, when executed by the processor, implements the steps of the horizontal federated learning system optimization method according to claim 5.
  14. A horizontal federated learning system optimization device, wherein the horizontal federated learning system optimization device comprises: a memory, a processor, and a horizontal federated learning system optimization program stored on the memory and executable on the processor, wherein the horizontal federated learning system optimization program, when executed by the processor, implements the steps of the horizontal federated learning system optimization method according to claim 6.
  15. A computer-readable storage medium, wherein a horizontal federated learning system optimization program is stored on the computer-readable storage medium, and the horizontal federated learning system optimization program, when executed by a processor, implements the steps of the horizontal federated learning system optimization method according to claim 1.
  16. A computer-readable storage medium, wherein a horizontal federated learning system optimization program is stored on the computer-readable storage medium, and the horizontal federated learning system optimization program, when executed by a processor, implements the steps of the horizontal federated learning system optimization method according to claim 2.
  17. A computer-readable storage medium, wherein a horizontal federated learning system optimization program is stored on the computer-readable storage medium, and the horizontal federated learning system optimization program, when executed by a processor, implements the steps of the horizontal federated learning system optimization method according to claim 3.
  18. A computer-readable storage medium, wherein a horizontal federated learning system optimization program is stored on the computer-readable storage medium, and the horizontal federated learning system optimization program, when executed by a processor, implements the steps of the horizontal federated learning system optimization method according to claim 4.
  19. A computer-readable storage medium, wherein a horizontal federated learning system optimization program is stored on the computer-readable storage medium, and the horizontal federated learning system optimization program, when executed by a processor, implements the steps of the horizontal federated learning system optimization method according to claim 5.
  20. A computer-readable storage medium, wherein a horizontal federated learning system optimization program is stored on the computer-readable storage medium, and the horizontal federated learning system optimization program, when executed by a processor, implements the steps of the horizontal federated learning system optimization method according to claim 6.
PCT/CN2021/090825 2020-04-29 2021-04-29 Transverse federated learning system optimization method, apparatus and device, and readable storage medium WO2021219054A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010359198.8A CN111522669A (en) 2020-04-29 2020-04-29 Method, device and equipment for optimizing horizontal federated learning system and readable storage medium
CN202010359198.8 2020-04-29

Publications (1)

Publication Number Publication Date
WO2021219054A1 true WO2021219054A1 (en) 2021-11-04

Family

ID=71905586

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090825 WO2021219054A1 (en) 2020-04-29 2021-04-29 Transverse federated learning system optimization method, apparatus and device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN111522669A (en)
WO (1) WO2021219054A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522669A (en) * 2020-04-29 2020-08-11 深圳前海微众银行股份有限公司 Method, device and equipment for optimizing horizontal federated learning system and readable storage medium
CN114095969A (en) * 2020-08-24 2022-02-25 华为技术有限公司 Intelligent wireless access network
CN112182102A (en) * 2020-09-23 2021-01-05 西安纸贵互联网科技有限公司 Method and device for processing data in federal learning, electronic equipment and storage medium
CN112016703B (en) * 2020-10-15 2021-02-09 北京瑞莱智慧科技有限公司 Conversion system and method of machine learning algorithm and electronic equipment
CN112650583B (en) * 2020-12-23 2024-07-02 新奥新智科技有限公司 Resource allocation method and device, readable medium and electronic equipment
CN113010305B (en) * 2021-02-08 2022-09-23 北京邮电大学 Federal learning system deployed in edge computing network and learning method thereof
US11755954B2 (en) 2021-03-11 2023-09-12 International Business Machines Corporation Scheduled federated learning for enhanced search
CN113222169B (en) * 2021-03-18 2023-06-23 中国地质大学(北京) Federal machine combination service method and system combining big data analysis feedback
CN113139341B (en) * 2021-04-23 2023-02-10 广东安恒电力科技有限公司 Electric quantity demand prediction method and system based on federal integrated learning
CN113391897B (en) * 2021-06-15 2023-04-07 电子科技大学 Heterogeneous scene-oriented federal learning training acceleration method
CN113360514B (en) * 2021-07-02 2022-05-17 支付宝(杭州)信息技术有限公司 Method, device and system for jointly updating model
CN114328432A (en) * 2021-12-02 2022-04-12 京信数据科技有限公司 Big data federal learning processing method and system
CN116264684A (en) * 2021-12-10 2023-06-16 华为技术有限公司 Artificial intelligence AI model training method and device in wireless network
CN114626615B (en) * 2022-03-21 2023-02-03 江苏仪化信息技术有限公司 Production process monitoring and management method and system
WO2024113092A1 (en) * 2022-11-28 2024-06-06 华为技术有限公司 Model training method, server, and client device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528230A (en) * 2015-12-23 2016-04-27 北京奇虎科技有限公司 Method and device for setting configuration parameters
CN108259555A (en) * 2017-11-30 2018-07-06 新华三大数据技术有限公司 The configuration method and device of parameter
CN110633805A (en) * 2019-09-26 2019-12-31 深圳前海微众银行股份有限公司 Longitudinal federated learning system optimization method, device, equipment and readable storage medium
CN110766169A (en) * 2019-10-31 2020-02-07 深圳前海微众银行股份有限公司 Transfer training optimization method and device for reinforcement learning, terminal and storage medium
CN111522669A (en) * 2020-04-29 2020-08-11 深圳前海微众银行股份有限公司 Method, device and equipment for optimizing horizontal federated learning system and readable storage medium

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065864A (en) * 2021-11-19 2022-02-18 北京百度网讯科技有限公司 Federated learning method, federated learning device, electronic device, and storage medium
CN114065864B (en) * 2021-11-19 2023-08-11 北京百度网讯科技有限公司 Federated learning method, federated learning device, electronic apparatus, and storage medium
CN114465722A (en) * 2022-01-29 2022-05-10 深圳前海微众银行股份有限公司 Information processing method, apparatus, device, storage medium, and program product
CN114465722B (en) * 2022-01-29 2024-04-02 深圳前海微众银行股份有限公司 Information processing method, apparatus, device, storage medium, and program product
CN114564731B (en) * 2022-02-28 2024-06-04 大连理工大学 Intelligent wind power plant wind condition prediction method based on transverse federated learning
CN114564731A (en) * 2022-02-28 2022-05-31 大连理工大学 Intelligent wind power plant wind condition prediction method based on transverse federated learning
CN114675965B (en) * 2022-03-10 2023-05-02 北京百度网讯科技有限公司 Federated learning method, apparatus, device and medium
CN114675965A (en) * 2022-03-10 2022-06-28 北京百度网讯科技有限公司 Federated learning method, apparatus, device and medium
CN114841368A (en) * 2022-04-22 2022-08-02 华南理工大学 Client selection optimization method and device for unstable federated learning scene
CN114841368B (en) * 2022-04-22 2024-05-28 华南理工大学 Client selection optimization method and device for unstable federated learning scene
CN114548429A (en) * 2022-04-27 2022-05-27 蓝象智联(杭州)科技有限公司 Safe and efficient transverse federated neural network model training method
CN115037669B (en) * 2022-04-27 2023-05-02 东北大学 Cross-domain data transmission method based on federated learning
CN115037669A (en) * 2022-04-27 2022-09-09 东北大学 Cross-domain data transmission method based on federated learning
CN115277689B (en) * 2022-04-29 2023-09-22 国网天津市电力公司 Cloud-edge network communication optimization method and system based on distributed federated learning
CN115277689A (en) * 2022-04-29 2022-11-01 国网天津市电力公司 Cloud-edge network communication optimization method and system based on distributed federated learning
CN116384502A (en) * 2022-09-09 2023-07-04 京信数据科技有限公司 Method, device, equipment and medium for calculating contribution of participant value in federated learning
CN116384502B (en) * 2022-09-09 2024-02-20 京信数据科技有限公司 Method, device, equipment and medium for calculating contribution of participant value in federated learning
CN116015416A (en) * 2022-12-26 2023-04-25 北京鹏鹄物宇科技发展有限公司 Air-space-ground cooperative training method and architecture for flow detection
CN116033028A (en) * 2022-12-29 2023-04-28 江苏奥都智能科技有限公司 Hierarchical federated edge learning scheduling method and system applied to the Internet of Things
CN116681126B (en) * 2023-06-06 2024-03-12 重庆邮电大学空间通信研究院 Asynchronous weighted federated learning method capable of adapting to waiting time
CN116681126A (en) * 2023-06-06 2023-09-01 重庆邮电大学空间通信研究院 Asynchronous weighted federated learning method capable of adapting to waiting time
CN117742928A (en) * 2024-02-20 2024-03-22 蓝象智联(杭州)科技有限公司 Algorithm component execution scheduling method for federated learning
CN117742928B (en) * 2024-02-20 2024-04-26 蓝象智联(杭州)科技有限公司 Algorithm component execution scheduling method for federated learning

Also Published As

Publication number Publication date
CN111522669A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
WO2021219054A1 (en) Transverse federated learning system optimization method, apparatus and device, and readable storage medium
CN112181666B (en) Equipment assessment and federated learning importance aggregation method based on edge intelligence
Alfakih et al. Task offloading and resource allocation for mobile edge computing by deep reinforcement learning based on SARSA
Liu et al. An edge network orchestrator for mobile augmented reality
WO2022252456A1 (en) Task scheduling method and apparatus, electronic device, and readable storage medium
US20210256423A1 (en) Methods, apparatuses, and computing devices for trainings of learning models
CN108805611A (en) Advertisement screening technique and device
US11843516B2 (en) Federated learning in telecom communication system
CN108255141B (en) Assembly scheduling information generation method and system
CN111988787B (en) Task network access and service placement position selection method and system
CN113989561A (en) Parameter aggregation updating method, equipment and system based on asynchronous federated learning
CN110519849B (en) Communication and computing resource joint allocation method for mobile edge computing
Huang et al. Enabling DNN acceleration with data and model parallelization over ubiquitous end devices
CN113778691B (en) Task migration decision method, device and system
Chen et al. Mobility-aware offloading and resource allocation for distributed services collaboration
CN111275188B (en) Method and device for optimizing horizontal federated learning system and readable storage medium
Albaseer et al. Semi-supervised federated learning over heterogeneous wireless IoT edge networks: Framework and algorithms
CN113645637B (en) Method and device for offloading tasks of ultra-dense network, computer equipment and storage medium
CN111124439B (en) Intelligent dynamic offloading algorithm with cloud-edge cooperation
CN116781788A (en) Service decision method and service decision device
CN114253728B (en) Heterogeneous multi-node cooperative distributed neural network deployment system based on webpage ecology
CN114266324B (en) Model visualization modeling method and device, computer equipment and storage medium
CN112804304B (en) Task node distribution method and device based on multi-point output model and related equipment
Guo et al. Microservice selection in edge-cloud collaborative environment: A deep reinforcement learning approach
CN110928683B (en) Edge computing resource allocation method based on two types of intensive virtual machines

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 21795558
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 21795558
Country of ref document: EP
Kind code of ref document: A1