CN116204307A - Federated learning method and federated learning system compatible with different computing frameworks - Google Patents


Info

Publication number
CN116204307A
CN116204307A (application number CN202310105164.XA)
Authority
CN
China
Prior art keywords
computing
service
node
entity
federated learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310105164.XA
Other languages
Chinese (zh)
Inventor
王德健
林博
董科雄
李进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yikang Huilian Technology Co ltd
Original Assignee
Hangzhou Yikang Huilian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yikang Huilian Technology Co ltd filed Critical Hangzhou Yikang Huilian Technology Co ltd
Priority to CN202310105164.XA
Publication of CN116204307A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Multi Processors (AREA)

Abstract

The application relates to a federated learning method and a federated learning system compatible with different computing frameworks. Each participant comprises a plurality of computing nodes, and each computing node comprises a computing entity engine, computing resources called by the computing entity engine, and a computing driving unit called by the computing entity engine. The computing resource provides a computing entity service in response to a call request from the computing entity engine, and the computing driving unit matches a computing framework. For a given participant, if the computing frameworks used by the computing nodes participating in federated learning are at least partially different, the federated learning method includes: the computing node obtains a service request from the central node; the computing entity engine receives the service request and invokes the corresponding computing entity service in the computing resource; the computing entity service calls, through the computing entity engine, the computing driving units matched to the different computing frameworks; and the computing entity engine creates a computing execution body that completes the service request. The method thus achieves compatibility both when no computing framework is used and when different computing frameworks are used.

Description

Federated learning method and federated learning system compatible with different computing frameworks
Technical Field
The application relates to the technical field of distributed machine learning, and in particular to a federated learning method and a federated learning system compatible with different computing frameworks.
Background
In recent years, artificial intelligence technology has developed rapidly and has gradually been applied to many aspects of social life. This rapid development brings convenience, but it also brings problems. Conventional artificial intelligence model deployment faces a particular difficulty: training a model requires access to the entire dataset. In practice, however, data is often scattered across different entities, including different enterprises and institutions, or different departments of the same organization. Whether limited by inter-enterprise barriers or by the privacy concerns of individual data owners, data frequently cannot be gathered in one center for training on a global dataset; this is commonly summarized as the "data island" problem. Furthermore, beyond these real-world constraints, training on centralized data suffers from the following drawbacks: centralized storage and transmission carry a high risk of data leakage; the data is vulnerable to damage and infringement by malicious attack; and, at very large scale, centralized training faces insufficient computing power, slow computation, long training cycles, and difficult tuning.
To address the above issues, researchers proposed the concept and theory of federated learning (FL). Federated learning offers a novel approach to data sharing: the data need not be uploaded to a data center; instead, a model is trained locally, and the local model obtained from the local data is uploaded to contribute to the improvement of a global central model. To support the deployment of federated learning models, training and experimentation have been carried out, and several mature federated learning frameworks now exist, including TensorFlow Federated, PySyft, and the Federated AI Technology Enabler (FATE), which have been widely used for model deployment, training, and experimentation in federated learning.
However, existing federated learning frameworks cannot be deployed flexibly under the constraints of real-world computing resources. Within a single participant there is often a cluster of servers affiliated with different departments; each server in the cluster provides a computing node, and the computing frameworks of the servers may differ. Completing distributed machine learning, of course, requires several such participants to take part together.
Currently existing federated learning frameworks (e.g., FATE) that can use a distributed computing framework within a participant require all participants to use a uniform distributed computing framework. Because the current implementations are data-centric and must operate uniformly on the computing behavior of the participants, all participants must be configured with the same computing framework. However, redeploying a unified computing framework on every server of every participant's cluster is a substantial amount of work.
Disclosure of Invention
Based on this, in order to solve the above technical problems, it is necessary to provide a federated learning method compatible with different computing frameworks.
The federated learning method compatible with different computing frameworks is implemented between a central node and a plurality of participants. Each participant comprises a plurality of computing nodes, and each computing node comprises a computing entity engine, computing resources called by the computing entity engine, and a computing driving unit called by the computing entity engine;
the computing resource is used for providing a computing entity service according to a call request of the computing entity engine, and the computing driving unit is used for matching a computing framework;
for one of the participants, if no computing node participating in federated learning uses a computing framework, the federated learning method includes:
the computing node obtains a service request from the central node;
the computing entity engine receives the service request and invokes the corresponding computing entity service in the computing resource;
the computing entity service completes the service request;
for one of the participants, if the computing nodes participating in federated learning use computing frameworks and the frameworks used are at least partially different, the federated learning method includes:
the computing node obtains a service request from the central node;
the computing entity engine receives the service request and invokes the corresponding computing entity service in the computing resource;
the computing entity service invokes, through the computing entity engine, the computing driving units matched to the different computing frameworks;
the computing entity engine creates a computing execution body that completes the service request.
Optionally, the computing resource includes a plurality of computing entity services in a queue, and the computing resource is configured to provide the computing entity services according to a call request of the computing entity engine, including: the computing resource is used for providing one of the computing entity services in the queue according to the call request of the computing entity engine.
Optionally, the computing entity engine comprises a scheduling management module and a computing driving module,
the computing resource is called by the scheduling management module;
the computing entity engine receiving the service request and invoking the corresponding computing entity service in the computing resource specifically comprises: the scheduling management module receives the service request and invokes the corresponding computing entity service in the computing resource;
the computing driving unit is called by the computing driving module;
the computing entity service calling, through the computing entity engine, the computing driving units matched to the different computing frameworks specifically comprises: the computing entity service matches the computing driving units of the different computing frameworks through the computing driving module;
the computing entity engine creating a computing execution body specifically comprises: the computing driving module creates the computing execution body.
Optionally, each participant is provided with a node management service for receiving the operating status of each computing node and the type of computing framework used by each computing node.
Optionally, each computing node is provided with a computing node service for information interaction with the node management service and for sending the service request to the computing entity engine.
Optionally, the central node includes a task coordinator, where the task coordinator is configured to communicate with node management services of each participant;
and after the service request is completed, providing task content feedback to the task coordinator through the computing node service and the node management service in sequence.
Optionally, the scheduling management module is of two types: a local scheduling module and a cloud scheduling module;
the node management service judges whether each computing node uses a computing frame or not and whether the computing frames used by the computing nodes are at least partially different or not;
if each computing node participating in federal learning does not use a computing framework, the computing entity engine receives the service request and invokes a corresponding computing entity service in the computing resource, which specifically includes: the local scheduling module receives the service request and invokes corresponding computing entity service in the computing resource;
if the computing frames used by the computing nodes participating in federal learning are at least partially different, the computing entity engine receives the service request and invokes corresponding computing entity service in the computing resource, which specifically includes: and the cloud scheduling module receives the service request and invokes corresponding computing entity service in the computing resource.
Optionally, the computing execution body is part of the computing resource.
Optionally, the computing framework includes at least one of: a MapReduce framework, a Spark framework, a Flink framework, and a Ray framework.
The application also provides a federated learning system compatible with different computing frameworks, comprising a central node and a plurality of participants, wherein each participant comprises a plurality of computing nodes, and each computing node comprises a computing entity engine, computing resources called by the computing entity engine, and a computing driving unit called by the computing entity engine;
the computing resource is used for providing a computing entity service according to a call request of the computing entity engine, and the computing driving unit is used for matching a computing framework;
the federated learning system is used for implementing the federated learning method compatible with different computing frameworks;
for one of the participants, if no computing node participating in federated learning uses a computing framework, the federated learning method includes:
the computing node obtains a service request from the central node;
the computing entity engine receives the service request and invokes the corresponding computing entity service in the computing resource;
the computing entity service completes the service request;
for one of the participants, if the computing nodes participating in federated learning use computing frameworks and the frameworks used are at least partially different, the federated learning method includes:
the computing node obtains a service request from the central node;
the computing entity engine receives the service request and invokes the corresponding computing entity service in the computing resource;
the computing entity service invokes, through the computing entity engine, the computing driving units matched to the different computing frameworks;
the computing entity engine creates a computing execution body that completes the service request.
The federated learning method compatible with different computing frameworks has at least the following effects:
If no computing node participating in federated learning uses a computing framework, computation is performed locally: the server providing the computing node within the participant runs the computing code directly, so the computing driving unit does not need to be started.
When the computing frameworks used by the computing nodes differ, the method calls the computing driving units matched to the different frameworks, thereby remaining compatible with both federated learning working states: no computing framework in use, and different computing frameworks in use.
The federated learning method reduces the difficulty of deployment and of getting started. Its only impact on the size of the framework lies in the computing driving units and the part that calls them (the computing driving module), so on top of providing compatibility it adds little to the resource footprint of the existing computing frameworks.
Drawings
FIG. 1 is a schematic diagram of an application scenario of a federal learning method compatible with different computing frameworks in an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario of a federal learning method compatible with different computing frameworks in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Reference numerals in the drawings are described as follows:
110. central node; 210. participant; 220. node management service; 230. computing node; 240. computing node service; 250. computing entity engine; 251. scheduling management module; 252. computing driving module; 260. computing resource; 261. computing entity service; 270. computing driving unit; 280. computing execution body; 300. different computing frameworks.
Detailed Description
The prior art relies on all participants in federated learning using a unified distributed computing framework. However, the distributed computing frameworks (hereinafter simply computing frameworks) of the servers in a participant's server cluster may be incompatible; for example, different computing nodes within a participant (provided by its servers) may use different computing frameworks, such as MapReduce, Spark, Flink, or Ray. During federated learning, computing nodes on different frameworks cannot carry out federated learning tasks together. Moreover, the participants are large, so deploying the same computing framework on every server of every participant involves a heavy workload. Even if the same computing framework were installed on every server of a participant, the work would have to be repeated whenever a different computing framework is required, which limits the wide application of federated learning methods.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
To solve the above technical problem, referring to FIGS. 1-2, an embodiment of the present application provides a federated learning method compatible with different computing frameworks. The method is implemented between a central node 110 and a plurality of participants 210; each participant 210 includes a plurality of computing nodes 230, and each computing node 230 includes a computing entity engine 250, a computing resource 260 called by the computing entity engine 250, and a computing driving unit 270 called by the computing entity engine 250. The computing resource 260 provides a computing entity service 261 upon a call request from the computing entity engine 250, and the computing driving unit 270 matches a computing framework.
The central node 110 may be understood as the organizer of federated learning, i.e., the server that organizes federated learning and acts as the task coordinator. At least two participants 210 take part in federated learning; each participant 210 is a server cluster comprising different servers belonging to the same organization, and its computing nodes 230 are provided by the different servers within the participant 210. A computing resource 260 is the computational capacity of the server where a computing node 230 resides, invoked by the computing entity engine 250. In the underlying operating logic of the server, the computing resource 260 includes a plurality of computing entity services 261, organized into a queue, for completing the corresponding services; on a call request from the computing entity engine 250, the computing resource 260 provides one of the computing entity services 261 in the queue. The computing framework includes at least one of: a MapReduce framework, a Spark framework, a Flink framework, and a Ray framework.
The matching of a computing framework by the computing driving unit 270 may proceed as follows. First, note that the code a participant runs upon receiving a federated learning task is written to invoke the distributed computing framework in use.
Each distributed computing framework has its own computing interface, and the task of the computing driving unit is the connection-adaptation work for that framework, including at least one of the following: setting the interface communication address of the distributed computing framework; and setting the environment variables of the distributed computing framework's configuration. In short, the computing driving unit must connect to and use the framework's interfaces whenever code written against that framework invokes it. Once the computing driving unit has completed this work, code that calls the computing framework can successfully invoke the corresponding framework; exception handling can also be added for call failures.
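The connection-adaptation work described above can be sketched in a few lines of Python. The class and method names (`ComputeDriver`, `connect`), the endpoint format, and the environment variables are illustrative assumptions, not details from this application.

```python
import os

class ComputeDriver:
    """Hypothetical computing driving unit: adapts one distributed
    computing framework by setting its communication address and the
    environment variables its client code expects."""

    def __init__(self, framework: str, endpoint: str, env: dict):
        self.framework = framework   # e.g. "spark", "flink", "ray"
        self.endpoint = endpoint     # interface communication address
        self.env = env               # framework configuration variables

    def connect(self) -> dict:
        # Export the environment variables so code written against this
        # framework can find its configuration.
        for key, value in self.env.items():
            os.environ[key] = value
        try:
            # A real driver would open a client session against
            # self.endpoint here; the sketch just reports success.
            return {"framework": self.framework,
                    "endpoint": self.endpoint, "connected": True}
        except OSError as exc:
            # Exception-handling hook for call failures, as noted above.
            return {"framework": self.framework,
                    "error": str(exc), "connected": False}

driver = ComputeDriver("spark", "spark://master:7077",
                       {"SPARK_HOME": "/opt/spark"})
result = driver.connect()
```

A failed connection would flow through the `except` branch rather than raising, mirroring the exception handling the text mentions.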
For one of the participants 210, if no computing node 230 participating in federated learning uses a computing framework, the federated learning method includes steps S11 to S13.
In step S11, the computing node 230 obtains a service request from the central node 110.
In step S12, the computing entity engine 250 receives the service request and invokes the corresponding computing entity service 261 in the computing resource 260. Step S12 is specifically completed by the scheduling management module 251 included in the computing entity engine 250.
In step S13, the computing entity service 261 completes the service request.
Specifically, in steps S11 to S13, the central node 110 organizes federated learning and issues a service request; the computing node 230 obtains the service request, organizes local processes to build a local model, and returns the model parameters after the computation, thereby completing the service request.
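Steps S11 to S13, where no computing framework is in play, can be sketched as a queue of entity services handled directly by the engine. All names here, and the averaging stand-in for local model training, are illustrative assumptions rather than details from this application.

```python
from queue import Queue

class ComputeEntityService:
    """Hypothetical local computing entity: runs the task directly,
    with no distributed computing framework involved."""
    def complete(self, request: dict) -> dict:
        # Stand-in for local training: "model parameters" are just the
        # mean of the local data.
        data = request["data"]
        return {"task": request["task"], "params": sum(data) / len(data)}

class ComputeEntityEngine:
    """Hypothetical engine holding the computing entity queue."""
    def __init__(self, services):
        self.pool = Queue()
        for service in services:
            self.pool.put(service)

    def handle(self, request: dict) -> dict:
        service = self.pool.get()     # invoke one entity service (S12)
        try:
            return service.complete(request)   # complete the request (S13)
        finally:
            self.pool.put(service)    # return the entity to the queue

engine = ComputeEntityEngine([ComputeEntityService()])
reply = engine.handle({"task": "train", "data": [1.0, 2.0, 3.0]})
```

The queue mirrors the computing entity queue described for the computing resource 260: entities are borrowed to serve a request and returned afterwards.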
For one of the participants 210, if the computing nodes 230 participating in federated learning use computing frameworks and the frameworks used are at least partially different, the federated learning method includes steps S21 to S24.
In step S21, the computing node 230 obtains a service request from the central node 110.
In step S22, the computing entity engine 250 receives the service request and invokes the corresponding computing entity service 261 in the computing resource 260.
In step S23, the computing entity service 261 calls, through the computing entity engine 250, the computing driving units 270 matched to the different computing frameworks 300. Specifically, the computing driving module 252 included in the computing entity engine 250 matches the computing driving units 270 of the different computing frameworks 300.
In step S24, the computing entity engine 250 creates a computing execution body 280, and the computing execution body 280 completes the service request. Specifically, the computing driving module 252 included in the computing entity engine 250 creates the computing execution body 280, and the computing execution body 280 is a part of the computing resource 260.
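Steps S23 and S24 amount to a lookup of the matched driving unit followed by the creation of an execution body that runs the request through it. The sketch below uses plain strings for the drivers; every name is an illustrative assumption.

```python
class ComputeDriverModule:
    """Hypothetical computing driving module 252: maps a framework name
    to its registered computing driving unit."""
    def __init__(self):
        self.units = {}

    def register(self, framework: str, unit: str) -> None:
        self.units[framework] = unit

    def get_unit(self, framework: str) -> str:
        # S23: return the driving unit matched to this framework.
        return self.units[framework]

class ComputeExecutionBody:
    """Hypothetical execution body 280: created by the engine, it runs
    the request through the fetched driving unit."""
    def __init__(self, unit: str):
        self.unit = unit

    def run(self, request: str) -> str:
        # S24: a real body would submit the job to the framework; this
        # sketch just records which driver handled the request.
        return f"{request} via {self.unit}"

module = ComputeDriverModule()
module.register("ray", "ray-driver")
module.register("spark", "spark-driver")
body = ComputeExecutionBody(module.get_unit("ray"))
out = body.run("aggregate-gradients")
```

Registering one driving unit per framework is what lets nodes on Ray and nodes on Spark serve the same service request without redeployment.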
Further, each participant 210 is provided with a node management service 220, which receives the operating status of each computing node 230 and the type of computing framework each computing node 230 uses.
In the federated learning method compatible with different computing frameworks described above, the computing driving unit provides the compatibility with different computing frameworks. Compared with other approaches to compatibility, this reduces the difficulty of deployment and of getting started: nodes with different computing frameworks can join the federated learning method provided by this embodiment without change. The only impact of the method on the size of the framework lies in the computing driving units and the computing driving module, so on top of providing compatibility it adds little to the resource footprint of the existing computing frameworks.
In addition, with the federated learning method compatible with different computing frameworks provided in this embodiment, when different participants use different distributed computing frameworks while the same framework is used inside each participant (a cloud-computing situation), steps S21 to S24 can still be used to achieve compatibility across the participants' different computing frameworks.
The computing entity engine provides unified interfaces (for node management service) upwards through abstraction to perform functions of applying, checking, managing and the like of computing resources, and is compatible with computing interfaces under different computing frameworks; and the scheduling management module under different resource conditions and the calculation driving module under different calculation frameworks are used downwards to be compatible with the different calculation frameworks, so that the actual resource scheduling and calculation executing functions are completed.
Each computing node 230 is provided with a computing node service 240, which handles information interaction with the node management service 220 and sends service requests to the computing entity engine 250.
The central node 110 includes a task coordinator for communicating with the node management service 220 of each participant 210, ultimately reaching the computing entity service 261 within the computing resource 260 that completes the service request. After the service request is completed, task content feedback is provided to the task coordinator through the computing node service 240 and then the node management service 220.
In one embodiment, the scheduling management module 251 is of two types: a local scheduling module and a cloud scheduling module.
The node management service 220 determines whether each computing node 230 uses a computing framework and whether the computing frameworks used by each computing node 230 are at least partially different.
If no computing node 230 participating in federated learning uses a computing framework, step S12 specifically includes: the local scheduling module receives the service request and invokes the corresponding computing entity service 261 in the computing resource 260;
if the computing frameworks used by the computing nodes 230 participating in federated learning are at least partially different, step S22 specifically includes: the cloud scheduling module receives the service request and invokes the corresponding computing entity service 261 in the computing resource 260.
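The choice between the two scheduling modules can be sketched as a single check over the frameworks reported by the computing nodes. The function name and the convention of `None` for "no framework" are illustrative assumptions.

```python
def pick_scheduler(frameworks: list) -> str:
    """Hypothetical decision made by the node management service 220:
    `frameworks` lists, per computing node, the framework name in use
    (None for a node that uses no computing framework)."""
    kinds = set(frameworks)
    if kinds == {None}:
        # No node uses a computing framework: local scheduling module (S12).
        return "local"
    # Frameworks are in use, and may differ: cloud scheduling module (S22).
    return "cloud"

mode_a = pick_scheduler([None, None, None])          # bare local servers
mode_b = pick_scheduler(["spark", "ray", "spark"])   # mixed frameworks
```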
In one embodiment, a federated learning method compatible with different computing frameworks 300 is described in detail.
FIG. 1 illustrates the overall structure of federated learning task deployment using the federated learning framework provided by the embodiments of the present application, as well as the position and composition of the computing entity engine 250 within a computing node 230.
Each participant 210 needs to receive computing tasks from the central node 110, perform the computation, and transmit the results back to the central node 110, so that the central node 110 can perform the aggregate computation of the model. Each participant 210 therefore needs computing nodes that perform the computing tasks while also managing the computing resources.
When deploying the federated learning task, each participant 210 starts a node management service 220, which manages meta-information such as the participant's tasks and performs node management for the participant's computing nodes.
The node management service 220 obtains information about a computing node 230 by interacting with that node's computing node service 240, dispatches tasks, and so on. Each computing node service 240, in turn, acquires and manages computing resources by interacting with the computing entity engine 250, which manages and schedules the computing resources 260.
FIG. 2 illustrates how the computing resource 260 communicates with the computing entity engine 250, and how the computing resource 260 is made compatible with different computing frameworks 300 via the computing driving units 270. The computing resource 260 includes a queue formed by the computing entity services 261.
When the participant 210 receives an instruction indicating that a task needs to be computed, the node management service 220 within the participant 210 receives the instruction, dispatches through the scheduling management module 251 in the computing entity engine 250 to obtain the communication address of a computing entity service 261 (one of the computing entity services 261) in the computing entity queue, and returns that communication address. The central node 110, as task coordinator, can then use the communication address to send a task-execution instruction, together with the task's configuration information, to the service at that address (for the acquisition of the communication address, see FIG. 1). After the computing entity service 261 receives the instruction and the task configuration, the computing entity engine 250 requests, from the computing driving module 252, the driver corresponding to the computing framework 300 in use, and the computing driving module 252 returns the corresponding computing driving unit 270. A computing execution body 280 is then created, and it uses the fetched computing driving unit 270 to execute the computing behavior on the different computing frameworks 300. The computing execution body 280 may be part of the computing resource 260.
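The address dispatch described above can be sketched as a round-robin over the entity queue. The class name, the addresses, and the round-robin policy itself are illustrative assumptions; the application does not specify a dispatch policy.

```python
from itertools import count

class SchedulingModule:
    """Hypothetical scheduling management module 251: hands out the
    communication address of one computing entity service from the
    computing entity queue."""
    def __init__(self, addresses):
        self.addresses = list(addresses)
        self._counter = count()

    def dispatch(self) -> str:
        # Round-robin over the entity queue; a real module would also
        # consult each entity's state information before handing out
        # its address.
        i = next(self._counter) % len(self.addresses)
        return self.addresses[i]

sched = SchedulingModule(["10.0.0.1:9000", "10.0.0.2:9000"])
first = sched.dispatch()    # address returned to the task coordinator
second = sched.dispatch()   # next request lands on the next entity
```

The returned address is what the central node 110 would then use to send the task-execution instruction and configuration.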
Referring to fig. 2, two deployment scenarios are distinguished according to the available computing resources: local single-machine deployment and multi-machine distributed deployment. The main differences between them lie in the engine running environment and in the computing entities.
In the frame diagrams shown in the figures, the main difference between the two deployment methods is whether the computing entity service 261 receives instructions and whether a computing execution body 280 is created, via the computing driving module 252 and the computing driving unit 270, to carry out the computation on one of the different computing frameworks 300.
A local single-machine deployment typically does not use a distributed computing framework, whereas a multi-machine distributed deployment requires one, and the frameworks used may differ. The present embodiment can handle both working scenarios: when no computing framework is used, the computing entity service 261 is invoked directly; when computing frameworks are used and they differ, different computing driving units 270 are called according to the different computing frameworks 300, and a computing execution body 280 is created to complete the computation. The embodiment thereby achieves compatibility with both a single computing framework and different computing frameworks.
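This two-way branch (direct invocation versus framework-specific driver unit) can be sketched as follows. The function name, the task-configuration keys, and the framework names in the `drivers` mapping are all assumptions for illustration:

```python
def execute_task(task_config, drivers):
    """Dispatch a task either directly or through a framework driver unit.

    `drivers` maps a framework name (e.g. "spark", "ray") to a callable
    driver unit; these names are illustrative, not taken from the patent.
    """
    framework = task_config.get("framework")  # None => local stand-alone case
    if framework is None:
        # No computing framework: the computing entity service handles
        # the payload directly.
        return ("direct", task_config["payload"])
    # Framework in use: pick the matching driver unit, which stands in for
    # creating a computing execution body on that framework.
    driver = drivers[framework]
    return ("driver", driver(task_config["payload"]))

# Hypothetical driver units, one per supported framework.
drivers = {
    "spark": lambda p: f"spark:{p}",
    "ray": lambda p: f"ray:{p}",
}
```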
The steps for deploying a federal learning task using the computing entity engine 250 shown in the figure are as follows:
Step 101: referring to fig. 1, the central node 110 service and the various participants 210 are created with the federal learning framework according to the given configuration information.
Step 102: the computing nodes are initialized according to the number and configuration of computing nodes 230 of each participant 210 in the given configuration information. When a computing node 230 is created, a computing node service 240 is started, which in turn initializes the scheduling management module 251 (for resource scheduling) and the computing driving module 252 in the computing entity engine 250. In a local single-machine deployment, the scheduling management module 251 (the computing scheduling engine) runs as a resident background program: according to the configuration information it creates several locally available computing entities, which provide the computing entity services 261 and form a computing entity queue (for example as resident processes), and it starts a background service to maintain the state information and communication port information of the computing entities. In a multi-machine distributed deployment, the cloud computing environment must be deployed in advance and associated with the cloud engine when the engine is initialized. The cloud engine creates several computing entities in the cloud computing environment to provide the computing entity services 261 and runs a service that monitors those computing entities.
Step 103: after the computing entity engine is started, the scheduling management module 251 and the computing driving module 252 provide a uniform external call interface. When the computing node service 240 receives a function call from the node management service 220 of the participant, it can invoke the unified interface of the scheduling management module 251 to schedule the computing entity services 261 in the computing resource 260; no separate programs need to be implemented for the local and multi-machine distributed deployment designs.
Step 104: after the participant 210 receives the task execution instruction and the task configuration, the address of an available computing entity service 261 is returned and that service receives the task configuration. The computing driving module 252 in the computing entity engine 250 then supplies the computing driving units 270 corresponding to the different computing frameworks 300, a computing execution body 280 is created, and the computing task is executed on the corresponding computing framework, achieving compatibility with different computing frameworks.
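Steps 102 through 104 can be condensed into one small model of the engine. The class and method names, port scheme, and driver callables are illustrative assumptions:

```python
class ComputeEntityEngine:
    """Minimal stand-in for the computing entity engine (all names assumed)."""

    def __init__(self, n_entities, drivers):
        # Step 102: create the entity-service queue (here, just addresses)
        # and register one driver unit per supported framework.
        self.queue = [f"127.0.0.1:{9000 + i}" for i in range(n_entities)]
        self.drivers = drivers

    def apply_entity(self):
        # Step 103: one unified call interface, regardless of whether the
        # deployment is local or multi-machine distributed.
        return self.queue.pop(0)

    def execute(self, framework, payload):
        # Step 104: fetch the driver unit matching the task's framework and
        # run the computation through it (standing in for the execution body).
        return self.drivers[framework](payload)

# Step 101 would create the central node service and participants from the
# given configuration before engines like this one are initialized.
```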
In one embodiment, the computing entity engine 250 is functionally divided into two modules: the scheduling management module 251 and the computing driving module 252. They correspond, respectively, to the engine's scheduling and monitoring of computing entities and to its compatibility with different computing frameworks through the computing drivers.
The scheduling management module 251 comes in two types: a local computing entity engine resource scheduling module (abbreviated as the local scheduling module) and a cloud computing entity engine resource scheduling module (abbreviated as the cloud scheduling module), which schedule and manage computing resources for local single-machine computing resources and for multi-machine cloud computing deployments, respectively.
In this embodiment, these two types of scheduling management module 251 accommodate different computing resource conditions, such as single-machine and multi-machine distributed computing, and expose their resource scheduling and management functions through a unified external interface.
When the federal learning task is deployed on a local single machine, the scheduling management module 251 selects the local scheduling module to schedule and manage the computing resource 260. In the single-machine case, a resource application request returns the port communication address of one locally available computing entity service 261, and the background service of the scheduling management module communicates with each computing entity service through its port communication address to query and manage the state of that service.
When the federal learning task is deployed in a multi-machine cloud computing environment, the scheduling management module 251 selects the cloud scheduling module to schedule and manage computing resources. The node management service 220 receives the service request of the central node 110 and sends a resource application request to the computing entity engine 250; the scheduling management module in the computing entity engine 250 selects the cloud scheduling module, which returns the communication address of a computing entity service running in the cloud computing environment. The cloud scheduling module runs a background service in the cloud computing environment that allocates resources and checks and manages resource states, thereby scheduling, running, and managing the physical computing resources of multiple machines.
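The local/cloud split can be sketched as two scheduling modules selected by deployment mode. Class names, the `apply` method, and the address formats are assumptions, not the patent's identifiers:

```python
class LocalSchedulingModule:
    """Hands out port addresses of locally created entity services."""
    def __init__(self, ports):
        self.free = list(ports)

    def apply(self):
        # Local case: the address is a port on the local machine.
        return f"127.0.0.1:{self.free.pop(0)}"

class CloudSchedulingModule:
    """Hands out addresses of entity services running in a cloud environment."""
    def __init__(self, endpoints):
        self.free = list(endpoints)

    def apply(self):
        # Cloud case: the address points into the pre-deployed environment.
        return self.free.pop(0)

def make_scheduler(deployment, resources):
    # Single machine -> local module; multi-machine cloud -> cloud module.
    if deployment == "local":
        return LocalSchedulingModule(resources)
    return CloudSchedulingModule(resources)
```

Because both modules answer the same `apply` call, the caller is unaware of which deployment mode is in effect, which mirrors the unified interface described above.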
The computing driving module 252 invokes the computing driving unit 270 corresponding to each computing framework, making the federal learning method compatible with the different computing frameworks used locally at the participant. Specifically, when deploying federal learning tasks, the computing entity engine 250 accommodates the different computing frameworks used by the various computing nodes 230 within the participant by using, under a unified interface, computing driving units 270 adapted to the different computing frameworks 300.
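One common way to realize "one driver unit per framework under a unified interface" is an adapter hierarchy; the sketch below assumes this pattern, and all class names and the echoed strings are illustrative:

```python
from abc import ABC, abstractmethod

class ComputeDriverUnit(ABC):
    """One adapter per computing framework, behind a shared interface."""
    @abstractmethod
    def run(self, task):
        ...

class SparkDriverUnit(ComputeDriverUnit):
    def run(self, task):
        return f"spark:{task}"  # a real unit would submit a Spark job

class RayDriverUnit(ComputeDriverUnit):
    def run(self, task):
        return f"ray:{task}"    # a real unit would launch Ray remote work

class ComputeDriverModule:
    """Selects the driver unit that matches a node's framework."""
    def __init__(self):
        self.units = {"spark": SparkDriverUnit(), "ray": RayDriverUnit()}

    def unit_for(self, framework):
        return self.units[framework]
```

Under this design, adding support for another framework only means registering another `ComputeDriverUnit` subclass; the callers of `run` are unchanged.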
When the federal learning task is deployed, the local scheduling module is selected in the single-machine case and the cloud scheduling module in the multi-machine case; by using these different types of scheduling modules under different resource conditions, the scheduling management module 251 of the computing entity engine 250 accommodates the different computing resource conditions brought by different computing frameworks.
When the federal learning task is deployed and each participant deploys different computing frameworks according to its local computing requirements, the computing entity engine 250 invokes, through the computing driving module 252, the computing driving unit 270 corresponding to each framework 300, and the unified computing interface achieves compatibility with the different computing frameworks 300.
The unified computing interface means that the local scheduling module and the cloud scheduling module included in the scheduling management module 251 expose the same resource scheduling and computing execution interfaces, regardless of the computing resource condition (local computing or cloud computing) and the computing framework. These interfaces comprise a computing entity application interface, a resource state query interface, a computing entity deletion interface, and a computing execution interface.
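The four exposed calls can be written down as an abstract interface that both scheduling modules implement; method names and signatures below are assumptions, and `LocalModule` is a toy implementation for illustration:

```python
from abc import ABC, abstractmethod

class UnifiedSchedulingInterface(ABC):
    """The four externally exposed calls named in the text (signatures assumed)."""
    @abstractmethod
    def apply_entity(self): ...            # computing entity application
    @abstractmethod
    def query_state(self, entity): ...     # resource state query
    @abstractmethod
    def delete_entity(self, entity): ...   # computing entity deletion
    @abstractmethod
    def execute(self, entity, task): ...   # computing execution

class LocalModule(UnifiedSchedulingInterface):
    """Toy local implementation tracking entity states by port number."""
    def __init__(self):
        self.entities = {}
        self.next_port = 9000

    def apply_entity(self):
        port = self.next_port
        self.next_port += 1
        self.entities[port] = "idle"
        return port

    def query_state(self, entity):
        return self.entities.get(entity, "unknown")

    def delete_entity(self, entity):
        self.entities.pop(entity, None)

    def execute(self, entity, task):
        self.entities[entity] = "busy"
        return f"{entity}:{task}"
```

A cloud module would implement the same four methods against the cloud environment, so callers need no code changes between deployments.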
In federal learning task deployment, the local scheduling module or the cloud scheduling module is selected according to the computing resource condition. The computing node services 240 of each participant 210 initialize the computing entity engine 250 with the given parameters and then use the engine's exposed unified interface. In the local computing scenario, the scheduling management module 251 selects the local scheduling module to schedule and manage the computing resource 260; in the cloud computing scenario, it selects the cloud scheduling module and, through the computing driving module 252, invokes the computing driving units 270 matching the different computing frameworks 300 to create the computing execution body 280 and complete the cloud computation.
The given parameters are those necessary for initializing the computing entity engine 250, including the number of computing entity services 261 the engine needs to create and maintain, the work ports, the available port range, the communication authentication file path, the computing framework used for computation, and so on.
When the computing entity engine 250 receives the given parameters for initialization, it creates several computing entity services 261 through the scheduling management module according to those parameters, maintains their states, and starts a background service to query and manage the working states of the computing entity services 261. The computing driving module 252 maintains the currently supported computing framework types and uses the corresponding computing driving unit 270 when performing computation.
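The initialization parameters listed above can be captured in a small configuration record; every field name here is an assumption chosen to mirror the description, not an identifier from the patent:

```python
from dataclasses import dataclass

@dataclass
class EngineConfig:
    """Initialization parameters named in the text (field names assumed)."""
    n_entity_services: int   # number of entity services to create and maintain
    work_port: int           # the engine's own working port
    port_range: tuple        # (low, high) range available for entity services
    auth_file: str           # communication authentication file path
    framework: str = "spark" # illustrative default framework name

def init_engine(cfg: EngineConfig):
    """Create one tracked state record per computing entity service."""
    lo, hi = cfg.port_range
    if lo + cfg.n_entity_services > hi:
        raise ValueError("not enough ports in the configured range")
    # Each entity service gets a port from the range and starts out idle;
    # a real engine would also spawn the background monitoring service.
    return {lo + i: "idle" for i in range(cfg.n_entity_services)}
```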
Each participant 210 contains a node management service 220, and each participant 210 participating in federal learning contains several computing nodes 230. Each computing node 230 starts a computing node service 240 to interact with the node management service 220; this service provides the node management service 220 with computing services such as running federal learning tasks. When the computing node service 240 receives a task that requires computation, it applies for and manages the computing resource 260 through the scheduling management module 251 of the computing entity engine 250; during computation, the computing entity engine 250 uses, through the computing driving module 252, the unified interface of the computing driving unit 270 matching the relevant computing framework 300 to execute the computing behavior.
The embodiment of the application discloses a federal learning method compatible with different computing frameworks, which performs resource scheduling and computing execution across those frameworks and is applied in the field of distributed machine learning.
According to the embodiments, the local scheduling module or the cloud scheduling module in the computing entity engine can be selected according to the computing resource condition implied by the computing framework, and when the cloud scheduling module is adopted, the computing driving unit is used to match the different computing frameworks.
The embodiments unify resource scheduling, management, and computing execution under different computing frameworks, and are thus compatible with different computing resources and computing frameworks.
In one embodiment, a computer device is provided that corresponds to one of the computing nodes in a participant; it may be a terminal, and its internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities, in particular the computing resource 260. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, including the scheduling management module 251, the computing driving module 252, the computing driving unit 270, and the like provided by the embodiments. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium.
The network interface of the computer device is used to communicate with external terminals through a network connection and is controlled by the node management service 220. When executed by the processor, the computer program cooperates with the overall federal learning system to carry out service requests from the central node in the federal learning method.
The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, keys, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse, or the like.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that contains no contradiction should be considered within the scope of this description. When technical features of different embodiments appear in the same drawing, the drawing can also be regarded as disclosing a combination of the embodiments concerned.
The above examples merely represent a few embodiments of the present application, which are described in relatively specific detail but are not to be construed as limiting the scope of the invention. It should be noted that various modifications and improvements could be made by those skilled in the art without departing from the spirit of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A federal learning method compatible with different computing frameworks, implemented between a central node and a number of participants, characterized in that,
each participant comprises a plurality of computing nodes, and each computing node comprises a computing entity engine, computing resources called by the computing entity engine and a computing driving unit called by the computing entity engine;
the computing resource is used for providing a computing entity service according to a call request of the computing entity engine, and the computing driving unit is used for matching a computing framework;
for one of the participants, if each computing node participating in federal learning does not use a computing framework, the federal learning method includes:
the computing node obtains a service request of the central node;
the computing entity engine receives the service request and invokes corresponding computing entity service in the computing resource;
the computing entity service completes the service request;
for one of the participants, if each computing node participating in federal learning uses a computing framework and the computing frameworks used are at least partially different, the federal learning method includes:
the computing node obtains a service request of the central node;
the computing entity engine receives the service request and invokes corresponding computing entity service in the computing resource;
the computing entity service invokes the computing drive units matching different computing frameworks through the computing entity engine;
the computing entity engine creates a computing execution body that completes the service request.
2. The federal learning method compatible with different computing frameworks of claim 1, wherein the computing resource comprises a plurality of computing entity services in a queue, and the computing resource providing a computing entity service according to a call request from the computing entity engine comprises: the computing resource providing one of the computing entity services in the queue according to the call request of the computing entity engine.
3. The federal learning method compatible with different computing frameworks of claim 2, wherein the computing entity engine comprises a schedule management module and a computing driver module,
the computing resource is called by the dispatching management module;
the computing entity engine receives the service request and invokes corresponding computing entity service in the computing resource, and the method specifically comprises the following steps: the scheduling management module receives the service request and invokes corresponding computing entity service in the computing resource;
the computing driving unit is called by the computing driving module;
the computing entity service calls the computing driving unit matched with different computing frameworks through the computing entity engine, and specifically comprises the following steps: the computing entity service matches the computing drive units of different computing frameworks through the computing drive module;
the computing entity engine creates a computing execution body, and specifically includes: the computing driver module creates a computing execution body.
4. The federal learning method compatible with different computing frameworks according to claim 3, wherein each participant includes a node management service for receiving the operational status of each computing node and the type of computing framework used by each computing node.
5. The federal learning method compatible with different computing frameworks of claim 4, wherein each computing node includes a computing node service for information interaction with the node management service and for sending the service request to the computing entity engine.
6. The federal learning method compatible with different computing frameworks of claim 5, wherein the central node comprises a task coordinator for communicating with the node management service of each participant;
and after the service request is completed, providing task content feedback to the task coordinator through the computing node service and the node management service in sequence.
7. The federal learning method compatible with different computing frameworks of claim 4, wherein the schedule management module comprises two types: a local scheduling module and a cloud scheduling module;
the node management service judges whether each computing node uses a computing frame or not and whether the computing frames used by the computing nodes are at least partially different or not;
if each computing node participating in federal learning does not use a computing framework, the computing entity engine receives the service request and invokes a corresponding computing entity service in the computing resource, which specifically includes: the local scheduling module receives the service request and invokes corresponding computing entity service in the computing resource;
if the computing frames used by the computing nodes participating in federal learning are at least partially different, the computing entity engine receives the service request and invokes corresponding computing entity service in the computing resource, which specifically includes: and the cloud scheduling module receives the service request and invokes corresponding computing entity service in the computing resource.
8. The federal learning method compatible with different computing frameworks of claim 1, wherein the computing execution body is part of the computing resource.
9. The federal learning method compatible with different computing frameworks of claim 1, wherein the computing framework comprises at least one of: a MapReduce framework, a Spark framework, a Flink framework, a Ray framework.
10. A federal learning system compatible with different computing frameworks, comprising a central node and a plurality of participants, characterized in that each participant comprises a plurality of computing nodes, and each computing node comprises a computing entity engine, computing resources called by the computing entity engine, and a computing driving unit called by the computing entity engine;
the computing resource is used for providing a computing entity service according to a call request of the computing entity engine, and the computing driving unit is used for matching a computing framework;
the federal learning system is used for implementing a federal learning method compatible with different computing frameworks;
for one of the participants, if each computing node participating in federal learning does not use a computing framework, the federal learning method includes:
the computing node obtains a service request of the central node;
the computing entity engine receives the service request and invokes corresponding computing entity service in the computing resource;
the computing entity service completes the service request;
for one of the participants, if each computing node participating in federal learning uses a computing framework and the computing frameworks used are at least partially different, the federal learning method includes:
the computing node obtains a service request of the central node;
the computing entity engine receives the service request and invokes corresponding computing entity service in the computing resource;
the computing entity service invokes the computing drive units matching different computing frameworks through the computing entity engine;
the computing entity engine creates a computing execution body that completes the service request.
CN202310105164.XA 2023-02-07 2023-02-07 Federal learning method and federal learning system compatible with different computing frameworks Pending CN116204307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310105164.XA CN116204307A (en) 2023-02-07 2023-02-07 Federal learning method and federal learning system compatible with different computing frameworks

Publications (1)

Publication Number Publication Date
CN116204307A 2023-06-02

Family

ID=86510680


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573359A (en) * 2023-11-28 2024-02-20 之江实验室 Heterogeneous cluster-based computing framework management system and method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination