WO2023175381A1 - Iterative training of collaborative distributed coded artificial intelligence model - Google Patents

Iterative training of collaborative distributed coded artificial intelligence model

Info

Publication number
WO2023175381A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
iteration
redundancy
redundancy factor
training
Prior art date
Application number
PCT/IB2022/052483
Other languages
French (fr)
Inventor
Yuxuan JIANG
Qiang Ye
Emmanuel Thepie FAPI
Wenting Sun
Fudong Li
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/IB2022/052483
Publication of WO2023175381A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning

Definitions

  • the present disclosure relates generally to iterative training of a collaborative distributed coded artificial intelligence (AI) model, and related methods and apparatuses.
  • AI collaborative distributed coded artificial intelligence
  • a parameter server e.g., a master node
  • multiple Internet of Things (IoT) edge devices also referred to as worker devices
  • a training dataset can be shared across the edge devices.
  • the master node trains the AI model based on the data the master node has collected.
  • the AI model training is assumed to converge after a certain number of iterations.
  • each IoT device is responsible for performing a portion of the processing during each training iteration.
  • the IoT devices' output results are then sent back to the master node for aggregation or combination.
  • worker devices e.g., IoT edge devices
  • some worker devices e.g., IoT edge devices
  • may not be reliable in computation e.g., central processing unit (CPU) overloaded, system failure, etc.
  • communications e.g., limited communication bandwidth, increased latency, etc.
  • collaborative distributed AI learning may become challenging.
  • worker devices e.g., IoT edge devices
  • in a low coverage zone may affect such a training process as data may arrive late. Such a scenario may not be suitable for real-time applications.
  • Worker devices may become stragglers and, thus, may have an effect of delaying the learning process.
  • Some approaches using a distributed coded AI strategy may lack intelligence and/or an online decision during the dispatching of workload to each worker device and/or in the collection of output results by a central coordinator node.
  • Potential advantages provided by various embodiments of the present disclosure may include that the method includes operations that may perform an online workload allocation decision to execute an iterative AI model using distributed coded AI model training. As a consequence, workloads may be intelligently assigned across worker devices in each iteration and latency may be reduced or minimized.
  • a method performed by a computing device for iterative training of a collaborative distributed coded AI model.
  • the method includes receiving a request from the AI model for a redundancy factor for an iteration of training of the AI model.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration.
  • the method further includes selecting the redundancy factor in the iteration based on use of a machine learning, ML, model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors.
  • the method further includes sending the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the AI model.
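  • A minimal sketch of the computing-device method summarized above is given below, assuming a simple bandit-style estimator; the names RedundancySelector, select, and update and the example factor values are illustrative assumptions, not taken from the disclosure.

```python
# Minimal sketch of the computing-device method summarized above, using a simple
# bandit-style estimator; names and example values here are illustrative only.
import random


class RedundancySelector:
    def __init__(self, redundancy_factors):
        self.factors = list(redundancy_factors)
        self.avg_time = {k: 0.0 for k in self.factors}  # average observed execution time per factor
        self.pulls = {k: 0 for k in self.factors}       # times each factor has been selected

    def select(self):
        """Answer a request from the AI model: pick the factor with the lowest estimated time."""
        untried = [k for k in self.factors if self.pulls[k] == 0]
        if untried:
            return random.choice(untried)               # explore factors never tried yet
        return min(self.factors, key=lambda k: self.avg_time[k])

    def update(self, k, execution_time):
        """Fold in the overall execution time later reported by the master node."""
        self.pulls[k] += 1
        self.avg_time[k] += (execution_time - self.avg_time[k]) / self.pulls[k]


# Usage: on each request, select a factor and send it to the master node
# (the transport between the devices is application-specific and omitted here).
selector = RedundancySelector([1, 2, 4, 5, 10, 20, 25, 50])
k_m = selector.select()
selector.update(k_m, execution_time=0.42)   # reported after the coded iteration completes
```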
  • a computing device includes processing circuitry, and at least one memory coupled with the processing circuitry.
  • the memory stores program code that is executed by the processing circuitry to perform operations.
  • the operations include receive a request from the AI model for a redundancy factor for an iteration of training of the AI model.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration.
  • the operations further include select the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors.
  • the operations further include send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the AI model.
  • a computing device is provided that is adapted to perform operations comprising receive a request from the AI model for a redundancy factor for an iteration of training of the AI model.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration.
  • the operations further include select the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors.
  • the operations further include send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the AI model.
  • a computer program product including a non-transitory storage medium including program code to be executed by processing circuitry of a computing device is provided. Execution of the program code causes the computing device to perform operations comprising receive a request from the AI model for a redundancy factor for an iteration of training of the AI model.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration.
  • the operations further include select the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors.
  • the operations further include send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the AI model.
  • a computer program including program code to be executed by processing circuitry of a computing device.
  • the program code causes the computing device to perform operations comprising receive a request from the AI model for a redundancy factor for an iteration of training of the AI model.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration.
  • the operations further include select the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors.
  • the operations further include send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the AI model.
  • a method performed by a master node in a distributed computing cluster for iterative training of a collaborative distributed coded AI model.
  • the method includes receiving, from a computing device, a redundancy factor.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration.
  • the method further includes receiving a data matrix and a vector from the AI model; encoding the data matrix into a plurality of submatrices; distributing a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor.
  • the method further includes collecting respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster.
  • the method further includes extracting an overall result from the collected respective results; and sending the overall result to the AI model to determine whether the training is completed.
  • a master node includes processing circuitry, and at least one memory coupled with the processing circuitry.
  • the memory stores program code that is executed by the processing circuitry to perform operations.
  • the operations include receive, from a computing device, a redundancy factor.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration.
  • the operations further include receive a data matrix and a vector from the AI model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor.
  • the operations further include collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster.
  • the operations further include extract an overall result from the collected respective results; and send the overall result to the AI model to determine whether the training is completed.
  • a master node is provided that is adapted to perform operations comprising receive, from a computing device, a redundancy factor.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration.
  • the operations further include receive a data matrix and a vector from the AI model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor.
  • the operations further include collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster.
  • the operations further include extract an overall result from the collected respective results; and send the overall result to the AI model to determine whether the training is completed.
  • a computer program product including a non-transitory storage medium including program code to be executed by processing circuitry of a master node. Execution of the program code causes the master node to perform operations comprising receive, from a computing device, a redundancy factor.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration.
  • the operations further include receive a data matrix and a vector from the AI model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor.
  • the operations further include collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster.
  • the operations further include extract an overall result from the collected respective results; and send the overall result to the AI model to determine whether the training is completed.
  • a computer program including program code to be executed by processing circuitry of a master node.
  • the program code causes the master node to perform operations comprising receive, from a computing device, a redundancy factor.
  • the redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration.
  • the operations further include receive a data matrix and a vector from the AI model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor.
  • the operations further include collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster.
  • the operations further include extract an overall result from the collected respective results; and send the overall result to the AI model to determine whether the training is completed.
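  • The master-node flow just summarized (encode the data matrix, distribute coded pieces, collect results from a subset of workers, decode the overall result) is sketched below; a random linear code is used as a stand-in for the MDS-style coding named in the disclosure, and the function name coded_round, the cluster size, and the matrix sizes are illustrative assumptions.

```python
# Sketch of one coded matrix-vector round at the master node, using a random
# linear code (which is MDS with high probability) as a stand-in for the coding
# scheme in the disclosure; names and dimensions are illustrative only.
import numpy as np


def coded_round(A, x, N, k, rng=np.random.default_rng(0)):
    """Compute y = A @ x using N workers, needing results from any k of them."""
    omega = A.shape[0]
    assert omega % k == 0, "for simplicity, rows must split evenly into k blocks"
    blocks = np.split(A, k, axis=0)                  # k uncoded row blocks, omega/k rows each

    G = rng.standard_normal((N, k))                  # encoding matrix; any k rows invertible w.h.p.
    coded_blocks = [sum(G[j, i] * blocks[i] for i in range(k)) for j in range(N)]

    # "Distribute": each worker j would compute its local product coded_blocks[j] @ x.
    worker_results = {j: coded_blocks[j] @ x for j in range(N)}

    # "Collect": keep only k results (here, an arbitrary subset standing in for the fastest k).
    arrived = sorted(worker_results)[:k]
    R = np.stack([worker_results[j] for j in arrived])        # k x (omega/k)
    G_sub = G[arrived, :]                                     # k x k

    # Decode the k uncoded partial products, then concatenate them into y.
    partials = np.linalg.solve(G_sub, R)
    return partials.reshape(-1)


A = np.arange(12.0).reshape(6, 2)
x = np.array([1.0, -1.0])
y = coded_round(A, x, N=5, k=3)
assert np.allclose(y, A @ x)
```

  • The property illustrated is that any k of the N coded results suffice to recover y, which is what allows the master node to ignore up to N − k straggling worker devices in an iteration.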
  • Figure 1 is a schematic diagram of an overview of distributed coded AI model learning in accordance with some embodiments of the present disclosure
  • Figure 2 is a schematic diagram illustrating a cloud-based implementation as a service in accordance with some embodiments of the present disclosure
  • Figure 3 is a signalling diagram in accordance with some embodiments of the present disclosure.
  • Figure 4 is a flow chart illustrating operations of a computing device in accordance with some embodiments of the present disclosure
  • Figure 5 is a flow chart illustrating operations of a master node in accordance with some embodiments of the present disclosure
  • Figure 6 is a plot of empirical average execution time for each arm for a simulation in accordance with some embodiments of the present disclosure
  • Figures 7A-7D are plots of per-iteration reward evolution for the simulation in accordance with some embodiments of the present disclosure.
  • Figures 8-11 are plots of numbers of pulls for each arm in the simulation in accordance with some embodiments of the present disclosure.
  • Figure 12 is a block diagram of a computing device in accordance with some embodiments of the present disclosure.
  • Figure 13 is a block diagram of a master node in accordance with some embodiments of the present disclosure.
  • Figure 14 is a block diagram of a worker device in accordance with some embodiments of the present disclosure.
  • IoT edge devices in a cluster may not be reliable in computation (e.g., CPU overloaded, system failure, etc.) and communications (e.g., limited communication bandwidth, increased latency, etc.), especially in wireless communication. Due to the heterogeneous and time-varying nature of IoT edge devices' availability, collaborative distributed AI learning may become challenging.
  • Some computing, e.g., edge computing, includes distributed computing and data storage for services with low latency requirements to help enable ultra-fast interactions and/or responsiveness.
  • Resources may be unbalanced in an edge computing scenario and edge devices may be located in different fifth generation (5G) coverage zones, such as high, low, or medium coverage zones. Latency for edge devices in a low coverage zone may be higher than latency for edge devices in a high coverage zone.
  • 5G fifth generation
  • edge devices in the low coverage zone may affect the training process as data may arrive late. Such a scenario may not be suitable for real-time applications.
  • Some approaches have used distributed coded techniques to address such a scenario. These techniques may allow injection of erasure and error-correcting codes to improve the reliability via coded computation. This injection is achieved by intelligently adding some redundancy to the data assigned to IoT edge devices for a subtask in each iteration.
  • some distributed coded AI approaches can increase the computational workload overhead assigned to the worker devices.
  • Some additional challenges associated with increased workload may include system disturbances, such as slow-down or failures of an individual worker device(s).
  • K. Lee et al., “Speeding up distributed machine learning using codes,” IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514-1529 (2017), considers a homogeneous cluster with Maximum Distance Separable (MDS) codes to conduct matrix-vector multiplication.
  • D. Kim et al., “Optimal load allocation for coded distributed computation in heterogeneous clusters,” IEEE Transactions on Communications, vol. 69, no. 1, pp. 44-58 (2021), considers a heterogeneous cluster with MDS code for matrix-vector multiplication.
  • A. Reisizadeh et al., “Coded computation over heterogeneous clusters,” IEEE Transactions on Information Theory, vol. 65, no. 7, pp. 4227-4242 (2019), considers a heterogeneous cluster with Random Linear Codes (RLC) for matrix multiplication.
  • RLC Random Linear Codes
  • a method is provided for systems (e.g., large-scale systems) where collaborative distributed AI model learning performance may need to be robust against disturbances such as straggler worker devices, system failures, communication issues, etc.
  • the method includes a data matrix-vector multiplication as part of a building block.
  • the data matrix-vector multiplication is computed at a computing device (which, in some embodiments, may be a master node) based on the outputs of worker devices in a distributed computing cluster.
  • the AI model may include one of independent component analysis (ICA), principal component analysis (PCA), a convolutional neural network (CNN), and a deep neural network (DNN). Additionally, linear transformations in signal processing, and any iterative, computation-intensive class of processing, may be included.
  • ICA independent component analysis
  • PCA principal component analysis
  • CNN convolutional neural network
  • DNN deep neural network
  • Potential technical advantages provided by various embodiments of the present disclosure may include that, based on the coded distributed AI model deciding an amount of redundancy to be injected in each iteration, in real-time applications the method may reduce the effects of disturbances compared with approaches that instead rely on an assumption about a worker device's capability. Additionally, when a multi-armed bandit (MAB) based decision framework is included in the method, the decision may be a model-free plug-and-play (e.g., online, real-time) decision on the amount of redundancy to be injected in each iteration of the distributed coded AI model training.
  • a model-free plug-and-play e.g., online, real-time
  • the method may allow selection of a reliable subset of worker devices for a real-world distributed coded AI model training system so as to minimize the training time in each iteration; and the online decision framework may help make online workload allocation decisions to execute the iterative AI model using distributed coded AI training.
  • Additional potential technical advantages based on the method deciding the amount of workload to be assigned to respective worker devices may include the following:
  • Model-Free Approach: Deployment of the method is not restricted to a particular AI model.
  • the method may be suitable for real-world applications where most of the processes are stochastic.
  • worker devices can dynamically join and leave the computing cluster. When a worker device joins the computing cluster, it may not be practical to require the worker device to report its parameters (e.g., communication, computation capability, and reliability).
  • the method may be more practical than some approaches discussed herein based on a master node treating the performance of worker devices as a black box.
  • the master node also referred to herein as a central coordinator
  • the redundancy factor in each iteration is updated or chosen from an available set (e.g., using a MAB based framework).
  • the method may allow deployment of a master node that orchestrates the processing itself without additional external intervention.
  • Straggler Mitigation in Distributed Computing: Based on the method allowing an intelligent workload allocation and data collection in each training iteration, communication bottlenecks, system disturbances, and node failures in distributed AI model training may be efficiently addressed.
  • Reduction of Energy Consumption: Upon reception of a number of outputs (e.g., decided by the encoding algorithm and the MAB arm), the master node does not need to wait for additional outputs. As a consequence, the computing capacity of the worker devices may be optimally used and, thus, power consumption may be reduced.
  • Cloud-Based Implementation as a Service: The method may be generalized to various types of collaborative computing applications offered as a service. Thus, a subscriber of such a service may see its master node assisted by the method (e.g., especially for latency-critical applications).
  • the method may include an online decision. As such, the method may be suitable for real-time applications (e.g., where latency and task scheduling are critical).
  • the method may be deployed based on any type of existing MAB algorithm. Thus, for each application and depending on resources, the method may be generalized to any type of application and any type of MAB algorithm.
  • Embodiments of the present disclosure include a homogeneous computing cluster of worker devices.
  • the worker devices have the same statistical characteristics in terms of their computation capability and reliabilities; and these worker devices remain in the cluster during the iterative Al learning process.
  • Figure 1 is a schematic diagram illustrating an overview of distributed coded AI model learning with a homogeneous distributed computing cluster of worker devices 103a...103n executing an iterative AI model in accordance with some embodiments of the present disclosure.
  • AI model 105 is iteratively trained based on a data matrix-vector multiplication.
  • N homogeneous worker devices 103 are connected to a backend master node 101 (also referred to as a central coordinator 101).
  • Master node 101 also may be a computing device for the method, as discussed further herein.
  • Master node 101 and the worker devices 103a. . . 103n form a distributed computing cluster.
  • Iterative AI model 105 is run on the distributed computing cluster.
  • a data matrix-vector multiplication y = A·x is computed, where y is the result, A is a data matrix having ω rows and b columns, and x is a vector; that is, y ∈ ℝ^ω, A ∈ ℝ^(ω×b), and x ∈ ℝ^b.
  • k is a redundancy factor (also referred to as a recovery threshold) that is a decision variable per iteration of training of AI model 105.
  • the redundancy factor k identifies an amount of workload to be assigned per worker device 103 of the distributed computing cluster in the iteration.
  • MDS coding is a widely adopted linear block coding technique. See, e.g., K. Lee et al., “Speeding up distributed machine learning using codes,” IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514-1529 (2017); N. Ding et al., “Optimal incentive and load design for distributed coded machine learning,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 7, pp. 2090-2104 (2021) (“Ding”); R. Singleton, “Maximum distance q-nary codes,” IEEE Transactions on Information Theory, vol. 10, no. 2, pp. 116-118 (1964).
  • the encoded submatrices may be generated by coding-theoretic techniques.
  • master node 101 collects local computation results from a subset of the N worker devices.
  • an objective of the method of the present disclosure may be to minimize the overall execution time of this learning by determining an appropriate k value in each iteration m ∈ M.
  • a reliability-workload trade-off of the determined value of k may be illustrated as follows: A smaller k value translates to a smaller number of worker devices' 103 local computation results needed to construct the final, overall computation y. Thus, a higher system reliability may result as master node 101 relies on a smaller number of worker devices 103 and may tolerate a larger number of malfunctioning worker devices 103. However, a smaller k value also leads to a larger number of rows in the encoded submatrix that is assigned to each worker device 103; each worker device thus may need to tackle a higher computation workload. An appropriate k value may balance the reliability-workload trade-off.
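  • The trade-off can be made concrete with a small calculation; the values of ω and N below are illustrative assumptions, not values from the disclosure.

```python
# Illustrative arithmetic for the reliability-workload trade-off described above:
# a smaller recovery threshold k means more rows per worker but more tolerable stragglers.
omega, N = 1000, 50                 # assumed matrix rows and cluster size
for k in (10, 25, 50):
    rows_per_worker = omega // k    # workload per worker grows as k shrinks
    straggler_tolerance = N - k     # workers whose results the master can ignore
    print(f"k={k:3d}: {rows_per_worker:4d} rows/worker, tolerates {straggler_tolerance:2d} stragglers")
```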
  • FIG. 2 is a schematic diagram illustrating a cloud-based implementation as a service in accordance with some embodiments of the present disclosure.
  • the cloud-based infrastructure may include a cloud-based computing device 201 communicatively connected to a network (e.g., 5G network 203).
  • the network includes communication connections to a plurality of distributed computing clusters 207a/205a - 207n/205n, each including a master node 207 and a plurality of worker devices 205 (one or more of which may be generally referred to as worker device 205).
  • the implementation includes a trusted edge cloud where security and confidentiality are included.
  • the cloud-based implementation includes 5G network 203 that includes an access network, such as a radio access network (RAN) and a core network (not illustrated) which includes one or more core network nodes.
  • the access network may include one or more access network nodes, such as master nodes 207a. . . 207n (one or more of which may be generally referred to as master node 207), or any other similar 3rd Generation Partnership Project (3GPP) access node or non-3GPP access point.
  • the master nodes 207 facilitate direct or indirect connection of worker devices 205, such as by connecting worker devices 205 to the 5G network 203 over one or more wireless connections.
  • Example wireless communications over a wireless connection include transmitting and/or receiving wireless signals using electromagnetic waves, radio waves, infrared waves, and/or other types of signals suitable for conveying information without the use of wires, cables, or other material conductors.
  • the cloud-based implementation may include any number of wired or wireless networks, master nodes, computing devices, worker devices, and/or any other components or systems that may facilitate or participate in the communication of data and/or signals whether via wired or wireless connections.
  • the cloud-based implementation may include and/or interface with any type of communication, telecommunication, data, cellular, radio network, and/or other similar type of system.
  • the communication systems of Figures 1 and/or 2 enable connectivity between the worker devices, master nodes, and computing devices.
  • the communication systems may be configured to operate according to predefined rules or procedures, such as specific standards that include, but are not limited to: Global System for Mobile Communications (GSM); Universal Mobile Telecommunications System (UMTS); Long Term Evolution (LTE), and/or other suitable 2G, 3G, 4G, 5G standards, or any applicable future generation standard (e.g., 6G); wireless local area network (WLAN) standards, such as the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (WiFi); and/or any other appropriate wireless communication standard, such as the Worldwide Interoperability for Microwave Access (WiMax), Bluetooth, Z-Wave, Near Field Communication (NFC), ZigBee, LiFi, and/or any low-power wide-area network (LPWAN) standards such as LoRa and Sigfox.
  • GSM Global System for Mobile Communications
  • UMTS Universal Mobile Telecommunications System
  • LTE Long Term Evolution
  • a MAB model may be used in the method of the present disclosure.
  • a MAB may be used to model a trade-off faced by an automated AI model.
  • the MAB may aim to gain new knowledge by exploring its environment, and to exploit its current, reliable knowledge.
  • MAB problems may be a class of partial-information sequential resource allocation problems concerned with allocating between multiple options, where the benefits of each option are not known at the time of allocation. A benefit may be discovered as time passes and resources are reallocated.
  • the name "MAB" refers to a visualization of this problem.
  • the parameters available to the forecaster are the number of arms (or actions) K and, possibly, the number of rounds n (which may be unknown to the forecaster).
  • a cumulative regret goal may be to maximize the cumulative gains obtained.
  • the goal is to minimize the cumulative regret given by Equation 1.
  • the environment is stochastic.
  • the gain vector g_t is sampled from an unknown product distribution ν_1 × … × ν_K on [0, 1]^K, that is, g_{i,t} ∼ ν_i.
  • the environment is adversarial in the way that the gain vector g_t is chosen by an adversary which, at time t, knows all of the past but not I_t.
  • the unknown parameters to the forecaster are the reward distributions ν_1, …, ν_K of the arms (with respective means μ_1, …, μ_K).
  • the algorithm may be deployed as follows:
  • the goal may be to minimize the expected cumulative regret given by Equation 3.
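  • For reference, the two regret notions referred to above can be written in standard bandit notation; the symbols below are the conventional ones and may differ from those used in the original Equations 1 and 3.

```latex
% Conventional bandit regret quantities (cf. Equations 1 and 3 above); g_{i,t} is
% the per-round gain of arm i, I_t the arm played at round t, and mu_i the mean
% reward of arm i. These symbol choices are assumptions, not from the disclosure.
\begin{align*}
  R_n &= \max_{i=1,\dots,K}\sum_{t=1}^{n} g_{i,t} \;-\; \sum_{t=1}^{n} g_{I_t,t}
      && \text{cumulative regret (cf. Equation 1)} \\
  \bar{R}_n &= n\,\mu^{*} - \mathbb{E}\!\left[\sum_{t=1}^{n} \mu_{I_t}\right],
  \qquad \mu^{*} = \max_{1\le i\le K}\mu_i
      && \text{expected cumulative regret (cf. Equation 3)}
\end{align*}
```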
  • the distributed coded AI model training is mapped to a MAB algorithm.
  • Four features may delimit the MAB problem within the general class of stochastic control problems:
  • the randomness of the process may be characterized by the randomness of the computation time at each worker device, the randomness of the communication time to dispatch the encoded submatrix and x, and the randomness of the upload of the local result y_n.
  • the execution time for one iteration is also a random variable. If the selection of k in iteration m is denoted by k_m, then the associated execution time for this iteration is given by Equation 4. The formulation in Equation 4 is suitable for a MAB model.
  • The set of candidate redundancy factors can be viewed as the set of arms. Pulling an arm is mapped to the selection of k_m from this set in iteration m. Doing so, the computation time may be minimized, which is analogous to obtaining a reward. Finally, the overall target is to maximize the total reward during the AI model training process.
  • the master node may not have information on the execution-time statistics of the redundancy factors when the distributed computing cluster is just formed, before any computation is executed.
  • the master node, therefore, may allow an initialization step to try different k values and obtain initial knowledge of their execution times.
  • This operation of the method may be considered a warm-up phase.
  • Different warm-up phases can be designed according to the type of MAB algorithm to be deployed.
  • the online decision is fit into the MAB algorithm.
  • An online framework may involve sequential decision-making under uncertainty.
  • the agent or forecaster is the master node; initially unaware of the stochastic evolution of the environment (that is, the arms/redundancy factors), it aims to maximize a common objective based on the history of actions and observations.
  • FIG. 3 An example embodiment of the method of the present disclosure is illustrated in the signalling diagram shown in Figure 3.
  • computing device 101 which includes an MAB online decision model, distributed computing cluster 205/207, and iterative Al model 105.
  • Each component receives a set of instructions and performs processing to output some metrics or variables. Sequences of the operations of this example embodiment include the following.
  • Matrix A has a constant number of rows ω and a constant number of columns b. The dimensions are ω×1 for y, ω×b for A, and b×1 for x.
  • Master node 207 forms (operation 303) the distributed computing cluster of available and trusted worker devices 205 (e.g., IoT edge devices) of size N. Master node 207 also activates or triggers the MAB online decision framework of computing device 101.
  • available and trusted worker devices 205 e.g., IoT edge devices
  • Step 1 Master node 207 passes (operation 305) to the MAB online decision model of computing device 101 the number of rows ω of the matrix and the size of the cluster N.
  • the MAB online decision model sets (operation 311) the execution time for each arm (mapped as a reward) and the number of times each arm is selected to zero.
  • Step 2 At iteration m, iterative AI model 105 releases (operation 321) the matrix A and the data x and requests (operation 315) the redundancy factor from computing device 101.
  • Computing device 101 activates (operation 317) the MAB model to select the appropriate redundancy factor, which is the arm that may lead to the minimum execution time.
  • the MAB online decision model sends (operation 319) the selected redundancy factor to master node 207.
  • Step 3 Master node 207 encodes (operation 323) the matrix A into a plurality of submatrices using a coding-theoretic technique such as MDS. Master node 207 multicasts (operation 323) to each worker device 205 the pair of the respective encoded submatrix and the vector x. After computation of the distributed local matrix-vector multiplication at each worker device 205 in the distributed computing cluster, master node 207 collects (operation 323) results from a subset of the worker devices 205 in the cluster. Master node 207 decodes (operation 323) the results and extracts the final, overall result y.
  • a coding theoretic technique such as MDS.
  • Step 4 Master node 207 passes (operation 329) the final, overall result y to iterative AI model 105. Master node 207 also consolidates (operation 323) the final execution time associated with the redundancy factor and sends (operation 327) it to computing device 101.
  • Step 5 Computing device 101 updates (operation 331) the reward parameters and the selection counts according to the algorithm used.
  • Step 6 Iterative AI model 105 verifies (operation 333) according to a criterion whether AI model 105 has converged: If AI model 105 has converged, the iterative AI training ends and all resources are released (operation 335). If AI model 105 has not converged, the method moves to the next iteration (m + 1) and restarts at Step 2.
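  • The Step 1-6 sequence can be collapsed into a single control loop, sketched below with the MAB decision model, the master node, and the iterative AI model reduced to minimal stand-ins; all names (train_iteratively, run_coded_iteration, and so on) are illustrative and not taken from the disclosure.

```python
# Structural sketch of the Step 2-6 loop above; the collaborating components are
# trivial stand-ins so that only the control flow is shown.
import types


def train_iteratively(ai_model, master, mab, max_iterations=1000):
    """Repeat Steps 2-6 until the AI model reports convergence."""
    for m in range(max_iterations):
        A, x = ai_model.release_matrix_and_vector()            # Step 2: AI model releases A and x
        k_m = mab.select_redundancy_factor()                   # Step 2: MAB picks the redundancy factor
        y, exec_time = master.run_coded_iteration(A, x, k_m)   # Step 3: encode, distribute, collect, decode
        ai_model.consume_result(y)                             # Step 4: overall result back to the AI model
        mab.update(k_m, exec_time)                             # Step 5: execution-time feedback
        if ai_model.has_converged():                           # Step 6: convergence check
            return m + 1                                       # iterations used before resources are released
    return max_iterations


# Usage with trivial stand-ins; a real deployment would wire in the MAB decision
# model of the computing device, the master node, and the iterative AI model.
state = {"iterations_left": 3}
ai_model = types.SimpleNamespace(
    release_matrix_and_vector=lambda: (None, None),
    consume_result=lambda y: state.update(iterations_left=state["iterations_left"] - 1),
    has_converged=lambda: state["iterations_left"] <= 0,
)
master = types.SimpleNamespace(run_coded_iteration=lambda A, x, k: (None, 0.1 + 0.01 * k))
mab = types.SimpleNamespace(select_redundancy_factor=lambda: 4, update=lambda k, t: None)
print(train_iteratively(ai_model, master, mab))                # -> 3
```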
  • FIG 4 is a flowchart illustrating operations of a computing device (e.g., computing device 101) according to some embodiments of the present disclosure.
  • the computing device can be computing device 1200 of Figure 12 (as discussed further herein) that is configured for iterative training of a collaborative distributed coded AI model (e.g., AI model 105).
  • the method includes receiving (409) a request from the AI model for a redundancy factor for an iteration of training of the AI model.
  • the redundancy factor comprises an amount of workload to be assigned per worker device (e.g., worker device 205a) of a distributed computing cluster (e.g., distributed computing cluster 207/205) in the iteration.
  • the method further includes selecting (411) the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors.
  • the method further includes sending (413) the selected redundancy factor to a master node (e.g., master node 207) for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the AI model.
  • the method may further include receiving (415), from the master node, an overall execution time for the distributed coded execution of the multiplication of the data matrix and the vector in the iteration by a subset of a plurality of worker devices in the distributed computing cluster.
  • the use of the ML model may comprise (i) per iteration in a set of iterations, choosing a redundancy factor from the set of redundancy factors, (ii) per iteration in the set of iterations, receiving a reward value for the chosen redundancy factor, and (iii) in the iteration, selecting the redundancy factor from the set of redundancy factors that has a highest reward value.
  • the reward value has an inverse relationship to an overall execution time for the distributed coded execution of the multiplication of the data matrix and the vector in the iteration.
  • the method further includes receiving (401), from the master node, a first parameter defining a size of the distributed computing cluster; receiving (403) from the Al model a second parameter defining a number of rows in the data matrix; and identifying (405) the set of redundancy factors based on the number of rows in the data matrix.
  • the method may further include initializing (407) values of a plurality of parameters in the ML model to zero.
  • the plurality of parameters may comprise (i) a number of times that redundancy factors are selected from the set of redundancy factors, and (ii) an average reward value of the selected redundancy factors.
  • the method may further include updating (417) the ML model with (i) the number of times that redundancy factors are selected, and (ii) the average reward value of the selected redundancy factors where the average reward value has an inverse relationship with the received overall execution time.
  • the selecting (411) may include an online decision that selects the redundancy factor.
  • the selected redundancy factor may be suitable for a mission critical operation.
  • the ML model may comprise a MAB model.
  • the plurality of worker devices may comprise a plurality of Internet of Things (IoT) devices.
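  • A brief sketch of the bookkeeping implied by operations 407, 415, and 417 above is given below, assuming the reward is taken as the reciprocal of the reported overall execution time (any monotonically decreasing mapping would satisfy the stated inverse relationship); the variable names are illustrative.

```python
# Sketch of the bookkeeping in operations 407, 415, and 417; the reciprocal-of-time
# reward is one possible realization of the stated inverse relationship, and the
# names (factors, pulls, avg_reward, record_iteration) are illustrative only.
factors = [1, 2, 4, 5, 10, 20, 25, 50]        # example set of redundancy factors
pulls = {k: 0 for k in factors}               # operation 407: initialize to zero
avg_reward = {k: 0.0 for k in factors}        # operation 407: initialize to zero


def record_iteration(k, overall_execution_time):
    """Operations 415/417: fold one reported execution time into the ML model state."""
    reward = 1.0 / overall_execution_time     # inverse relationship to execution time
    pulls[k] += 1
    avg_reward[k] += (reward - avg_reward[k]) / pulls[k]   # running average reward


record_iteration(4, overall_execution_time=0.37)
```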
  • FIG. 5 is a flowchart illustrating operations of a master node (e.g., master node 207) according to some embodiments of the present disclosure.
  • the master node may be master node 1300 of Figure 13 (as discussed further herein).
  • the master node is in a distributed computing cluster for iterative training of a collaborative distributed coded AI model (e.g., AI model 105).
  • the method includes receiving (501), from a computing device (e.g., computing device 101), a redundancy factor.
  • the redundancy factor comprises an amount of workload to be assigned per worker device (e.g., worker device 205a) of the distributed computing cluster in an iteration.
  • the method further includes receiving (503) a data matrix and a vector from the AI model; encoding (505) the data matrix into a plurality of submatrices; and distributing (507) a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor.
  • the method further includes collecting (509) respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster; extracting (511) an overall result from the collected respective results; and sending (513) the overall result to the AI model to determine whether the training is completed.
  • the method may further include identifying (515) an overall execution time for the distributed coded execution of the multiplication of the respective submatrix and the vector in the iteration of the training of the AI model by the respective worker devices in the subset of worker devices in the distributed computing cluster.
  • the method further includes sending (517) the overall execution time to the computing device.
  • Operations 515-517 from the flow chart of Figure 5 may be optional with respect to some embodiments of master nodes and related methods.
  • the MAB online decision model may be, without limitation, an ε-greedy or an upper confidence bound 1 (UCB1) model.
  • UCB1 upper confidence bound 1
  • N = 500 IoT edge devices (i.e., worker devices).
  • the simulation also assumed that the number of iterations up to convergence is much larger than the number of redundancy factors (that is, M ≫ I).
  • the list of redundancy factors, therefore, is given by:
  • K = {1, 2, 4, 5, 8, 10, 16, 20, 25, 32, 40, 50, 64, 80, 100, 125, 160, 200, 250, 320, 400, 500}
  • the master node triggers the example algorithm above to make an online decision.
  • the number of pulls for each arm k_i is recorded.
  • the reward harvested by selecting arm k_i is also recorded.
  • the first I iterations of the example algorithm are used to pull every arm once and obtain an initial estimate of its resulting performance (Line 2 to Line 5). Other variants of initialization may be used in this sequence.
  • Simulation Step 2 In the example algorithm, the empirical average execution time observed by selecting arm k_i is stored. After pulling each arm once, the main body of the example algorithm is performed for the remaining iterations (Line 6 to Line 10). An arm is selected in each iteration according to a certain equation in Line 7 of the example MAB algorithm.
  • For ε-greedy, a hyperparameter ε is included, where 0 < ε < 1.
  • For ε-greedy, the following approach for arm selection is used: with probability 1 − ε, the arm with the best empirical average reward is selected, and with probability ε, an arm is selected uniformly at random.
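  • The two arm-selection rules compared in the simulation can be sketched as follows, operating on empirical average rewards; the exact constants and tie-breaking used in the example algorithm are not specified here, so this follows the textbook forms of ε-greedy and UCB1.

```python
# Textbook forms of the two arm-selection rules compared in the simulation;
# constants and tie-breaking in the disclosure's example algorithm may differ.
import math
import random


def epsilon_greedy(avg_reward, epsilon=0.1):
    """With probability 1 - epsilon exploit the best empirical arm, else explore uniformly."""
    if random.random() < epsilon:
        return random.choice(list(avg_reward))
    return max(avg_reward, key=avg_reward.get)


def ucb1(avg_reward, pulls, t):
    """Pick the arm maximizing empirical mean plus the UCB1 exploration bonus at round t."""
    def index(arm):
        if pulls[arm] == 0:
            return float("inf")               # warm-up: pull every arm at least once
        return avg_reward[arm] + math.sqrt(2.0 * math.log(t) / pulls[arm])
    return max(avg_reward, key=index)


# Example: after the warm-up phase both rules operate on the same statistics.
avg_reward = {1: 0.8, 2: 1.4, 4: 1.1}
pulls = {1: 3, 2: 5, 4: 2}
print(epsilon_greedy(avg_reward), ucb1(avg_reward, pulls, t=10))
```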
  • The following mapping was used in the simulation to estimate the overall execution time.
  • D_distri is the downlink bandwidth for distribution
  • the number of floating-point elements in the encoded submatrix and x that need to be distributed to a worker
  • V_distri is a coefficient that translates a floating-point number into its corresponding size in a data packet
  • β_distri is a discrete random number that represents the number of transmissions required to successfully distribute the encoded submatrix and x to a worker.
  • P_distri is the probability of a successful transmission. See, e.g., S. Dhakal et al., Proceedings of the IEEE 90th Vehicular Technology Conference (VTC2019-Fall), pp. 1-6 (2019) (“Dhakal”); H. Karl and W. Willig, “Protocols and Architectures for Wireless Sensor Networks,” John Wiley & Sons, 2007 (“Karl”).
  • D_up is the uplink bandwidth
  • b is the number of floating-point elements in y_n to be uploaded back to the master node
  • V_up is a coefficient that translates a floating-point number into its corresponding size in a data packet
  • β_up is a discrete random number that represents the number of transmissions required to successfully upload y_n to the master node.
  • the random variable β_up follows a geometric distribution, P(β_up = j) = (1 − P_up)^(j−1) · P_up for j = 1, 2, … (Equation 15).
  • P up is the probability of a successful transmission. See e.g., Dhakal and Karl.
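  • The execution-time mapping described above can be sampled as sketched below, assuming geometric retransmission counts for both the downlink distribution and the uplink of y_n and a simple time = retransmissions × elements × V / D bandwidth model; the symbol names mirror the definitions above where possible, and the compute-time term is a placeholder.

```python
# Sketch of sampling the simulation's per-worker and per-iteration execution time;
# the bandwidth model and symbol names are assumptions mirroring the definitions above.
import numpy as np

rng = np.random.default_rng(0)


def sample_worker_time(num_elements_down, num_elements_up,
                       D_distri, V_distri, P_distri,
                       D_up, V_up, P_up, compute_time):
    beta_distri = rng.geometric(P_distri)       # transmissions to deliver the submatrix and x
    beta_up = rng.geometric(P_up)               # transmissions to upload the local result y_n
    t_down = beta_distri * num_elements_down * V_distri / D_distri
    t_up = beta_up * num_elements_up * V_up / D_up
    return t_down + compute_time + t_up


def sample_iteration_time(N, k, **worker_params):
    """The master only waits for the k fastest of the N workers in an iteration."""
    times = sorted(sample_worker_time(**worker_params) for _ in range(N))
    return times[k - 1]


print(sample_iteration_time(N=10, k=4, num_elements_down=2000, num_elements_up=50,
                            D_distri=1e6, V_distri=32, P_distri=0.9,
                            D_up=5e5, V_up=32, P_up=0.9, compute_time=0.01))
```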
  • Simulation results are now discussed, including extrapolation of simulation steps 4-6 from analysis of the simulation results.
  • the UCB1 model achieves the best average execution time (e.g., much better than that of the random algorithm, which did not make an intelligent decision).
  • Table 2 lists the total number of pulls for the arms in the simulation.
  • the simulation includes 5,000 iterations and a set of 23 arms.
  • Table 2 illustrates the total number of pulls for the arms whose k values are no larger than 100, and the number of pulls for each individual arm whose k value is larger than 100:
  • Figures 7A-7C The per-iteration reward evolution for the simulation is plotted in Figures 7A-7C versus random ( Figure 7D), and the number of pulls for each arm in the simulation is illustrated in the plots of Figures 8-11.
  • Figure 7C is a plot of the simulation results for the UCB1 model
  • Figure 7D is a plot of results from a random selection.
  • Figure 10 is a plot for the simulation of the number of pulls for each arm for the UCB1 model
  • Figure 11 is a plot for the random selection of each arm.
  • FIG. 12 is a block diagram illustrating elements of a computing device 1200 (also referred to as a central node, a central coordinating node, a server, a base station, gNodeB/gNB, etc.) according to embodiments of inventive concepts.
  • the computing device may include transceiver circuitry 1201 (also referred to as a transceiver) including a transmitter and a receiver configured to provide uplink and downlink communications with worker devices, other computing devices, etc.
  • the computing device may include network interface circuitry 1207 (also referred to as a network interface) configured to provide communications with worker devices and other computing devices.
  • the computing device may also include processing circuitry 1203 (also referred to as a processor) coupled to the transceiver circuitry, and memory circuitry 1205 (also referred to as memory) coupled to the processing circuitry.
  • the memory circuitry 1205 may include computer readable program code that when executed by the processing circuitry 1203 causes the processing circuitry to perform operations according to embodiments disclosed herein. According to other embodiments, processing circuitry 1203 may be defined to include memory so that a separate memory circuitry is not required.
  • the computing device may include a ML model 1209 (e.g., a MAB model). As discussed herein, operations of the computing device may be performed by processing circuitry 1203, ML model 1209, network interface 1207, and/or transceiver 1201.
  • processing circuitry 1203 may control transceiver 1201 to transmit downlink communications through transceiver 1201 to one or more worker devices and/or master node and/or to receive uplink communications through transceiver 1201 from one or more worker devices and/or master node.
  • modules may be stored in memory 1205 and/or ML model 1209, and these modules may provide instructions so that when instructions of a module are executed by processing circuitry 1203, processing circuitry 1203 performs respective operations (e.g., operations discussed herein with respect to example embodiments relating to computing devices).
  • computing device 1200 and/or an element(s)/function(s) thereof may be embodied as a virtual node/nodes and/or a virtual machine/machines.
  • FIG. 13 is a block diagram illustrating elements of a master node 1300 (also referred to as a server, a gNodeB/gNB, base station, etc.) according to embodiments of inventive concepts.
  • the master node may include transceiver circuitry 1301 (also referred to as a transceiver) including a transmitter and a receiver configured to provide uplink and downlink communications with other computing devices, worker devices, etc.
  • the master node includes network interface circuitry 1307 (also referred to as a network interface) configured to provide communications with other computing devices and worker devices (e.g., with computing devices, etc.).
  • the master node may also include processing circuitry 1303 (also referred to as a processor) coupled to the transceiver circuitry, and memory circuitry 1305 (also referred to as memory) coupled to the processing circuitry.
  • the memory circuitry 1305 may include computer readable program code that when executed by the processing circuitry 1303 causes the processing circuitry to perform operations according to embodiments disclosed herein. According to other embodiments, processing circuitry 1303 may be defined to include memory so that a separate memory circuitry is not required.
  • the master node may include an AI model 1309.
  • operations of the master node may be performed by processing circuitry 1303, AI model 1309, network interface 1307, and/or transceiver 1301.
  • processing circuitry 1303 may control transceiver 1301 to transmit downlink communications through transceiver 1301 to one or more computing devices and/or worker devices and/or to receive uplink communications through transceiver 1301 from one or more computing devices and/or worker devices.
  • modules may be stored in memory 1305 and/or AI model 1309, and these modules may provide instructions so that when instructions of a module are executed by processing circuitry 1303, processing circuitry 1303 performs respective operations (e.g., operations discussed herein with respect to example embodiments relating to master nodes).
  • master node 1300 and/or an element(s)/function(s) thereof may be embodied as a virtual node/nodes and/or a virtual machine/machines.
  • FIG 14 is a block diagram illustrating elements of a worker device 1400 according to embodiments of inventive concepts.
  • the worker device may include transceiver circuitry 1401 (also referred to as a transceiver) including a transmitter and a receiver configured to provide uplink and downlink communications with computing devices, master nodes, other worker devices, etc.
  • the worker device may include network interface circuitry 1407 (also referred to as a network interface) configured to provide communications with computing devices, master nodes and/or worker devices.
  • the worker device may also include processing circuitry 1403 (also referred to as a processor) coupled to the transceiver circuitry, and memory circuitry 1405 (also referred to as memory) coupled to the processing circuitry.
  • the memory circuitry 1405 may include computer readable program code that when executed by the processing circuitry 1403 causes the processing circuitry to perform operations according to embodiments disclosed herein. According to other embodiments, processing circuitry 1403 may be defined to include memory so that a separate memory circuitry is not required.
  • operations of the worker device may be performed by processing circuitry 1403, network interface 1407, and/or transceiver 1401.
  • processing circuitry 1403 may control transceiver 1401 to transmit downlink communications through transceiver 1401 to one or more computing devices, master nodes, worker devices and/or to receive uplink communications through transceiver 1401 from one or more computing devices, master nodes, and/or worker devices.
  • modules may be stored in memory 1405, and these modules may provide instructions so that when instructions of a module are executed by processing circuitry 1403, processing circuitry 1403 performs respective operations (e.g., operations discussed herein with respect to example embodiments relating to worker devices).
  • worker device 1400 and/or an element(s)/function(s) thereof may be embodied as a virtual node/nodes and/or a virtual machine/machines.
  • the worker devices may be any of a wide variety of communication devices, including wireless devices arranged, configured, and/or operable to communicate wirelessly with the computing device 1200, the master node 1300, and other communication devices.
  • the master node 1300 and/or the computing device are arranged, capable, configured, and/or operable to communicate directly or indirectly with the worker devices and/or with other computing devices, master nodes, network nodes or equipment in a network to enable and/or provide communications and operations of example embodiments discussed herein with respect to worker devices.
  • a worker device refers to a device capable, configured, arranged and/or operable to communicate wirelessly with computing devices, master nodes, and/or other worker devices.
  • Examples of a worker device (UE) include, but are not limited to, a smart phone, mobile phone, cell phone, voice over IP (VoIP) phone, wireless local loop phone, desktop computer, personal digital assistant (PDA), wireless camera, gaming console or device, music storage device, playback appliance, wearable terminal device, wireless endpoint, mobile station, tablet, laptop, laptop-embedded equipment (LEE), laptop-mounted equipment (LME), smart device, wireless customer-premise equipment (CPE), vehicle-mounted or vehicle embedded/integrated wireless device, etc.
  • VoIP voice over IP
  • LEE laptop-embedded equipment
  • LME laptop-mounted equipment
  • CPE wireless customer-premise equipment
  • UE any UE identified by the 3rd Generation Partnership Project (3GPP), including a narrow band internet of things (NB-IoT) user equipment (UE), a machine type communication (MTC) UE, and/or an enhanced MTC (eMTC) UE.
  • 3GPP 3rd Generation Partnership Project
  • NB-IoT narrow band internet of things
  • MTC machine type communication
  • eMTC enhanced MTC
  • a worker device may support device-to-device (D2D) communication, for example by implementing a 3GPP standard for sidelink communication, Dedicated Short- Range Communication (DSRC), vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), or vehicle-to-everything (V2X).
  • D2D device-to-device
  • DSRC Dedicated Short- Range Communication
  • V2V vehicle-to-vehicle
  • V2I vehicle-to-infrastructure
  • V2X vehicle-to-everything
  • a worker device may not necessarily have a user in the sense of a human user who owns and/or operates the relevant device. Instead, a worker device may represent a device that is intended for sale to, or operation by, a human user but which may not, or which may not initially, be associated with a specific human user (e.g., a smart sprinkler controller).
  • a worker device may represent a device that is not intended for sale to, or operation by, an end user but which may be associated with or operated for the benefit of a user (e.g., a smart power meter).
  • a user e.g., a smart power meter.
  • While the computing devices, nodes, and worker devices described herein may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. It is to be understood that these computing devices, nodes, and worker devices may comprise any suitable combination of hardware and/or software needed to perform the tasks, features, functions, and methods disclosed herein.
  • Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the computing device, and/or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination.
  • computing devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components.
  • a communication interface may be configured to include any of the components described herein, and/or the functionality of the components may be partitioned between the processing circuitry and the communication interface.
  • non-computationally intensive functions of any of such components may be implemented in software or firmware, and computationally intensive functions may be implemented in hardware.
  • processing circuitry executing instructions stored in memory, which in certain embodiments may be a computer program product in the form of a non-transitory computer-readable storage medium.
  • some or all of the functionality may be provided by the processing circuitry without executing instructions stored on a separate or discrete device-readable storage medium, such as in a hard-wired manner.
  • the processing circuitry can be configured to perform the described functionality. The benefits provided by such functionality are not limited to the processing circuitry alone or to other components of the computing device, but are enjoyed by the computing device as a whole, and/or by end users and a wireless network generally.
  • the terms “comprise”, “comprising”, “comprises”, “include”, “including”, “includes”, “have”, “has”, “having”, or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof.
  • the common abbreviation “e.g.”, which derives from the Latin phrase “exempli gratia” may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item.
  • the common abbreviation “i.e.”, which derives from the Latin phrase “id est,” may be used to specify a particular item from a more general recitation.
  • Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits.
  • These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).

Abstract

A method performed by a computing device (101, 1200) for iterative training of a collaborative distributed coded AI model is provided. The method includes receiving (409) a request from the AI model for a redundancy factor for an iteration of training of the AI model. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration. The method further includes selecting (411) the redundancy factor in the iteration based on use of a machine learning, ML, model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors. The method further includes sending (413) the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the AI model.

Description

ITERATIVE TRAINING OF COLLABORATIVE DISTRIBUTED CODED ARTIFICIAL INTELLIGENCE MODEL
TECHNICAL FIELD
[0001] The present disclosure relates generally to iterative training of a collaborative distributed coded artificial intelligence (Al) model, and related methods and apparatuses.
BACKGROUND
[0002] In the context of collaborative distributed Al/machine learning (ML) (referred to herein as an Al model), a parameter server (e.g., a master node) and multiple Internet of Things (loT) edge devices (also referred to as worker devices) cooperatively work to complete AI/ML training. See e.g., W. Y. B. Lim et al., "Incentive mechanism design for resource sharing in collaborative learning", arXiv preprint arXiv:2016.00511, https://doi.org/10.48550/arXiv.2006.00511 (2020) (accessed on 14 March 2022).
[0003] Considering a trusted cluster of loT devices, a training dataset can be shared across the edge devices. The master node trains the Al model based on the data the master node has collected. The Al model training is assumed to converge after a certain number of iterations. In collaborative distributed Al model training, and in ideal conditions of communication and computation, each loT device is responsible of performing a portion of processing during each training iteration. The loT devices output results are then sent back to the master node for aggregation or combination.
SUMMARY
[0004] In collaborative distributed Al model training, some worker devices (e.g., loT edge devices) in a distributed computing cluster may not be reliable in computation (e.g., computing processing unit (CPU) overloaded, system failure, etc.) and communications (e.g., limited communication bandwidth, increased latency, etc.), especially in wireless communication. Thus, collaborative distributed Al learning may become challenging. Moreover, worker devices (e.g., loT edge devices) in a low coverage zone may affect such a training process as data may arrive late. Such a scenario may not be suitable for realtime applications. [0005] Worker devices may become stragglers and, thus, may have an effect of delaying the learning process. Some approaches using a distributed coded Al strategy may lack intelligence and/or an online decision during the dispatching of workload to each worker device and/or in the collection of output results by a central coordinator node. [0006] Potential advantages provided by various embodiments of the present disclosure may include that the method includes operations that may perform an online workload allocation decision to execute an iterative Al model using distributed coded Al model training. As a consequence, workloads may be intelligently assigned across worker devices in each iteration and latency may be reduced or minimized.
[0007] In various embodiments, a method performed by a computing device is provided for iterative training of a collaborative distributed coded Al model. The method includes receiving a request from the Al model for a redundancy factor for an iteration of training of the Al model. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration. The method further includes selecting the redundancy factor in the iteration based on use of a machine learning, ML, model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors. The method further includes sending the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
[0008] In various embodiments, a computing device is provided. The computing device includes processing circuitry, and at least one memory coupled with the processing circuitry. The memory stores program code that is executed by the processing circuitry to perform operations. The operations include receive a request from the Al model for a redundancy factor for an iteration of training of the Al model. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration. The operations further include select the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors. The operations further include send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model. [0009] In various embodiments, a computing device is provided that is adapted to perform operations comprising receive a request from the Al model for a redundancy factor for an iteration of training of the Al model. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration. The operations further include select the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors. The operations further include send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model. [0010] In various embodiments, a computer program product including a non- transitory storage medium including program code to be executed by processing circuitry of a computing device is provided. Execution of the program code causes the computing device to perform operations comprising receive a request from the Al model for a redundancy factor for an iteration of training of the Al model. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration. The operations further include select the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors. The operations further include send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
[0011] In various embodiments, a computer program including program code to be executed by processing circuitry of a computing device is provided. The program code causes the computing device to perform operations comprising receive a request from the Al model for a redundancy factor for an iteration of training of the Al model. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration. The operations further include select the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors. The operations further include send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
[0012] In various embodiments, a method performed by a master node in a distributed computing cluster for iterative training of a collaborative distributed coded Al model is provided for iterative training of a collaborative distributed coded Al model. The method includes receiving, from a computing device, a redundancy factor. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration. The method further includes receiving a data matrix and a vector from the Al model; encoding the data matrix into a plurality of submatrices; distributing a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor. The method further includes collecting respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the Al model by respective worker devices in the subset of the worker devices in the distributed computing cluster. The method further includes extracting an overall result from the collected respective results; and sending the overall result to the Al model to determine whether the training is completed.
[0013] In various embodiments, a master node is provided. The master node includes processing circuitry, and at least one memory coupled with the processing circuitry. The memory stores program code that is executed by the processing circuitry to perform operations. The operations include receive, from a computing device, a redundancy factor. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration. The operations further include receive a data matrix and a vector from the AI model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor. The operations further include collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster. The operations further include extract an overall result from the collected respective results; and send the overall result to the AI model to determine whether the training is completed.
[0014] In various embodiments, a master node is provided that is adapted to perform operations comprising receive, from a computing device, a redundancy factor.
The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration. The operations further include receive a data matrix and a vector from the AI model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor. The operations further include collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster. The operations further include extract an overall result from the collected respective results; and send the overall result to the AI model to determine whether the training is completed.
[0015] In various embodiments, a computer program product including a non- transitory storage medium including program code to be executed by processing circuitry of a master node is provided. Execution of the program code causes the master node to perform operations comprising receive, from a computing device, a redundancy factor.
The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration. The operations further include receive a data matrix and a vector from the AI model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor. The operations further include collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster. The operations further include extract an overall result from the collected respective results; and send the overall result to the AI model to determine whether the training is completed. [0016] In various embodiments, a computer program including program code to be executed by processing circuitry of a master node is provided. The program code causes the master node to perform operations comprising receive, from a computing device, a redundancy factor. The redundancy factor comprises an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration. The operations further include receive a data matrix and a vector from the AI model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor. The operations further include collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the AI model by respective worker devices in the subset of the worker devices in the distributed computing cluster. The operations further include extract an overall result from the collected respective results; and send the overall result to the AI model to determine whether the training is completed.
BRIEF DESCRIPTION OF DRAWINGS
[0017] The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate certain non-limiting embodiments of inventive concepts. In the drawings:
[0018] Figure 1 is a schematic diagram of an overview of distributed coded Al model learning in accordance with some embodiments of the present disclosure;
[0019] Figure 2 is a schematic diagram illustrating a cloud-based implementation as a service in accordance with some embodiments of the present disclosure;
[0020] Figure 3 is a signalling diagram in accordance with some embodiments of the present disclosure;
[0021] Figure 4 is a flow chart illustrating operations of a computing device in accordance with some embodiments of the present disclosure;
[0022] Figure 5 is a flow chart illustrating operations of a master node in accordance with some embodiments of the present disclosure; [0023] Figure 6 is a plot of empirical average execution time for each arm for a simulation in accordance with some embodiments of the present disclosure;
[0024] Figures 7A-7D are plots of per-iteration reward evolution for the simulation in accordance with some embodiments of the present disclosure;
[0025] Figures 8-11 are plots of numbers of pulls for each arm in the simulation in accordance with some embodiments of the present disclosure;
[0026] Figure 12 is a block diagram of a computing device in accordance with some embodiments of the present disclosure;
[0027] Figure 13 is a block diagram of a master node in accordance with some embodiments of the present disclosure; and
[0028] Figure 14 is a block diagram of a worker device in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION
[0029] Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.
[0030] The following description presents various embodiments of the disclosed subject matter. These embodiments are presented as teaching examples and are not to be construed as limiting the scope of the disclosed subject matter. For example, certain details of the described embodiments may be modified, omitted, or expanded upon without departing from the scope of the described subject matter.
[0031] The following explanation of potential problems with some approaches is a present realization as part of the present disclosure and is not to be construed as previously known by others. [0032] As previously referenced, in collaborative distributed Al model training, some loT edge devices in a cluster may not be reliable in computation (e.g., CPU overloaded, system failure, etc.) and communications (e.g., limited communication bandwidth, increased latency, etc.), especially in wireless communication. Due to a heterogeneous and time-varying nature of loT edge devices availability, collaborative distributed Al learning may become challenging.
[0033] Some computing, e.g., edge computing, includes distributed computing and data storage for services with low latency requirements to help enable ultra-fast interactions and/or responsiveness. Resources may be unbalanced in an edge computing scenario and edge devices may be located in different fifth generation (5G) coverage zones, such as high, low, or medium coverage zones. Latency for edge devices in a low coverage zone may be higher than latency for edge devices in a high coverage zone.
Thus, edge devices in the low coverage zone may affect the training process as data may arrive late. Such a scenario may not be suitable for real-time applications.
[0034] Some approaches have used distributed coded techniques to address such a scenario. These techniques may allow injection of erasure and error-correcting codes to improve the reliability via coded computation. This injection is achieved by intelligently adding some redundancy to the data assigned to loT edge devices for a subtask in each iteration.
[0035] In an effort to mitigate the effects of possible straggler worker devices (e.g., due to limited communication bandwidth and/or increased latency), some distributed coded Al approaches can increase the computational workload overhead assigned to the worker devices. Some additional challenges associated with increased workload may include system disturbances, such as slow-down or failures of an individual worker device(s).
[0036] As a consequence, worker devices exposed to such issues may become stragglers and, thus, may have an effect of delaying the learning process. Such worker devices may also slow down the time to achieve convergence, lower the accuracy, and may lead to challenging analysis and debugging. Benefits of a distributed coded Al strategy over uncoded implementation, however, may be restricted by lack of intelligence and lack of an online decision during the dispatching of workload to each worker device and/or in the collection of the output results by the central coordinator or the master node.
[0037] For example, in each iteration, a data matrix-vector multiplication y = A · x is computed. To complete an AI learning task as soon as possible, it may be preferable to minimize the long-term computation over a certain number of iterations. If, e.g., 5,000 iterations are needed to ensure the convergence with each computation of the data matrix-vector multiplication, a target may be to minimize the overall training time over these 5,000 iterations by appropriately using the distributed coded AI cluster at the network edge. Thus, it may be desirable to inject an appropriate amount of redundancy to the computation at the worker device(s) in each iteration.
[0038] To determine the redundancy injected to the assigned workloads in distributed coded AI training, some approaches may exploit an offline model-based approach. Such approaches may assume that the worker device's communication and computation capabilities follow certain probability distributions, such as exponential distributions. Such approaches also may assume that the parameters of the distributions are exactly known in advance, which may be unrealistic for real-world systems and applications.
[0039] For example, K. Lee et al., "Speeding up distributed machine learning using codes," IEEE Transaction on Information theory, Vol. 64, no. 3, pp. 1514-1529 (2017) considers a homogeneous cluster with Maximum Distance Separable (MDS) codes to conduct matrix-vector multiplication. D. Kim et al., "Optimal load allocation for coded distributed computation in heterogeneous clusters," IEEE Transaction on Communications, vol. 69, no. 1, pp. 44-58 (2021) considers a heterogeneous cluster with MDS code for matrix-vector multiplication. A. Reisizadeh et al., "Coded computation over heterogeneous cluster," IEEE Transaction on Information Theory, vol. 65, no. 7, pp. 4227- 4242 (2019) considers a heterogeneous cluster with Random Linear Codes (RLC) for matrix-multiplication.
[0040] Such approaches assume that the workload execution time on each worker device follows an exponential distribution, whose parameters are exactly known. The approaches derived a closed-form expression of the amount of redundancy to be injected to the worker devices, which may not be (e.g., cannot be) achieved in real applications. [0041] In another approach, Wenchao Xia et al., "Multi-Armed Bandit-Based Client Scheduling for Federated Learning," IEEE Transactions on Wireless Communications, vol. 19, no. 11, pp. 7108-7123 (Nov. 2020), doi: 10.1109/TWC.2020.3008091 ("Xia") experimented with a Multi-Armed Bandit algorithm in a different context. Xia discusses the effectiveness of the framework for online client scheduling in Federated Learning without knowing wireless channel state information and statistical characteristics of clients.
[0042] Various embodiments of the present disclosure may provide solutions to these and other potential problems. A method is provided for systems (e.g., large-scale systems) where collaborative distributed AI model learning performance may need to be robust against disturbances such as straggler worker devices, system failures, communication issues, etc. In each iteration of a collaborative distributed learning AI model, the method includes a data matrix-vector multiplication as part of a building block. The data matrix-vector multiplication is computed at a computing device (which, in some embodiments, may be a master node) based on the outputs of worker devices in a distributed computing cluster.
[0043] The AI model may include one of independent component analysis (ICA), principal component analysis (PCA), a convolutional neural network (CNN), and a deep neural network (DNN). Additionally, linear transformations in signal processing may be included, as well as any iterative, computation-intensive class of processing.
[0044] Potential technical advantages provided by various embodiments of the present disclosure may include the following. Based on the coded distributed AI model deciding an amount of redundancy to be injected in each iteration, in real-time applications, the method may reduce the effects of disturbances compared to approaches in which an assumption on a worker device's capability is used instead. Additionally, when a MAB based decision framework is included in the method, the decision may be a model-free, plug-and-play (e.g., online, real-time) decision on the amount of redundancy to be injected in each iteration of the distributed coded AI model training. As a consequence, the method may allow selection of a reliable subset of worker devices for a real-world distributed coded AI model training system so as to minimize the training time in each iteration; and the online decision framework may help make online workload allocation decisions to execute the iterative AI model using distributed coded AI training. [0045] Additional potential technical advantages based on the method deciding the amount of workload to be assigned to respective worker devices may include the following:
[0046] Avoidance of lost AI model updates and minimized latency: If workload is not intelligently assigned across worker devices in each iteration, the overall training may be affected by delay caused by overloaded processing of some worker devices. Thus, model aggregation or reconstruction may not be achieved on time. Application performance also may decrease and may not fit a real-time process. The method may reduce (e.g., significantly reduce) the convergence time of the AI model as the master node does not need to wait for the slowest worker device(s)' responses.
[0047] Model-Free Approach: Deployment of the method is not restricted to a particular AI model. The method may be suitable for real-world applications where most of the processes are stochastic. In the real world, worker devices can dynamically join and leave the computing cluster. When a worker device joins the computing cluster, it may not be practical to require the worker device to report its parameters (e.g., communication, computation capability, and reliability).
[0048] Online Decision as Plug-and-Play: The method may be more practical than some approaches discussed herein based on a master node treating the performance of worker devices as a black box. As soon as a distributed computing cluster is formed, the master node (also referred to herein as a central coordinator) may start to work with the distributed computing cluster and figure out the worker devices' capability and reliability by itself. The redundancy factor in each iteration is updated or chosen from an available set (e.g., using a MAB based framework).
[0049] Self-Organized and Intelligent Central Coordinator: The method may allow deployment of a master node that orchestrates the processing itself without additional external intervention.
[0050] Straggler Mitigation in Distributed Computing: Based on the method allowing an intelligent workload allocation and data collection in each training iteration, communication bottlenecks, system disturbances, and node failures in distributed AI model training may be efficiently addressed. [0051] Reduction of Energy Consumption: Upon reception of a number of outputs (e.g., decided by an encoding algorithm and MAB arm), the master node does not need to wait. As a consequence, the computing capacity of the worker devices may be optimally used and, thus, power consumption may be reduced.
[0052] Cloud based implementation as a service: The method may be generalized to various types of collaborative computing applications as a service. Thus, a subscriber of such a service may see its master node assisted by the method (e.g., especially for latency critical applications).
[0053] Real-Time Applications: The method may include an online decision. As such, the method may be suitable for real-time applications (e.g., where latency and task scheduling are critical).
[0054] Generalization and Application Specific: The method may be deployed based on any type of existing MAB algorithm. Thus, for each application and depending on resources, the method may be generalized to any type of application and any type of MAB algorithm.
[0055] Embodiments of the present disclosure include a homogeneous computing cluster of worker devices. The worker devices have the same statistical characteristics in terms of their computation capability and reliabilities; and these worker devices remain in the cluster during the iterative Al learning process.
[0056] Figure 1 is a schematic diagram illustrating an overview of distributed coded AI model learning with a homogeneous distributed computing cluster of worker devices 103a...103n executing an iterative AI model in accordance with some embodiments of the present disclosure. AI model 105 is iteratively trained based on a data matrix-vector multiplication.
[0057] As illustrated in Figure 1, N homogeneous worker devices 103 are connected to a backend master node 101 (also referred to as a central coordinator 101). Master node 101 also may be a computing device for the method, as discussed further herein. Master node 101 and the worker devices 103a...103n form a distributed computing cluster. Iterative AI model 105 is run on the distributed computing cluster. In each iteration, labeled "m", a data matrix-vector multiplication y = A · x is computed, where y is a result, A is a data matrix having ω rows and b columns, and x is a vector; y ∈ ℝ^ω, A ∈ ℝ^(ω·b), and x ∈ ℝ^b. In each iteration, ω and b remain unchanged. k is a redundancy factor (also referred to as a recovery threshold) that is a decision variable per iteration of training of AI model 105. The redundancy factor k identifies an amount of workload to be assigned per worker device 103 of the distributed computing cluster in the iteration.
[0058] MDS coding is a widely adopted linear block coding technique. See e.g., K. Lee et al., "Speeding up distributed machine learning using codes", IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514-1529 (2017); N. Ding et al., "Optimal incentive and load design for distributed coded machine learning", IEEE Journal on Selected Areas in Communications, vol. 39, no. 7, pp. 2090-2104 (2021) ("Ding"); R. Singleton, "Maximum distance Q-nary codes", IEEE Transactions on Information Theory, vol. 10, no. 2, pp. 116-118 (1964). Given the redundancy factor k, where 1 ≤ k ≤ N, k ∈ ℤ, a submatrix A_n ∈ ℝ^(l·b) is assigned to worker device 103 n ∈ N = {1, 2, 3, ..., N}, where l = ω/k and l ∈ ℤ. The A_n submatrices may be generated by coding theoretic techniques.
[0059] Still referring to Figure 1, in order for master node 101 to recover the computation result y, master node 101 collects local computation results from a subset of the N worker devices.
[0060] Considering an index set of the iterations M = {1, 2, 3, ..., M}, and assuming that the iterative AI model 105 takes M iterations to converge, an objective of the method of the present disclosure may be to minimize the overall execution time of this learning by determining an appropriate k value in each iteration m ∈ M.
[0061] A reliability-workload trade-off of the determined value of k may be illustrated as follows: A smaller k value translates to a smaller number of local worker devices' 103 computation results needed to construct the final, overall computation y. Thus, a higher system reliability may result as the master node 101 relies on a smaller number of worker devices 103. Thus, master node 101 may tolerate a larger number of malfunctioning worker devices 103. However, a smaller k value also leads to a larger l value and a higher row dimension in the submatrix A_n that is assigned to each worker device 103. Overall, each worker device, thus, may need to tackle a higher computation workload. An appropriate k value may balance the reliability-workload trade-off. [0062] Figure 2 is a schematic diagram illustrating a cloud-based implementation as a service in accordance with some embodiments of the present disclosure. The cloud-based infrastructure may include a cloud based computing device 201 communicatively connected to a network (e.g., 5G network 203). The network includes communication connections to a plurality of distributed computing clusters 207a/205a - 207n/205n, each including a master node 207 and a plurality of worker devices 205 (one or more of which may be generally referred to as worker device 205). The implementation includes a trusted edge cloud where security and confidentiality are included.
[0063] In the example of Figure 2, the cloud-based implementation includes 5G network 203 that includes an access network, such as a radio access network (RAN) and a core network (not illustrated) which includes one or more core network nodes. The access network may include one or more access network nodes, such as master nodes 207a. . . 207n (one or more of which may be generally referred to as master node 207), or any other similar 3rd Generation Partnership Project (3GPP) access node or non-3GPP access point. The master nodes 207 facilitate direct or indirect connection of worker devices 205, such as by connecting worker devices 205 to the 5G network 203 over one or more wireless connections.
[0064] Example wireless communications over a wireless connection include transmitting and/or receiving wireless signals using electromagnetic waves, radio waves, infrared waves, and/or other types of signals suitable for conveying information without the use of wires, cables, or other material conductors. Moreover, in different embodiments, the cloud-based implementation may include any number of wired or wireless networks, master nodes, computing devices, worker devices, and/or any other components or systems that may facilitate or participate in the communication of data and/or signals whether via wired or wireless connections. The cloud-based implementation may include and/or interface with any type of communication, telecommunication, data, cellular, radio network, and/or other similar type of system. [0065] As a whole, the communication systems of Figures 1 and/or 2 enable connectivity between the worker devices, master nodes, and computing devices. In that sense, the communication systems may be configured to operate according to predefined rules or procedures, such as specific standards that include, but are not limited to: Global System for Mobile Communications (GSM); Universal Mobile Telecommunications System (UMTS); Long Term Evolution (LTE), and/or other suitable 2G, 3G, 4G, 5G standards, or any applicable future generation standard (e.g., 6G); wireless local area network (WLAN) standards, such as the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (WiFi); and/or any other appropriate wireless communication standard, such as the Worldwide Interoperability for Microwave Access (WiMax), Bluetooth, Z-Wave, Near Field Communication (NFC) ZigBee, LiFi, and/or any low-power wide-area network (LPWAN) standards such as LoRa and Sigfox.
[0066] As discussed previously herein, a MAB model may be used in the method of the present disclosure. A MAB may be used to model a tradeoff faced by an automated AI model. The MAB may aim to gain new knowledge by exploring its environment, and to exploit its current, reliable knowledge. MAB problems may be a class of partial-information sequential resource allocation problems concerned with allocating between multiple options, where the benefits of each option are not known at the time of allocation. A benefit may be discovered as time passes and resources are reallocated. The name "MAB" refers to a visualization of this problem.
[0067] In a MAB game, the available parameters to the forecaster are the number of arms (or actions) K and the number of rounds n, unknown to the forecaster. A gain vector g_t = (g_{1,t}, ..., g_{K,t}) at each round t may be generated as follows:
For each round t = 1, 2, ..., n
• The forecaster chooses an arm I_t ∈ {1, ..., K}
• The forecaster receives the gain g_{I_t,t}
• Only g_{I_t,t} is revealed to the forecaster
[0068] A cumulative regret goal may be to maximize the cumulative gains obtained. In an example embodiment, the goal is to minimize:
R_n = max_{i=1,...,K} E[ Σ_{t=1}^{n} g_{i,t} − Σ_{t=1}^{n} g_{I_t,t} ]   (Equation 1)
Where the expectation E comes from both a possible stochastic generation of the gain vector and a possible randomization in the choice of I_t.
[0069] In a MAB game, the environment may be stochastic. The gain vector g_t is sampled from an unknown product distribution ν_1 × ... × ν_K on [0, 1]^K, that is, g_{i,t} ∼ ν_i. Also, the environment may be adversarial in the way that the gain vector g_t is chosen by an adversary (which, at time t, knows all the past, but not I_t). There may be variants of the MAB problem, as well as multiple applications.
[0070] With the stochastic MAB game introduced by Robbins, the unknown parameters to the forecaster are the reward distributions ν_1, ..., ν_K of the arms (with respective means μ_1, ..., μ_K). The algorithm may be deployed as follows:
For each round t = 1, 2, ..., n
• The forecaster chooses an arm I_t ∈ {1, ..., K}
• The environment draws the gain vector g_t = (g_{1,t}, ..., g_{K,t}) according to ν_1 × ... × ν_K
• The forecaster receives the gain g_{I_t,t}
Where:
μ_i = E_{g∼ν_i}[g] and μ* = max_{i=1,...,K} μ_i
[0071] The cumulative regret may be given by:
R_n = Σ_{t=1}^{n} (μ* − μ_{I_t})   (Equation 2)
[0072] The goal may be to minimize the expected cumulative regret:
E[R_n] = n · μ* − E[ Σ_{t=1}^{n} μ_{I_t} ]   (Equation 3)
[0073] In some embodiments, the distributed coded Al model training is mapped to a MAB algorithm. Four features may delimit the MAB problem within the general class of stochastic control problems:
  • Only one arm is operated at each time instant. The evolution of the arm that is being operated is uncontrolled. The forecaster chooses which arm to operate but not how to operate it.
• Arms that are not operated remain frozen
• Arms are independent
  • Frozen arms contribute no reward
[0074] In such a problem, there is a set of arms, each of which, when played or pulled by the forecaster, yields some reward, depending on its internal state which evolves stochastically over time. Such elements put together are suitable for distributed coded AI model training. [0075] Since l = ω/k is an integer, all possible selections of k (an arm or, in other words, the redundancy factor) can be retrieved in a set K = {k_1, k_2, k_3, ..., k_I}. The index set may be taken as J = {1, 2, 3, ..., I}. If a particular k_i ∈ K is selected, the overall execution time, from when the submatrices A_n start to be distributed to the worker devices to when the master node obtains k_i local computation results from the worker devices, is denoted by Φ_i.
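For illustration, the arm set K can be enumerated directly from ω and the cluster size N, since a value of k is only usable when l = ω/k is an integer and at most N local results are available. Below is a minimal Python sketch of that enumeration; the function name and the example values are hypothetical and not taken from the disclosure.

```python
def redundancy_factor_set(omega: int, n_workers: int) -> list[int]:
    """Enumerate candidate redundancy factors (arms): the divisors of omega
    that do not exceed the cluster size N."""
    return [k for k in range(1, n_workers + 1) if omega % k == 0]

# Example: with omega = 12 rows and N = 6 workers, K = [1, 2, 3, 4, 6].
print(redundancy_factor_set(12, 6))
```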
[0076] The randomness of the process may be characterized by the randomness of the computation time of each worker device, the randomness of the communication time to dispatch A_n and x, and the upload of y_n. The execution time Φ_i for one iteration is also a random variable. If the selection of k in iteration m is denoted by k^(m), then the associated execution time for this iteration is given by Φ^(m) = Φ(k^(m)).
[0077] For each iteration m in M, the target of the proposed scheme is to determine k^(m) in order to:
minimize E[ Σ_{m=1}^{M} Φ(k^(m)) ]   (Equation 4)
[0078] The formulation in Equation 4 is suitable for an MAB model. In this analogy, the set K can be viewed as the set of arms. Pulling an arm is mapped to the selection of k^(m) from the set K in iteration m. Doing so, the computation time Φ^(m) may be minimized, which is analogous to getting a reward (the opposite of the execution time). Finally, the overall target is to maximize the total reward harvested during the AI model training process.
[0079] If the master node knows E[Φ(k_i)], with i in J, then the optimal policy for each iteration m is given by:
i* = argmin_{i ∈ J} E[Φ(k_i)]   (Equation 5)
k^(m) = k_{i*}, for all m ∈ M   (Equation 6)
[0080] In practice, the master node may not have information of either Φ(k_i) or E[Φ(k_i)], i in J, when the distributed computing cluster is just formed, before any computation is executed. The master node, therefore, may allow an initialization step to try different k values and obtain the knowledge of E[Φ(k_i)]. This operation of the method may be considered as a warm-up phase. Different warm-up phases can be designed according to the type of MAB algorithm to be deployed.
[0081] In some embodiments, the online decision is fit into the MAB algorithm. An online framework may involve sequential decision-making under uncertainty. Thus, in some embodiments, the agent or forecaster is the master node, which is initially unaware of the stochastic evolution of the environment (that is, the arms/redundancy factors) and aims at maximizing a common objective based on the history of actions and observations.
[0082] An assumption may be included that the number of iterations that leads to the AI model convergence is large enough compared to the size of the set K of redundancy factors to be selected.
[0083] An example embodiment of the method of the present disclosure is illustrated in the signalling diagram shown in Figure 3. Three components are included in Figure 3: computing device 101 which includes an MAB online decision model, distributed computing cluster 205/207, and iterative Al model 105. Each component receives a set of instructions and performs processing to output some metrics or variables. Sequences of the operations of this example embodiment include the following.
[0084] Step 0: Iterative AI model 105 signals (operation 305) a request to a master node 207 of the distributed computing cluster for a computation, in each iteration, of a data matrix-vector multiplication y = A · x, where the matrix A and the data x are available. Matrix A has a constant number of rows ω and a constant number of columns b. The dimensions are ω × 1 for y, ω × b for A, and b × 1 for x.
[0085] Master node 207 forms (operation 303) the distributed computing cluster of available and trusted worker devices 205 (e.g., IoT edge devices) of size N. Master node 207 also activates or triggers the MAB online decision framework of computing device 101.
[0086] Step 1: Master node 207 passes (operation 305) to the MAB online decision model of computing device 101 the number of rows ω of the matrix and the size of the cluster N. The MAB online decision model prepares (operation 309) a list of redundancy factors K = {k_1, k_2, k_3, ..., k_I} based on ω, which are referred to herein as arms. The MAB online decision model sets (operation 311) all execution times Φ_i (mapped as rewards) and the number of times each arm is selected, λ_i, to zero. [0087] Step 2: At iteration m, iterative AI model 105 releases (operation 321) the matrix A and the data x and requests (operation 315) the redundancy factor k^(m) from computing device 101. Computing device 101 activates (operation 317) the MAB model to select the appropriate k^(m), which is the arm that may lead to the minimum execution time. The MAB online decision model sends (operation 319) the selected redundancy factor to master node 207.
[0088] Step 3: Master node 207 encodes (operation 323) the matrix A into a plurality of submatrices A_n using a coding theoretic technique such as MDS. Master node 207 multicasts (operation 323) to each worker device 205 the pair (A_n, x). After computation of the distributed local matrix-vector multiplication y_n = A_n · x at each worker device 205 in the distribution, master node 207 collects (operation 323) the results y_n of a subset of the worker devices 205 in the distribution. Master node 207 decodes (operation 323) the results and extracts the final, overall result y.
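To make Step 3 concrete, the sketch below uses a random linear code as a stand-in for an MDS code: the data matrix is split into k row blocks, N coded submatrices are formed, each (simulated) worker computes its local product, and the overall result is decoded from any k returned results. The function names, the use of NumPy, and the toy dimensions are assumptions made for this illustration only.

```python
import numpy as np

def encode(A: np.ndarray, n_workers: int, k: int, seed: int = 0):
    """Split A row-wise into k blocks and form n_workers coded submatrices A_n.
    A random generator matrix G acts as an MDS-like code: with probability one,
    any k of its rows are invertible, so any k worker results suffice."""
    omega, b = A.shape
    l = omega // k                                  # rows per worker, l = omega / k
    blocks = A.reshape(k, l, b)
    G = np.random.default_rng(seed).standard_normal((n_workers, k))
    coded = np.einsum("nk,klb->nlb", G, blocks)     # A_n = sum_j G[n, j] * block_j
    return coded, G

def decode(results: dict, G: np.ndarray, k: int) -> np.ndarray:
    """Recover y = A @ x from any k worker results y_n = A_n @ x."""
    idx = sorted(results)[:k]
    Y = np.stack([results[n] for n in idx])         # k x l matrix of partial results
    P = np.linalg.solve(G[idx, :], Y)               # block products block_j @ x
    return P.reshape(-1)

# Toy usage: omega = 8, b = 3, N = 5 workers, redundancy factor k = 4.
A, x = np.arange(24.0).reshape(8, 3), np.ones(3)
coded, G = encode(A, n_workers=5, k=4)
results = {n: coded[n] @ x for n in (0, 2, 3, 4)}   # any 4 of the 5 results suffice
assert np.allclose(decode(results, G, k=4), A @ x)
```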
[0089] Step 4: Master node 207 passes (operation 329) the final, overall result y to iterative AI model 105. Master node 207 also consolidates (operation 323) the final execution time Φ^(m) associated with the redundancy factor k^(m) and sends (operation 327) it to computing device 101.
[0090] Step 5: Computing device 101 updates (operation 331) the parameters Φ_i and λ_i according to the algorithms used.
[0091] Step 6: Iterative AI model 105 verifies (operation 333) according to a criterion whether the AI model 105 has converged: If AI model 105 has converged, the iterative AI training ends and all resources are released (operation 335). If AI model 105 has not converged, the method moves to the next iteration (m + 1) and the method restarts at Step 2.
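A compact way to read Steps 0 through 6 together is as a single training loop driven by the master node and the MAB online decision model. The following Python sketch is purely illustrative: every helper name (release_iteration_data, select_redundancy_factor, dispatch_and_collect, and so on) is a placeholder for the corresponding operation described above, not an API defined by the disclosure.

```python
def train(ai_model, master, decision_model, max_iterations):
    """Hypothetical end-to-end loop mirroring Steps 0-6 of Figure 3."""
    for m in range(1, max_iterations + 1):
        A, x = ai_model.release_iteration_data()        # Step 2: matrix A and data x
        k = decision_model.select_redundancy_factor(m)  # Step 2: arm k^(m)
        coded = master.encode(A, k)                     # Step 3: MDS-style encoding
        results, exec_time = master.dispatch_and_collect(coded, x, k)  # Step 3
        y = master.decode(results, k)                   # Step 3: overall result y
        ai_model.consume(y)                             # Step 4: pass y back
        decision_model.update(k, exec_time)             # Step 5: refresh MAB statistics
        if ai_model.converged():                        # Step 6: stop on convergence
            break
```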
[0092] Figure 4 is a flowchart illustrating operations of a computing device (e.g., computing device 101) according to some embodiments of the present disclosure. The computing device can be computing device 1200 of Figure 12 (as discussed further herein) that is configured for iterative training of a collaborative distributed coded Al model (e.g., Al model 105). The method includes receiving (409) a request from the Al model for a redundancy factor for an iteration of training of the Al model. The redundancy factor comprises an amount of workload to be assigned per worker device (e.g., worker device 205a) of a distributed computing cluster (e.g., distributed computing cluster 207/205) in the iteration. The method further includes selecting (411) the redundancy factor in the iteration based on use of a ML model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors. The method further includes sending (413) the selected redundancy factor to a master node (e.g., master node 207) for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
[0093] The method may further include receiving (415), from the master node, an overall execution time for the distributed coded execution of the multiplication of the data matrix and the vector in the iteration by a subset of a plurality of worker devices in the distributed computing cluster.
[0094] The use of the ML model may comprise (i) per iteration in a set of iterations, choosing a redundancy factor from the set of redundancy factors, (ii) per iteration in the set of iterations, receiving a reward value for the chosen redundancy factor, and (iii) in the iteration, selecting the redundancy factor from the set of redundancy factors that has a highest reward value. The reward value has an inverse relationship to an overall execution time for the distributed coded execution of the multiplication of the data matrix and the vector in the iteration.
[0095] In some embodiments, the method further includes receiving (401), from the master node, a first parameter defining a size of the distributed computing cluster; receiving (403) from the Al model a second parameter defining a number of rows in the data matrix; and identifying (405) the set of redundancy factors based on the number of rows in the data matrix.
[0096] The method may further include initializing (407) values of a plurality of parameters in the ML model to zero. The plurality of parameters may comprise (i) a number of times that redundancy factors are selected from the set of redundancy factors, and (ii) an average reward value of the selected redundancy factors.
[0097] The method may further include updating (417) the ML model with (i) the number of times that redundancy factors are selected, and (ii) the average reward value of the selected redundancy factors where the average reward value has an inverse relationship with the received overall execution time.
[0098] The selecting (411) may include an online decision that selects the redundancy factor.
[0099] The selected redundancy factor may be suitable for a mission critical operation.
[00100] The ML model may comprise a MAB model.
[00101] The plurality of worker devices may comprise a plurality of Internet of
Things, loT, edge computing devices.
[00102] Operations 401-407 and 415-417 from the flow chart of Figure 4 may be optional with respect to some embodiments of computing devices and related methods. [00103] Figure 5 is a flowchart illustrating operations of a master node (e.g., master node 207) according to some embodiments of the present disclosure. The master node may be master node 1300 of Figure 13 (as discussed further herein). The master node is in a distributed computing cluster for iterative training of a collaborative distributed coded Al model (e.g., Al model 105). The method includes receiving (501), from a computing device (e.g., computing device 101), a redundancy factor. The redundancy factor comprises an amount of workload to be assigned per worker device (e.g., worker device 205a) of the distributed computing cluster in an iteration. The method further includes receiving (503) a data matrix and a vector from the Al model; encoding (505) the data matrix into a plurality of submatrices; and distributing (507) a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor. The method further includes collecting (509) respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the Al model by respective worker devices in the subset of the worker devices in the distributed computing cluster; extracting (511) an overall result from the collected respective results; and sending (513) the overall result to the Al model to determine whether the training is completed.
[00104] The method may further include identifying (515) an overall execution time for the distributed coded execution of the multiplication of the respective submatrix and the vector in the iteration of the training of the Al model by the respective worker devices in the subset of worker devices in the distributed computing cluster.
[00105] In some embodiments, the method further includes sending (517) the overall execution time to the computing device.
[00106] Operations 515-517 from the flow chart of Figure 5 may be optional with respect to some embodiments of master nodes and related methods.
[00107] In example embodiments, the MAB online decision model may be, without limitation, an ε-greedy or an upper confidence bound 1 (UCB1) model. A workflow of the ε-greedy and/or UCB1 may be summarized in the following algorithm:
Algorithm: Example implementation of the MAB online decision model.
1: λ_i ← 0, Φ_i ← 0, i ∈ J
2: for iteration m = 1, 2, ..., I do
3: Select arm k_m to initialize this arm: k^(m) ← k_m.
4: Pull arm k^(m). The corresponding reward (the opposite of the execution time) is −Φ^(m).
5: Update λ_m ← λ_m + 1, Φ_m ← Φ^(m).
6: end for
7: for iteration m = I + 1, I + 2, ..., M do
8: Make a decision on the arm to be selected according to a certain equation. Suppose arm k_j is to be selected.
9: Select arm: k^(m) ← k_j.
10: Pull arm k^(m). The corresponding reward (the opposite of the execution time) is −Φ^(m).
11: Update λ_j ← λ_j + 1, Φ_j ← ((λ_j − 1) · Φ_j + Φ^(m)) / λ_j.
12: end for
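The listed algorithm can be transcribed almost line for line into code. The Python sketch below keeps the warm-up phase (each arm is pulled once) and the statistics update of the main loop, and leaves the Line 7/Line 8 decision as a pluggable rule; the class, attribute, and method names are illustrative assumptions, not part of the source.

```python
class MABDecisionModel:
    """Warm-up pulls every arm once; afterwards a pluggable rule
    (e.g., epsilon-greedy or UCB1) picks the arm for each iteration."""

    def __init__(self, arms):
        self.arms = list(arms)                   # candidate redundancy factors K
        self.pulls = [0] * len(self.arms)        # lambda_i, number of pulls per arm
        self.avg_time = [0.0] * len(self.arms)   # empirical average execution time

    def select(self, m, rule):
        """Return the index of the arm to pull in iteration m (1-based)."""
        if m <= len(self.arms):                  # warm-up: pull arm m once
            return m - 1
        return rule(self, m)                     # main loop: delegated decision

    def update(self, i, exec_time):
        """Record the execution time observed after pulling arm i."""
        self.pulls[i] += 1
        self.avg_time[i] += (exec_time - self.avg_time[i]) / self.pulls[i]
```

In a full run, select() would be called at the start of each iteration, the coded multiplication executed with the chosen redundancy factor, and update() called with the measured execution time, as in Steps 2 to 5 of Figure 3.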
[00108] In the example embodiments, when a distributed computing cluster is formed in a given iteration of the training, the above algorithm is triggered to make an online decision on the k value in the execution of the iterative Al model, as discussed below for a simulation.
[00109] Simulation Step 0: The simulation includes a distributed computing cluster of N = 500 IoT edge devices (i.e., worker devices). The simulation assumed that training of the iterative AI model takes approximately M = 5000 iterations to converge. It is noted that, in practice, there is no need to specify the number of iterations. The simulation also assumed that the number of iterations up to convergence is much larger than the size of the set of redundancy factors (that is, M ≫ I). In the simulation, the size of the set of redundancy factors is I = 23, ω = 10^7, and b = 15. Then, in each iteration of the AI model training, the master node computes y = A · x.
[00110] Simulation Step 1: The number of rows of the matrix is ω = 10^7 and the size of the cluster is N = 500. The list of redundancy factors, therefore, is given by:
K = {1, 2, 4, 5, 8, 10, 16, 20, 25, 32, 40, 50, 64, 80, 100, 125, 128, 160, 200, 250, 320, 400, 500}
The master node triggers the example algorithm above to make an online decision. In the example algorithm, the number of pulls for arm k_i is denoted by λ_i. In the MAB model, the reward harvested by selecting arm k_i is the opposite of the execution time, −Φ_i. In the example algorithm, after initializing the number of pulls for each arm λ_i as 0 (Line 1), the first I iterations of the example algorithm are used to pull every arm once and get an initial result of its resulting performance (Line 2 to Line 5). Other variants of initialization may be used in this sequence.
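As a quick consistency check, the arm list above and its size I = 23 follow directly from ω = 10^7 and N = 500; the snippet below (which mirrors the enumeration sketched earlier) is illustrative only.

```python
# Divisors of omega = 10**7 that do not exceed the cluster size N = 500.
K = [k for k in range(1, 501) if 10**7 % k == 0]
print(len(K))  # 23
print(K)       # [1, 2, 4, 5, 8, 10, 16, 20, 25, 32, 40, 50, 64, 80,
               #  100, 125, 128, 160, 200, 250, 320, 400, 500]
```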
[00111] Simulation Step 2: In the example algorithm, the empirical average execution time observed by selecting arm k_i is stored as Φ_i. After pulling each arm once, the main body of the example algorithm is performed for the remaining iterations (Line 6 to Line 10). An arm is selected in each iteration according to a certain equation in Line 7 of the example MAB algorithm.
In ε-greedy, a hyperparameter ε is included, where 0 < ε < 1. In Line 7 of the example algorithm, if ε-greedy is implemented, the following approach for arm selection is used:
j = argmin_{i ∈ J} Φ_i with probability 1 − ε, and j drawn uniformly at random from J with probability ε   (Equation 8)
If a UCB1 algorithm is implemented, a priority factor p_i^(m) for arm k_i in iteration m is first defined as:
p_i^(m) = (μ_m − Φ_i) / (a · σ_m) + sqrt(2 · ln(m) / λ_i)   (Equation 9)
Where μ_m and σ_m are the mean and standard deviation of the set {Φ_i, i ∈ J}, and a is the normalization hyperparameter we have introduced. Then, from the priority factor in the equation immediately above, the arm selection j is derived as follows:
j = argmax_{i ∈ J} p_i^(m)   (Equation 10)
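The two selection rules can be plugged into the decision-model sketch given earlier as interchangeable functions. The ε-greedy rule follows Equation 8 directly; for UCB1, the exact normalization of Equation 9 is not fully legible in the source, so the standardization by μ_m and σ_m scaled by a used below is only an assumed reading of it.

```python
import math
import random
import statistics

def epsilon_greedy(model, m, eps=0.05):
    # Equation 8: exploit the arm with the lowest empirical average execution
    # time with probability 1 - eps, otherwise explore an arm at random.
    if random.random() < eps:
        return random.randrange(len(model.arms))
    return min(range(len(model.arms)), key=lambda i: model.avg_time[i])

def ucb1(model, m, a=1.0):
    # Equations 9-10 (assumed form): standardized value term plus the usual
    # sqrt(2 ln m / lambda_i) exploration bonus; pick the highest priority.
    mu = statistics.mean(model.avg_time)
    sigma = statistics.pstdev(model.avg_time) or 1.0
    def priority(i):
        value = (mu - model.avg_time[i]) / (a * sigma)   # lower time, higher value
        return value + math.sqrt(2.0 * math.log(m) / model.pulls[i])
    return max(range(len(model.arms)), key=priority)
```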
[00112] Simulation Step 3: Evaluation of the execution time. In each iteration of the AI model, the execution time t on each IoT device includes three random parts: the distribution time t_d, the local computation time t_c, and the uploading time t_u. See Equation 11.
[00113] In the simulation, the following was used: U = 10^5, θ = 10^-5, P_distri = P_up = 0.25, and ν_distri = ν_up = 32 · 1.1, where a floating-point number is represented by 32 bits with an extra packet encapsulation overhead of 0.1; B_distri = 200 Mbps and B_up = 25 Mbps. Finally, the overall execution time is given by: t = t_c + t_d + t_u (Equation 11)
The following example embodiment of mapping was used in the simulation to estimate the overall execution time.
[00114] Estimation of the overall execution time: In each iteration of the AI model, the execution time t on each IoT device includes three random parts: the distribution time td, the local computation time tc, and the uploading time tu, which are estimated as follows.
[00115] The distribution time and the uploading time are modeled as:

td = βdistri · (l + 1) · b · Vdistri / Ddistri (Equation 12)

tu = βup · b · Vup / Dup (Equation 13)

In Equation 12, Ddistri is the downlink distribution bandwidth, (l + 1) · b is the number of floating-point elements in An and x that need to be distributed to a worker, Vdistri is a coefficient that translates a floating-point number into its corresponding size in a data packet, and βdistri is a discrete random number that represents the number of transmissions required to successfully distribute An and x to a worker.
[00116] In the simulation, βdistri was considered to follow a geometric distribution given by:

P(βdistri = s) = (1 − Pdistri)^(s−1) · Pdistri, s = 1, 2, ... (Equation 14)

where Pdistri is the probability of a successful transmission. See, e.g., S. Dhakal et al., Proceedings of the IEEE 90th Vehicular Technology Conference (VTC2019-Fall) (2019), pp. 1-6 ("Dhakal"); H. Karl and A. Willig, "Protocols and Architectures for Wireless Sensor Networks", John Wiley & Sons, 2007 ("Karl"). In Equation 13, Dup is the uplink bandwidth, b is the number of floating-point elements in yn to be uploaded back to the master node, Vup is a coefficient that translates a floating-point number into its corresponding size in a data packet, and βup is a discrete random number that represents the number of transmissions required to successfully upload yn to the master node. The random variable βup follows a geometric distribution given by:

P(βup = s) = (1 − Pup)^(s−1) · Pup, s = 1, 2, ... (Equation 15)
[00117] In Equation 15, Pup is the probability of a successful transmission. See, e.g., Dhakal and Karl.
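For illustration, the retransmission counts of Equations 14 and 15 can be sampled directly, since NumPy's geometric sampler uses the same support s = 1, 2, ... and success-probability parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
p_distri, p_up = 0.25, 0.25
beta_distri = rng.geometric(p_distri)  # downlink transmissions until success (Equation 14)
beta_up = rng.geometric(p_up)          # uplink transmissions until success (Equation 15)
```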
[00118] The compute time tc includes two components: the computation time tc,1 and the memory access time tc,2 (see, e.g., Dhakal), so that: tc = tc,1 + tc,2 (Equation 16)
[00119] The local computation time tc,1 can be estimated as:

tc,1 = l · θ (Equation 17)

where θ is the time required to complete the multiplication of one row of the assigned submatrix with the vector at an IoT edge device. The memory access time tc,2 is a continuous random variable whose probability density function is given by:

f(t) = γ · e^(−γ·t), t ≥ 0 (Equation 18)

where γ = U / l, with U being the memory access rate as described in, e.g., Ding; Dhakal; W. Shi et al., "Joint device scheduling and resource allocation for latency constrained wireless federated learning," IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 453-467 (2020).
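Putting Equations 11 to 18 together, a per-worker execution time can be sampled as sketched below, using the simulation parameters quoted above; the exponential form used for the memory access time of Equation 18 and the function name are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)


def worker_execution_time(l, b=15, theta=1e-5, U=1e5,
                          p_distri=0.25, p_up=0.25,
                          v_bits=32 * 1.1, d_distri=200e6, d_up=25e6):
    """Sample t = t_c + t_d + t_u for one worker holding an l-by-b coded submatrix."""
    t_d = rng.geometric(p_distri) * (l + 1) * b * v_bits / d_distri  # Equation 12
    t_u = rng.geometric(p_up) * b * v_bits / d_up                    # Equation 13
    t_c1 = l * theta                                                 # Equation 17
    t_c2 = rng.exponential(scale=l / U)                              # Equation 18, mean 1/gamma = l/U
    return t_c1 + t_c2 + t_d + t_u                                   # Equations 11 and 16


if __name__ == "__main__":
    print(worker_execution_time(l=20000))  # l is the illustrative per-worker row count
```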
[00120] Simulation results are now discussed, including extrapolation of simulation steps 4-6 from analysis of the simulation results.
[00121] Prior to presenting numerical results of the MAB-based online decision models, the empirical average reward for the arms was observed by pulling each arm ki, i ∈ I, 1000 times and recording the empirical average execution time. According to the Law of Large Numbers, the empirical average after 1000 pulls can be very close to the true average reward E(ti). Figure 6 is a plot of the empirical average execution time for each arm (1000 trials per arm) for the simulation. Figure 6 illustrates that the arm k = 400 achieved the minimum average execution time.
[00122] To validate the effectiveness of the MAB-based online decision model for the simulation, the results for the following MAB online decision models were analyzed (a minimal comparison harness is sketched after the list):
• ε-greedy, where ε = 0.05
• ε-greedy, where ε = 0.01
• UCB1, where a = 25
• A random benchmark, which selects from the set K uniformly at random in each iteration m ∈ M.
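The comparison can be reproduced in outline with the following self-contained harness; it uses a stand-in stochastic execution-time environment (the arm count, mean times, and seed are arbitrary), so the numbers it prints are illustrative rather than those of Table 1:

```python
import math
import random
import statistics

random.seed(0)
ARMS = list(range(23))                                  # stand-in arm indices
TRUE_MEAN = [1.0 + 0.05 * abs(i - 15) for i in ARMS]    # hypothetical mean execution times


def pull(i):
    return random.expovariate(1.0 / TRUE_MEAN[i])       # sampled execution time for arm i


def run(policy, M=5000):
    pulls, avg, total = [0] * len(ARMS), [0.0] * len(ARMS), 0.0
    for m in range(1, M + 1):
        i = m - 1 if m <= len(ARMS) else policy(avg, pulls, m)   # initial pass, then policy
        t = pull(i)
        pulls[i] += 1
        avg[i] += (t - avg[i]) / pulls[i]
        total += t
    return total / M                                     # average execution time per iteration


def eps_greedy(eps):
    return lambda avg, pulls, m: (random.randrange(len(ARMS)) if random.random() < eps
                                  else min(ARMS, key=lambda i: avg[i]))


def ucb1(a):
    def policy(avg, pulls, m):
        mu, sigma = statistics.mean(avg), statistics.pstdev(avg) or 1.0
        return max(ARMS, key=lambda i: (mu - avg[i]) / (a * sigma)
                                       + math.sqrt(2 * math.log(m) / pulls[i]))
    return policy


def random_policy(avg, pulls, m):
    return random.randrange(len(ARMS))


for name, pol in [("eps=0.05", eps_greedy(0.05)), ("eps=0.01", eps_greedy(0.01)),
                  ("UCB1 a=25", ucb1(25)), ("random", random_policy)]:
    print(name, round(run(pol), 4))
```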
[00123] The following Table 1 lists the average execution time (which is the opposite of average reward) per iteration for the four models:
[Table 1: average execution time per iteration for the four models — table values not reproduced here.]
[00124] As illustrated in Table 1, in the simulation, the UCB1 model achieves the best average execution time (e.g., much better than the random algorithm, which did not make an intelligent decision).
[00125] The following Table 2 lists the total number of pulls for the arms in the simulation. The simulation includes 5000 iterations and a set of 23 arms. Table 2 illustrates the total number of pulls for the arms whose ki values are no larger than 100, and the number of pulls for each individual arm whose ki is larger than 100:
[Table 2: number of pulls per arm for each model — table values not reproduced here.]
[00126] The per-iteration reward evolution for the simulation is plotted in Figures 7A-7C versus random (Figure 7D), and the number of pulls for each arm in the simulation is illustrated in the plots of Figures 8-11. Figure 7A is a plot of the simulation results for the e- greedy (e = 0.05) model; Figure 7B is a plot of the simulation results for the e-greedy (e = 0.01) model; Figure 7C is a plot of the simulation results for the UCB1 model; and Figure 7D is a plot of results from a random selection.
[00127] Figure 8 is a plot for the simulation of the number of pulls for each arm for the e-greedy (e = 0.05) model; Figure 9 is a plot for the simulation of the number of pulls for each arm for the e-greedy (e = 0.01) model; Figure 10 is a plot for the simulation of the number of pulls for each arm for the UCB1 model; and Figure 11 is a plot for the random selection of each arm.
[00128] As illustrated in Figures 7A-7D, the UCB1 model of Figure 7C performed the best in the simulation and converged within tens of iterations.
[00129] The ε-greedy algorithms of Figures 7A and 7B also converge to a relatively stable arm selection within a similar number of iterations. However, due to the nature of ε-greedy, as shown in Equation 8, with a small probability ε the algorithm randomly explores arms other than the current best one, which is why the fluctuations contributed by these random explorations are observed in Figures 7A and 7B.
[00130] In contrast, in the simulation, the UCB1 model of Figure 7C better balanced the trade-off between exploration and exploitation, as a result of the second term in Equation 9. As illustrated in Figure 7C, the UCB1 model stopped exploring the arms earlier and focused on the arms that can produce a high reward (that is, arms with low execution times).
[00131] The random selection illustrated in Figure 7D does not show convergence and is not suitable for an online decision.
[00132] Figure 12 is a block diagram illustrating elements of a computing device 1200 (also referred to as a central node, a central coordinating node, a server, a base station, gNodeB/gNB, etc.) according to embodiments of inventive concepts. As shown, the computing device may include transceiver circuitry 1201 (also referred to as a transceiver) including a transmitter and a receiver configured to provide uplink and downlink communications with worker devices, other computing devices, etc. The computing device may include network interface circuitry 1207 (also referred to as a network interface) configured to provide communications with worker devices and other computing devices. The computing device may also include processing circuitry 1203 (also referred to as a processor) coupled to the transceiver circuitry, and memory circuitry 1205 (also referred to as memory) coupled to the processing circuitry. The memory circuitry 1205 may include computer readable program code that when executed by the processing circuitry 1203 causes the processing circuitry to perform operations according to embodiments disclosed herein. According to other embodiments, processing circuitry 1203 may be defined to include memory so that a separate memory circuitry is not required. The computing device may include an ML model 1209 (e.g., an MAB model).
[00133] As discussed herein, operations of the computing device may be performed by processing circuitry 1203, ML model 1209, network interface 1207, and/or transceiver 1201. For example, processing circuitry 1203 may control transceiver 1201 to transmit downlink communications through transceiver 1201 to one or more worker devices and/or master node and/or to receive uplink communications through transceiver 1201 from one or more worker devices and/or master node. Moreover, modules may be stored in memory 1205 and/or ML model 1209, and these modules may provide instructions so that when instructions of a module are executed by processing circuitry 1203, processing circuitry 1203 performs respective operations (e.g., operations discussed herein with respect to example embodiments relating to computing devices). According to some embodiments, computing device 1200 and/or an element(s)/function(s) thereof may be embodied as a virtual node/nodes and/or a virtual machine/machines.
[00134] Figure 13 is a block diagram illustrating elements of a master node 1300 (also referred to as a server, a gNodeB/gNB, base station, etc.) according to embodiments of inventive concepts. As shown, the master node may include transceiver circuitry 1301 (also referred to as a transceiver) including a transmitter and a receiver configured to provide uplink and downlink communications with other computing devices, worker devices, etc. The master node includes network interface circuitry 1307 (also referred to as a network interface) configured to provide communications with other computing devices and worker devices (e.g., with computing devices, etc.). The master node may also include processing circuitry 1303 (also referred to as a processor) coupled to the transceiver circuitry, and memory circuitry 1305 (also referred to as memory) coupled to the processing circuitry. The memory circuitry 1305 may include computer readable program code that when executed by the processing circuitry 1303 causes the processing circuitry to perform operations according to embodiments disclosed herein. According to other embodiments, processing circuitry 1303 may be defined to include memory so that a separate memory circuitry is not required. The master node may include an Al model 1309.
[00135] As discussed herein, operations of the master node may be performed by processing circuitry 1303, AI model 1309, network interface 1307, and/or transceiver 1301. For example, processing circuitry 1303 may control transceiver 1301 to transmit downlink communications through transceiver 1301 to one or more computing devices and/or worker devices and/or to receive uplink communications through transceiver 1301 from one or more computing devices and/or worker devices. Moreover, modules may be stored in memory 1305 and/or AI model 1309, and these modules may provide instructions so that when instructions of a module are executed by processing circuitry 1303, processing circuitry 1303 performs respective operations (e.g., operations discussed herein with respect to example embodiments relating to master nodes). According to some embodiments, master node 1300 and/or an element(s)/function(s) thereof may be embodied as a virtual node/nodes and/or a virtual machine/machines.
[00136] Figure 14 is a block diagram illustrating elements of a worker device 1400 according to embodiments of inventive concepts. As shown, the worker device may include transceiver circuitry 1401 (also referred to as a transceiver) including a transmitter and a receiver configured to provide uplink and downlink communications with computing devices, master nodes, other worker devices, etc. The worker device may include network interface circuitry 1407 (also referred to as a network interface) configured to provide communications with computing devices, master nodes and/or worker devices. The worker device may also include processing circuitry 1403 (also referred to as a processor) coupled to the transceiver circuitry, and memory circuitry 1405 (also referred to as memory) coupled to the processing circuitry. The memory circuitry 1405 may include computer readable program code that when executed by the processing circuitry 1403 causes the processing circuitry to perform operations according to embodiments disclosed herein. According to other embodiments, processing circuitry 1403 may be defined to include memory so that a separate memory circuitry is not required.
[00137] As discussed herein, operations of the worker device may be performed by processing circuitry 1403, network interface 1407, and/or transceiver 1401. For example, processing circuitry 1403 may control transceiver 1401 to transmit downlink communications through transceiver 1401 to one or more computing devices, master nodes, worker devices and/or to receive uplink communications through transceiver 1401 from one or more computing devices, master nodes, and/or worker devices. Moreover, modules may be stored in memory 1405, and these modules may provide instructions so that when instructions of a module are executed by processing circuitry 1403, processing circuitry 1403 performs respective operations (e.g., operations discussed herein with respect to example embodiments relating to worker devices). According to some embodiments, worker device 1400 and/or an element(s)/function(s) thereof may be embodied as a virtual node/nodes and/or a virtual machine/machines.
[00138] The worker devices may be any of a wide variety of communication devices, including wireless devices arranged, configured, and/or operable to communicate wirelessly with the computing device 1200, the master node 1300, and other communication devices. Similarly, the master node 1300 and/or the computing device are arranged, capable, configured, and/or operable to communicate directly or indirectly with the worker devices and/or with other computing devices, master nodes, network nodes or equipment in a network to enable and/or provide communications and operations of example embodiments discussed herein with respect to worker devices.
[00139] As used herein, a worker device refers to a device capable, configured, arranged and/or operable to communicate wirelessly with computing devices, master nodes, and/or other worker devices. Examples of a worker device include, but are not limited to, a smart phone, mobile phone, cell phone, voice over IP (VoIP) phone, wireless local loop phone, desktop computer, personal digital assistant (PDA), wireless camera, gaming console or device, music storage device, playback appliance, wearable terminal device, wireless endpoint, mobile station, tablet, laptop, laptop-embedded equipment (LEE), laptop-mounted equipment (LME), smart device, wireless customer-premise equipment (CPE), vehicle-mounted or vehicle embedded/integrated wireless device, etc. Other examples include any UE identified by the 3rd Generation Partnership Project (3GPP), including a narrow band internet of things (NB-IoT) user equipment (UE), a machine type communication (MTC) UE, and/or an enhanced MTC (eMTC) UE.
[00140] A worker device may support device-to-device (D2D) communication, for example by implementing a 3GPP standard for sidelink communication, Dedicated Short-Range Communication (DSRC), vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), or vehicle-to-everything (V2X). In other examples, a worker device may not necessarily have a user in the sense of a human user who owns and/or operates the relevant device. Instead, a worker device may represent a device that is intended for sale to, or operation by, a human user but which may not, or which may not initially, be associated with a specific human user (e.g., a smart sprinkler controller). Alternatively, a worker device may represent a device that is not intended for sale to, or operation by, an end user but which may be associated with or operated for the benefit of a user (e.g., a smart power meter).
[00141] Although the computing devices, nodes, and worker devices described herein may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. It is to be understood that these computing devices, nodes, and worker devices may comprise any suitable combination of hardware and/or software needed to perform the tasks, features, functions, and methods disclosed herein. Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the computing device, and/or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination. Moreover, while components are depicted as single boxes located within a larger box, or nested within multiple boxes, in practice, computing devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components. For example, a communication interface may be configured to include any of the components described herein, and/or the functionality of the components may be partitioned between the processing circuitry and the communication interface. In another example, non-computationally intensive functions of any of such components may be implemented in software or firmware and computationally intensive functions may be implemented in hardware.
[00142] In certain embodiments, some or all of the functionality described herein may be provided by processing circuitry executing instructions stored in memory, which in certain embodiments may be a computer program product in the form of a non-transitory computer-readable storage medium. In alternative embodiments, some or all of the functionality may be provided by the processing circuitry without executing instructions stored on a separate or discrete device-readable storage medium, such as in a hard-wired manner. In any of those particular embodiments, whether executing instructions stored on a non-transitory computer-readable storage medium or not, the processing circuitry can be configured to perform the described functionality. The benefits provided by such functionality are not limited to the processing circuitry alone or to other components of the computing device, but are enjoyed by the computing device as a whole, and/or by end users and a wireless network generally.
[00143] Further definitions and embodiments are discussed below.
[00144] In the above-description of various embodiments of the present disclosure, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of present inventive concepts. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which present inventive concepts belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[00145] When an element is referred to as being "connected", "coupled", "responsive", or variants thereof to another element, it can be directly connected, coupled, or responsive to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected", "directly coupled", "directly responsive", or variants thereof to another element, there are no intervening elements present. Like numbers refer to like elements throughout. Furthermore, "coupled", "connected", "responsive", or variants thereof as used herein may include wirelessly coupled, connected, or responsive. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term "and/or" (abbreviated "/") includes any and all combinations of one or more of the associated listed items.
[00146] It will be understood that although the terms first, second, third, etc. may be used herein to describe various elements/operations, these elements/operations should not be limited by these terms. These terms are only used to distinguish one element/operation from another element/operation. Thus a first element/operation in some embodiments could be termed a second element/operation in other embodiments without departing from the teachings of present inventive concepts. The same reference numerals or the same reference designators denote the same or similar elements throughout the specification.
[00147] As used herein, the terms "comprise", "comprising", "comprises", "include", "including", "includes", "have", "has", "having", or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof. Furthermore, as used herein, the common abbreviation "e.g.", which derives from the Latin phrase "exempli gratia," may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item. The common abbreviation "i.e.", which derives from the Latin phrase "id est," may be used to specify a particular item from a more general recitation.
[00148] Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
[00149] These computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of present inventive concepts may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.
[00150] It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated, and/or blocks/operations may be omitted without departing from the scope of inventive concepts. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
[00151] Many variations and modifications can be made to the embodiments without substantially departing from the principles of the present inventive concepts. All such variations and modifications are intended to be included herein within the scope of present inventive concepts. Accordingly, the above disclosed subject matter is to be considered illustrative, and not restrictive, and the examples of embodiments are intended to cover all such modifications, enhancements, and other embodiments, which fall within the spirit and scope of present inventive concepts. Thus, to the maximum extent allowed by law, the scope of present inventive concepts is to be determined by the broadest permissible interpretation of the present disclosure including the examples of embodiments and their equivalents, and shall not be restricted or limited by the foregoing detailed description.

CLAIMS:
1. A computer-implemented method performed by a computing device (101, 1200) for iterative training of a collaborative distributed coded artificial intelligence, Al, model, the method comprising: receiving (409) a request from the Al model for a redundancy factor for an iteration of training of the Al model, the redundancy factor comprising an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration; selecting (411) the redundancy factor in the iteration based on use of a machine learning, ML, model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors; and sending (413) the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
2. The method of Claim 1, further comprising: receiving (415), from the master node, an overall execution time for the distributed coded execution of the multiplication of the data matrix and the vector in the iteration by a subset of a plurality of worker devices in the distributed computing cluster.
3. The method of any of Claims 1 to 2, wherein the use of the ML model comprises (i) per iteration in a set of iterations, choosing a redundancy factor from the set of redundancy factors, (ii) per iteration in the set of iterations, receiving a reward value for the chosen redundancy factor, and (iii) in the iteration, selecting the redundancy factor from the set of redundancy factors that has a highest reward value, where the reward value has an inverse relationship to an overall execution time for the distributed coded execution of the multiplication of the data matrix and the vector in the iteration.
4. The method of any of Claims 1 to 3, further comprising: receiving (401), from the master node, a first parameter defining a size of the distributed computing cluster; receiving (403) from the Al model a second parameter defining a number of rows in the data matrix; and identifying (405) the set of redundancy factors based on the number of rows in the data matrix.
5. The method of any of Claims 1 to 4, further comprising: initializing (407) values of a plurality of parameters in the ML model to zero, the plurality of parameters comprising (i) a number of times that redundancy factors are selected from the set of redundancy factors, and (ii) an average reward value of the selected redundancy factors.
6. The method of Claim 5, further comprising: updating (417) the ML model with (i) the number of times that redundancy factors are selected, and (ii) the average reward value of the selected redundancy factors where the average reward value has an inverse relationship with the received overall execution time.
7. The method of any of Claims 1 to 6, wherein the selecting (411) comprises an online decision that selects the redundancy factor.
8. The method of Claim 7, wherein the selected redundancy factor is suitable for a mission critical operation.
9. The method of any of Claims 1 to 8, wherein the ML model comprises a multi-armed bandit model.
10. The method of any of Claims 1 to 9, wherein the plurality of worker devices comprises a plurality of Internet of Things, loT, edge computing devices.
11. A computing device (101, 1200), the computing device comprising: processing circuitry (1203); memory (1205) coupled with the processing circuitry, wherein the memory includes instructions that when executed by the processing circuitry causes the computing device to perform operations comprising: receive a request from the Al model for a redundancy factor for an iteration of training of the Al model, the redundancy factor comprising an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration; select the redundancy factor in the iteration based on use of a machine learning, ML, model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors; and send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
12. The computing device of Claim 11, the operations further comprising any of the operations of Claims 2-10.
13. A computing device (101, 1200) adapted to perform operations comprising: receive a request from the Al model for a redundancy factor for an iteration of training of the Al model, the redundancy factor comprising an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration; select the redundancy factor in the iteration based on use of a machine learning, ML, model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors; and send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
14. The computing device of Claim 13 adapted to perform operations further comprising any of the operations of Claims 2-10.
15. A computer program product comprising a non-transitory storage medium (1205) including program code to be executed by processing circuitry (1203) of a computing device (101, 1200), whereby execution of the program code causes the computing device to perform operations comprising: receive a request from the Al model for a redundancy factor for an iteration of training of the Al model, the redundancy factor comprising an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration; select the redundancy factor in the iteration based on use of a machine learning, ML, model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors; and send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
16. The computer program product of Claim 15, the operations further comprising any of the operations of Claims 2-10.
17. A computer program comprising program code to be executed by processing circuitry (1203) of a computing device (101, 1200), whereby execution of the program code causes the computing device to perform operations comprising: receive a request from the Al model for a redundancy factor for an iteration of training of the Al model, the redundancy factor comprising an amount of workload to be assigned per worker device of a distributed computing cluster in the iteration; select the redundancy factor in the iteration based on use of a machine learning, ML, model that selects the redundancy factor that has a lowest overall execution time from a set of redundancy factors; and send the selected redundancy factor to a master node for a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model.
18. The computer program of Claim 17, whereby execution of the program code causes the computing device to perform operations according to any of Claims 2-10.
19. A computer-implemented method performed by a master node (207, 1300) in a distributed computing cluster for iterative training of a collaborative distributed coded artificial intelligence, Al, model, the method comprising: receiving (501), from a computing device, a redundancy factor, the redundancy factor comprising an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration; receiving (503) a data matrix and a vector from the Al model; encoding (505) the data matrix into a plurality of submatrices; distributing (507) a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor; collecting (509) respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the Al model by respective worker devices in the subset of the worker devices in the distributed computing cluster; extracting (511) an overall result from the collected respective results; and sending (513) the overall result to the Al model to determine whether the training is completed.
20. The method of Claim 19, further comprising: identifying (515) an overall execution time for the distributed coded execution of the multiplication of the respective submatrix and the vector in the iteration of the training of the Al model by the respective worker devices in the subset of worker devices in the distributed computing cluster.
21. The method of Claim 20, further comprising: sending (517) the overall execution time to the computing device.
22. A master node (207, 1300), the master node comprising: processing circuitry (1303); memory (1305) coupled with the processing circuitry, wherein the memory includes instructions that when executed by the processing circuitry causes the master node to perform operations comprising: receive, from a computing device, a redundancy factor, the redundancy factor comprising an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration; receive a data matrix and a vector from the Al model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor; collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the Al model by respective worker devices in the subset of the worker devices in the distributed computing cluster; extract an overall result from the collected respective results; and send the overall result to the Al model to determine whether the training is completed.
23. The master node of Claim 22, the operations further comprising any of the operations of Claims 20-21.
24. A master node (207, 1300) adapted to perform operations comprising: receive a data matrix and a vector from the Al model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor; collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the Al model by respective worker devices in the subset of the worker devices in the distributed computing cluster; extract an overall result from the collected respective results; and send the overall result to the Al model to determine whether the training is completed.
25. The master node of Claim 24 adapted to perform operations further comprising any of the operations of Claims 20-21.
26. A computer program product comprising a non-transitory storage medium (1305) including program code to be executed by processing circuitry (1303) of a master node (207, 1300), whereby execution of the program code causes the master node to perform operations comprising: receive a data matrix and a vector from the Al model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor; collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the Al model by respective worker devices in the subset of the worker devices in the distributed computing cluster; extract an overall result from the collected respective results; and send the overall result to the Al model to determine whether the training is completed.
27. The computer program product of Claim 26, the operations further comprising any of the operations of Claims 20-21.
28. A computer program comprising program code to be executed by processing circuitry (1303) of a master node (207, 1300), whereby execution of the program code causes the master node to perform operations comprising: receive a data matrix and a vector from the Al model; encode the data matrix into a plurality of submatrices; distribute a respective submatrix and the vector to a respective worker device in the distributed computing cluster according to the redundancy factor; collect respective results of a distributed coded execution of a multiplication of a respective submatrix and the vector in an iteration of the training of the Al model by respective worker devices in the subset of the worker devices in the distributed computing cluster; extract an overall result from the collected respective results; and send the overall result to the Al model to determine whether the training is completed.
29. The computer program of Claim 28, whereby execution of the program code causes the master node to perform operations according to any of Claims 20-21.
30. A system for a computer-implemented method for iterative training of a collaborative distributed coded artificial intelligence, Al, model, the system comprising: a computing device (101, 1200) comprising a machine learning, ML, model configured to (i) select a redundancy factor from a set of redundancy factors per iteration of the iterative training of the collaborative distributed Al model, the redundancy factor comprising an amount of workload to be assigned per worker device of a distributed computing cluster in an iteration, and (ii) send the selected redundancy factor to a master node; a distributed computing cluster (207, 205) comprising (i) the master node (207) and (ii) a plurality of worker devices (205) that perform a distributed coded execution of a multiplication of a data matrix and a vector in an iteration of the training of the Al model based on a distribution by the master node of the data matrix and the vector based on the received selected redundancy factor; and the collaborative distributed Al model (105, 1309) communicatively connected to the distributed computing cluster and the computing device that is trained based on the distributed coded execution of the multiplication of the data matrix and the vector.
Non-Patent Citations

A. Reisizadeh et al., "Coded computation over heterogeneous clusters," IEEE Transactions on Information Theory, vol. 65, no. 7, 2019, pp. 4227-4242.
D. Kim et al., "Optimal load allocation for coded distributed computation in heterogeneous clusters," IEEE Transactions on Communications, vol. 69, no. 1, 2021, pp. 44-58.
H. Karl and A. Willig, "Protocols and Architectures for Wireless Sensor Networks," John Wiley & Sons, 2007.
K. Lee et al., "Speeding up distributed machine learning using codes," IEEE Transactions on Information Theory, vol. 64, no. 3, 2018, pp. 1514-1529.
N. Ding et al., "Optimal incentive and load design for distributed coded machine learning," IEEE Journal on Selected Areas in Communications, vol. 39, no. 7, 2021, pp. 2090-2104.
J. S. Ng et al., "A comprehensive survey on coded distributed computing: fundamentals, challenges, and networking applications," IEEE Communications Surveys & Tutorials, vol. 23, no. 3, 2021, pp. 1800-1837.
R. Singleton, "Maximum distance q-nary codes," IEEE Transactions on Information Theory, vol. 10, no. 2, 1964, pp. 116-118.
S. Dhakal et al., Proceedings of the IEEE 90th Vehicular Technology Conference (VTC2019-Fall), 2019, pp. 1-6.
W. Shi et al., "Joint device scheduling and resource allocation for latency constrained wireless federated learning," IEEE Transactions on Wireless Communications, vol. 20, no. 1, 2020, pp. 453-467.
W. Y. B. Lim et al., "Incentive mechanism design for resource sharing in collaborative learning," arXiv:2006.00511, 2020.
W. Xia et al., "Multi-armed bandit-based client scheduling for federated learning," IEEE Transactions on Wireless Communications, vol. 19, no. 11, 2020, pp. 7108-7123.

Kind code of ref document: A1