GB2610238A - Method and apparatus for data processing

Method and apparatus for data processing

Info

Publication number
GB2610238A
GB2610238A
Authority
GB
United Kingdom
Prior art keywords
neural network
computing environment
execution
training
computing
Prior art date
Legal status
Pending
Application number
GB2113043.0A
Other versions
GB202113043D0 (en)
Inventor
Laganakos Vasileios
Richard Nutter Mark
Current Assignee
ARM Ltd
Original Assignee
ARM Ltd
Advanced Risc Machines Ltd
Priority date
Filing date
Publication date
Application filed by ARM Ltd and Advanced Risc Machines Ltd

Classifications

    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5044: Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering hardware capabilities
    • G06F 9/5094: Allocation of resources where the allocation takes into account power or heat criteria
    • G06F 2209/5019: Workload prediction (indexing scheme relating to G06F 9/50)
    • G06F 2209/509: Offload (indexing scheme relating to G06F 9/50)
    • G06N 3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/0475: Generative networks
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, using electronic means
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/09: Supervised learning
    • G06N 3/098: Distributed learning, e.g. federated learning
    • G06N 3/10: Interfaces, programming languages or software development kits, e.g. for simulating neural networks


Abstract

Scheduling the execution of neural networks in a computing environment. A model for predicting the execution characteristics of neural networks is provided. Input data representing one or more traits of a neural network is obtained and processed using the model to generate prediction data including predictions of one or more execution characteristics of the neural network. The neural network is then scheduled for execution in a computing environment based on the prediction data. A computer-implemented method and system for training the model to predict the one or more execution characteristics of neural networks is also provided.

Description

METHOD AND APPARATUS FOR DATA PROCESSING
Technical Field
The present disclosure relates to neural networks. In particular, but not exclusively, the present disclosure relates to predicting execution characteristics of neural networks and scheduling the execution of the neural networks in a computing environment.
Background
Neural networks are machine learning computer models which can be configured and trained to perform specific computing tasks. Neural networks are increasingly being used for a range of machine learning tasks including image processing, natural language processing, and so forth. Neural networks generally comprise an input layer, one or more hidden layers, and an output layer. Each layer comprises a set of nodes modelled by weights, a bias function, and an activation function. The weights can be updated during training and are applied to data when processing the data using the neural network. The architecture of a neural network, including the size and number of layers, and the manner in which the weights are applied to data can vary between different neural networks configured to perform different computing tasks.
Distributed computing environments, such as a cloud-based computing environment, may include many individual computing devices connected over a network. Using distributed computing environments allows resources to be shared between the individual computing devices when executing workloads.
It is desirable to manage distributed computing environments more efficiently when executing workloads.
Summary
According to a first aspect of the present disclosure, there is provided a computer-implemented method for scheduling execution of neural networks in a computing environment, the method comprising: providing a model for predicting one or more execution characteristics of neural networks; obtaining input data representing traits associated with a said neural network, wherein the said neural network is to be executed in a computing environment; processing the input data using the model to determine prediction data including predictions of one or more execution characteristics of the said neural network; and scheduling execution of the said neural network in the computing environment based on the prediction data.
According to a second aspect of the present disclosure, there is provided a computer-implemented method for training a model to predict one or more execution characteristics of neural networks, the computer-implemented method comprising, for a plurality of training neural networks: executing a said training neural network in a computing environment; monitoring the execution of the said training neural network in the computing environment to determine one or more execution characteristics associated with the said training neural network; generating training data including an association between the said training neural network and the one or more execution characteristics; and training the model to predict one or more execution characteristics of neural networks based on the training data.
According to a third aspect of the present disclosure, there is provided a computer system comprising at least one processor and at least one storage comprising computer-executable instructions which when executed by the at least one processor cause the at least one processor to perform a method according to any one of the first or second aspects.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium including computer-executable instructions which, when executed by one or more processors, cause the processors to perform a method according to any one of the first or second aspects.
Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings.
Brief Description of the Drawings
Figure 1 is a schematic diagram showing a neural network according to examples; Figure 2 is a schematic diagram illustrating a computer-implemented method for scheduling the execution of neural networks in a computing environment according to examples; Figure 3 is a flow diagram showing the computer-implemented method for scheduling execution of neural networks shown in Figure 2; Figure 4A is a schematic diagram showing prediction data according to an example; Figure 4B is a schematic diagram showing prediction data according to an example which is different to the example shown in Figure 4A; Figure 5 is a schematic diagram illustrating a computer-implemented method for scheduling the execution of neural networks according to examples which include obtaining data representing constraints; Figure 6 is a flow diagram showing a computer-implemented method according to the examples of Figure 5; Figure 7 is a schematic diagram illustrating a computer-implemented method for training a model to predict execution characteristics of neural networks according to examples; Figure 8 is a flow diagram showing the computer-implemented method illustrated in Figure 7; Figure 9 is a schematic diagram showing a computer system for implementing methods according to the examples shown in the previous figures; and Figure 10 is a schematic diagram showing a non-transitory computer-readable storage medium comprising instructions for executing methods according to the examples shown in Figures 1 to 9.
Detailed Description
Certain examples described herein relate to efficiently scheduling the execution of neural networks in a computing environment. Neural networks can be configured to perform specific computer-implemented tasks. Applications of neural networks include image processing, such as computer vision techniques, audio processing, natural language processing, control systems, and so forth. The rapid development of new neural networks has led to neural networks being deployed on computing devices having a range of specifications and capabilities, and being used to process datasets comprising a variety of data types and file sizes.
Mobile devices comprising low-powered processors and limited memory facilities can be configured to execute neural networks for computer vision related tasks in order to provide augmented reality experiences to users. Powerful servers and supercomputers can be deployed to process large volumes of data using machine learning techniques, such as those including neural networks, and can thereby process said data efficiently to provide insights.
Computing environments, such as cloud computing environments, refer to the on-demand availability of computer system resources, in particular data storage and computing power, without direct active management by users. Cloud computing environments include geographically distributed or remote servers which are connected over networks, such as the internet. Other computing environments may include centrally located computing resources which are configurable on demand.
Computing environments may comprise a plurality of communicatively coupled computing devices connected over a network, such as a wide area network, WAN, or a local area network. In other examples, a computing environment may include one or more computing devices wherein the computing resources in each computing device, such as storage and/or processing resources, can be configured, reserved, deployed, and redeployed when scheduling specific workloads. In this way, a particular amount of storage and/or processing resources can be selected for executing certain workloads in the computing environment. A workload may generally relate to any program or application running or to be run on a computer or in a computing environment.
Using computing environments such as those described above, including cloud computing environments, allows the resources of multiple physical computing devices to be shared. This can increase processing speeds, allow computing resources to be used more efficiently, and allow the scale-up or scale-down of computing resources in dependence on demand. Demand for these computing resources can fluctuate based on underlying periodic trends in demand and/or based on on-demand requests for use of certain computing resources.
Systems and methods described herein determine resource requirements suitable for executing a neural network and then schedule the execution of the neural network using appropriate resources in a computing environment. Using appropriate resources in a computing environment may include executing the neural network on a computing device in the computing environment which satisfies the minimum resource requirements of the neural network, or provides a target performance, for example relating to speed, when executing the neural network. In other cases, scheduling the execution of the neural network on appropriate resources may include scheduling the neural network to execute on a computing device, or using a determined set of computing resources, in the computing environment so as to minimise the resources used over the entire computing environment including where the computing environment executes one or more further workloads. This is done by scheduling a plurality of workloads across the computing environment, and where possible scheduling these workloads end to end using appropriate resources in the computing environment, in an attempt to minimise the down time of resources in the computing environment. A predictor model may be used to generate predictions of the execution characteristics, such as the computing resource requirements, associated with a neural network. A scheduler may schedule the execution of the neural network in a computing environment using the predictions generated by the predictor model.
Generally, scheduling the execution of neural networks efficiently in a computing environment presents difficulties. Due to the large variety, type, and size of neural networks, and the increasing speed at which they are being developed, the execution of neural networks can be unpredictable. Two neural networks which may initially look similar to a user may execute very differently when run on the same hardware. One neural network may require greater amounts of storage, processing power, and energy to be executed than another neural network which generally looks similar. Being able to generate estimations of how a given neural network will execute in a computing environment would allow more efficient scheduling of these neural networks. In this way, the latency, energy consumption, and speed at which neural networks are processed can be optimised.
Figure 1 shows an example of a neural network 100. The neural network 100 comprises an input layer 102, a plurality of hidden layers 104, and an output layer 106.
Each layer comprises a plurality of nodes 112A to 112C which are each connected to nodes in a subsequent layer by a plurality of edges. When processing data using a neural network 100, for example when performing inference on a dataset or processing test data, input data values received at the input layer 102 are multiplied by weights associated with the edges connected to the nodes 112A, 112B, 112C of the input layer 102. Arithmetic operations, such as addition and/or subtraction, are applied to the results of the multiplications, and bias and activation functions are applied to the results of these arithmetic operations to obtain values for the nodes in the first hidden layer. The weight values associated with the edges may be stored in association with the respective nodes to which they connect, along with a respective bias and activation function. For example, the weight values may be stored in the form of an array or matrix for each node. The data values determined at the first hidden layer are then similarly processed and propagated to the second hidden layer where new values are determined. This process continues through the neural network until the final, fully connected, layer 106 in which the output values 110a and 110b are determined.
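By way of illustration only (this sketch does not form part of the disclosure), the forward propagation described above can be expressed in a few lines of Python. The layer sizes, random weights, and ReLU activation are arbitrary placeholder choices.

```python
import numpy as np

def forward(x, layers):
    # Propagate an input vector through (weights, bias, activation) triples,
    # mirroring the weighted sum, bias and activation applied at each layer.
    for weights, bias, activation in layers:
        x = activation(np.dot(weights, x) + bias)
    return x

relu = lambda v: np.maximum(v, 0.0)
identity = lambda v: v

layers = [
    (np.random.randn(4, 3), np.zeros(4), relu),      # input layer 102 to first hidden layer
    (np.random.randn(4, 4), np.zeros(4), relu),      # hidden layers 104
    (np.random.randn(2, 4), np.zeros(2), identity),  # final layer 106 producing outputs 110a, 110b
]

print(forward(np.array([0.2, -1.0, 0.5]), layers))
```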
Neural networks 100 may be defined by a number of traits which govern how the neural network 100 is implemented. These traits include, for example, the number of layers in the neural network, the number of nodes in each layer of the neural network, the type of data which the neural network is configured to process, the weight values associated with nodes in the neural network, the task for which the neural network has been trained, and the architecture of the neural network, which defines the configuration of the neural network including the size, number, and arrangement of connections between layers of the neural network, as well as the bias and/or activation functions used in the neural network. Examples of specific architectures of neural networks include Feed Forward (FF), Deep Feed Forward (DFF), Recurrent Neural Networks (RNN), Long/Short Term Memory (LSTM), Gated Recurrent Units (GRUs), Deep Convolutional Networks (DCN), Deep Convolutional Inverse Graphics Networks (DCIGN), Generative Adversarial Networks (GAN), Echo State Networks (ESN), and Support Vector Machines (SVM). It is to be appreciated that these examples of neural network architectures are provided for illustrative purposes and that this list is not exhaustive. Certain neural network architectures may be better suited to particular machine learning tasks than others, and hence there may be an association between the task for which the neural network is trained and its architecture. The traits of a neural network 100 may also include a software package or library on which the neural network 100 is based or with which the neural network is designed. For example, neural networks 100 may be designed using PyTorch™, a machine learning library based on the Torch™ library, used for applications such as computer vision and natural language processing. Other software libraries may also be used, such as Pandas™, a library providing data structures and data analysis tools for the Python programming language, and TensorFlow™, a free and open-source software library for machine learning. Software libraries such as PyTorch™, Pandas™, and TensorFlow™ provide a framework on which neural networks 100 can be designed.
The underlying framework on which the neural network is built will have an effect on the way in which the neural network 100 is executed on a computer. For example, some frameworks, such as PyTorch™, allow computation on tensors with GPU acceleration and hence include processes which are designed for increased parallelizability and multi-threading.
Executing a neural network 100 generally refers to processing data using the neural network 100. This includes providing data to an input layer 102 of the neural network 100, and processing the data according to the weight values and nodes in the neural network 100 to obtain output data from the output layer 106. Different neural networks generally have different execution characteristics, where execution characteristics refer to the amount of computing resources used to execute the neural network, a power consumption used when executing the neural network, a total energy expenditure used to execute the neural network, and a duration of the execution of the neural network. The duration of the execution of the neural network may also be referred to as latency or execution time. The duration may include the total duration, including loading and/or buffering data related to the execution of the neural network and the execution of processes involved when performing inference using the neural network. Alternatively, the duration of executing the neural network may relate only to the time spent performing the operations which execute the neural network itself. The computing resources include an amount of storage to be used to store and execute the neural network 100 and/or an amount of processing resources to be used to execute the neural network. The amount of storage used to execute the neural network may include any suitable combination of volatile and non-volatile memory and may be measured in bytes. The amount of storage used to execute a neural network 100 may include non-volatile storage used for long-term storage of data representing the neural network 100 and data to be processed by the neural network 100.
The amount of storage used to execute the neural network 100 may also include volatile memory used to execute the neural network 100, including memory from which input data is read and onto which output data is written by the processor when executing the neural network 100. For example, volatile memory may be used to store data representing inputs, such as input feature maps (IFMs), and outputs, such as output feature maps (OFMs), for each layer of the neural network 100. When executing the neural network 100, for example to perform inference, OFMs generated as outputs from hidden layers 104 are determined and stored in memory and may be used as an input for a subsequent layer of the neural network 100.
In some examples, when executing a neural network 100, data representing the neural network 100, such as data specifying the architecture of the neural network 100, the weight values of the neural network 100, and/or the bias and activation functions, may be loaded onto Random Access Memory, RAM, from longer term non-volatile storage such as Read-Only Memory (ROM), a Hard Disk Drive (HDD), magnetic tape storage, or optical discs. An amount of storage used to execute the neural network 100 may include the non-volatile storage used for longer term storage of the relevant data and volatile storage to be accessed by a processor when executing the neural network 100.
Processing resources refer generally to a measure of processing power used to execute the neural network 100. In practice, the processing resources may be defined in terms of the number and/or type of processors to be used to execute the neural network 100. Processors in this context include general purpose processing units such as Central Processing Units, CPUs, Graphics Processing Units, GPUs, Neural Processing Units, NPUs, Image Signal Processors, ISPs, Image Processing Units, IPUs, or any other processing units for implementing processes based on computer-executable instructions. In some cases, the term processor may refer to a plurality of processing units which are communicatively coupled. For example, a computing device may house a processor, or processing circuitry, which comprises a collection of one or more CPUs, GPUs, and/or other processing units such as NPUs.
Generally, the performance of processors such as central processing units, CPUs, is measured in terms of a speed, measured in gigahertz, which refers to the number of cycles which the processor is capable of performing per second. For example, a CPU rated at 3.6 GHz is capable of performing 3.6 billion cycles per second. However, the rated speed of a CPU is not the only metric by which the performance of a CPU is measured. Other variables which may affect the overall performance of a processor include the number of physical and/or virtual cores which the processor has, the number of threads which the processor is capable of running simultaneously, and so forth. The architecture of certain processors may also be more suited to executing certain processes than others. For example, NPUs are processors which are specifically designed to perform machine learning tasks, such as Multiply And Accumulate (MAC) operations used to execute convolutional neural networks. NPUs benefit from vastly simpler logic compared to CPUs, which are adept at processing highly serialized instruction streams. This is because the workloads of NPUs tend to exhibit the high regularity found in the computational patterns of deep neural networks.
Certain execution characteristics of neural networks 100 may be interdependent. For example, where a powerful processor is required in order to execute the neural network 100, the power consumption will be larger than for neural networks 100 which can be executed with less powerful processors.
Figures 2 and 3 illustrate an example of a computer-implemented method 300 for predicting execution characteristics of a neural network 208 and scheduling the execution of the neural network 208 in a computing environment 216. The computer-implemented method 300 includes providing 302 a model 206 for predicting one or more execution characteristics of neural networks. The model 206 is trained to predict, or generate an estimation of, what the execution characteristics of a neural network will be from one or more traits of that neural network.
The model 206 is trained on data which represents the execution of a variety of different neural networks in a computing environment which is the same as, or similar to, the computing environment 216. In this way, the model 206 is able to generate predictions of the execution characteristics of similar neural networks to those which are represented in the data used to train the model 206. The model 206 may also be able to identify underlying patterns or trends in the execution characteristics of neural networks based on their associated traits. The model 206 may be able to extrapolate and apply these learned patterns or trends in order to determine the execution characteristics of neural networks which have traits which are outside of the scope of the traits exhibited by the neural networks from which the data used to train the model 206 is derived. The training of the model 206 will be described in more detail below with respect to Figures 7 and 8.
The neural network 208 shown in Figure 2 is a simplified diagram of a neural network provided for illustrative purposes only. When implementing the method 300, the neural network 208 may comprise any suitable number of nodes and layers, arranged in any particular architecture suitable for the task for which the neural network 208 is trained. The neural network 208 may be designed using computer program code based on any of a plurality of suitable software packages, for example Pandas™, PyTorch™, TensorFlow™, and so forth. The output of the neural network 208 may be either a univariate or a multivariate output, comprising one or a plurality of data values respectively.
To predict the execution characteristics of the neural network 208, the computer-implemented method 300 includes obtaining 304 input data 202 representing one or more traits 204A to 204N associated with a neural network 208, wherein the neural network 208 is to be executed in the computing environment 216. For example, the input data 202 may include an indication of one or more traits 204A to 204N of the neural network 208 in the form of a list or an array. In other examples, the input data 202 may comprise a representation of the neural network 208 from which the traits 204A to 204N may be determined. As described above, the traits may relate to the number and/or size of layers of the neural network 208, the task for which the neural network 208 is trained, the architecture of the neural network 208, and any software package, or framework, based on which the neural network 208 is designed. In some examples, the input data 202 may comprise a first portion which includes metadata representing one or more traits 204A to 204N of the neural network 208, and a second portion comprising data representing the neural network 208 which is to be executed.
The second portion, or payload data, may include an indication of the architecture of the neural network 208 and weight values associated with the neural network 208.
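Purely as an illustrative sketch (the field names below are hypothetical and not taken from the disclosure), the input data 202 might be serialised as a metadata portion listing the traits 204A to 204N alongside a payload portion referencing the network to be executed:

```python
import json

# First portion: metadata describing traits 204A to 204N of the neural network 208.
traits = {
    "architecture": "DCN",                 # e.g. a deep convolutional network
    "framework": "PyTorch",                # software library the network was designed with
    "task": "image_classification",
    "num_layers": 18,
    "total_parameters": 11_000_000,
    "input_dtype": "float32",
}

# Second portion (payload): reference to the serialised architecture and weight values.
payload = {"model_file": "network_weights.bin"}    # hypothetical file name

input_data = {"traits": traits, "payload": payload}
print(json.dumps(input_data, indent=2))
```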
The input data 202 may be obtained on the instruction of a user. For example, a user may provide an instruction indicating that the neural network 208 is to be executed in the computing environment 216, including providing data representing the neural network 208 to be executed. The input data 202 may be obtained, or derived from, the instruction from the user. Where the input data 202 represents a list or an array of the one or more traits 204A to 204N of the neural network 208, the input data 202 may be obtained by processing data representing the neural network to determine the one or more traits 204A to 204N associated with the neural network 208. This may include identifying the one or more traits from metadata and/or processing computer program code representing the neural network 208 to identify traits associated therewith.
The input data 202 is then processed 306 using the model 206 to determine prediction data 210 including predictions of one or more execution characteristics 212A to 212N of the neural network 208. As described above, the execution characteristics 212A to 212N may include any one or more of the computing resources to be used when executing the neural network 208, such as an amount of storage and/or an amount of processing resources, a power consumption to be used when executing the neural network 208, an energy consumption to be used when executing the neural network 208, and/or a duration of the execution of the neural network 208. In some examples, the model 206 may comprise a neural network which is trained to predict the one or more execution characteristics of the neural network 208 based on the one or more traits 204A to 204N. Where the model 206 comprises a neural network, the one or more traits 204A to 204N may be provided to the model 206 at an input layer of the neural network of the model 206. The neural network included in the model 206 may have a multivariate output and as such may utilize multi-output regression.
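The disclosure leaves the form of the model 206 open; as one hedged illustration, a multi-output regressor such as scikit-learn's RandomForestRegressor could map a numeric encoding of traits to several execution characteristics at once. The trait encoding and the numbers below are invented for the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Rows: numeric encodings of traits 204A to 204N (layers, parameter count, framework id).
X_train = np.array([
    [18, 11e6, 0],
    [50, 25e6, 0],
    [12,  5e6, 1],
])
# Rows: measured execution characteristics 212A to 212N (storage MB, energy mAh, duration s).
y_train = np.array([
    [ 500, 1000, 1200],
    [1800, 3200, 5400],
    [ 300,  700,  900],
])

# RandomForestRegressor natively supports multi-output regression.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print(model.predict([[24, 14e6, 0]]))  # predicted storage, energy and duration for a new network
```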
In some examples, the computing resources in the computing environment 216 are arranged in a plurality of computing devices 218A, 218B, 218C. Each computing device 218A, 218B, 218C comprises a predetermined amount of storage, such as a predetermined combination of non-volatile and volatile storage, and processing resources. The computing devices 218A, 218B, 218C may include a variety of types of computing devices. In the example shown in Figure 2, the computing environment 216 includes personal desktop or laptop computers 218A, mobile devices 218B, such as smart telephones, smart watches, and/or tablet computers, and servers 218C. It will be appreciated, however, that the computing environment 216 may include any suitable combination of computing devices 218A, 218B, 218C. Other examples of computing devices 218A, 218B, 218C beyond those shown in Figure 2 are also envisaged. Computing devices 218A, 218B, 218C may be added to and/or removed from the computing environment 216, either by an administrative computing device, or by the computing devices 218A, 218B, 218C themselves. For example, a user of a computing device 218B may instruct the device 218B, via a user interface, to join or leave the computing environment 216. In other examples, an administrative computing device, such as a management server (not shown), may requisition computing devices 218A, 218B, 218C for the computing environment 216 over a network.
Where the computing environment 216 includes a plurality of computing devices 218A to 218C of varying specification, the one or more execution characteristics 212A to 212N in the prediction data 210 may specify computing device criteria. The computing device criteria may directly relate to a type of, or specific, computing device 218A or may relate to computing resources which a potential computing device 218A should have to execute the neural network 208. In a first example, illustrated in Figure 4A, the predictions 212A to 212N of the prediction data 210 include an indication 402 of a first computing device 218A in the computing environment 216 on which the neural network 208 can be executed. The prediction data 210 also indicates predictions of one or more other execution characteristics of the neural network 208 if the neural network 208 were to be executed using the first computing device 218A. The prediction data 210 shown in Figure 4A also includes an indication 404 of a second computing device 218B on which the neural network 208 could be executed and the associated predicted execution characteristics for the second computing device 218B. In the example shown in Figure 4A, the prediction data 210 indicates 402 that the neural network 208 could be executed on the first computing device 218A and that doing so would require an energy consumption of E1 = 1000 mAh and would take a time L1 = 1000 seconds. The prediction data 210 also indicates 404 that the neural network 208 could be executed on the second computing device 218B and that doing so would require an energy consumption of E2 = 3000 mAh and would take a time L2 = 20000 seconds.
In a second example, illustrated in Figure 4B, the prediction data 210 includes predictions 406, 408, and 410 of the specific computing resources to be used to execute the neural network 208. In particular, the prediction data 210 indicates a processor type 406 and a minimum storage 410. The processor type 406 may be expressed in terms of a make and/or model of processor which can be used to execute the neural network 208 and/or may specify particular characteristics such as the number of cores, architecture, and clock speed for the processor.
Figure 4B shows an example where a minimum processor type P1 and a recommended processor type P2 are both specified in the prediction data 210. The minimum processor type P1 represents a minimum specification of processor which is required to execute the neural network 208, and the recommended processor type P2 represents a specification of processor which provides a performance which is aligned with one or more target execution characteristics for the neural network 208. For example, a minimum processor P1 specified for executing a convolutional neural network may be a low-powered, general-purpose processor found in a mobile device.
In this case, while the minimum processor type P1 may be capable of executing the neural network 208, there may be considerable penalties with respect to energy efficiency and the length of time, or duration, it takes to execute the neural network 208, whereas a recommended processor type P2 may be an NPU specifically designed to execute convolutional neural networks and as such may be capable of executing the neural network 208 more efficiently and quickly than the minimum processor type P1. In the example shown in Figure 4B, the execution characteristics 410, 412, and 414 relating to the storage, energy consumption, and duration for executing the neural network 208 represent an average across the minimum processor type 406 and the recommended processor type 408. However, in other examples, each processor type P1 and P2 may be associated with its own respective storage, energy, and duration characteristics 410, 412, and 414. While the example of Figure 4B shows two processor types in the predicted execution characteristics, it will be appreciated that only one processor specification may be provided in the predicted execution characteristics.
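As an illustrative sketch only, prediction data 210 of the two forms shown in Figures 4A and 4B might be represented as simple Python structures; the field names are hypothetical, and the values are taken from, or chosen in the spirit of, the figures:

```python
# Form shown in Figure 4A: predictions per named computing device.
prediction_per_device = [
    {"device": "218A", "energy_mAh": 1000, "duration_s": 1000},
    {"device": "218B", "energy_mAh": 3000, "duration_s": 20000},
]

# Form shown in Figure 4B: computing device criteria rather than named devices.
prediction_criteria = {
    "processor_min": "P1",            # minimum processor specification
    "processor_recommended": "P2",    # e.g. an NPU suited to convolutional workloads
    "min_storage_bytes": 2 * 1024**3,
    "energy_mAh": 1500,               # illustrative averages across P1 and P2
    "duration_s": 3500,
}
```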
Returning again to Figures 2 and 3, once the prediction data 210 is generated, the method 300 includes scheduling 308 the execution of the neural network 208 in the computing environment 216 based on the prediction data 210. In the example of Figures 2 and 3, a scheduler 214 is used to schedule the execution of the neural network 208 in the computing environment 216. The scheduler 214 uses the predicted execution characteristics 212A to 212N, for example computing device criteria, of the neural network 208 to identify suitable resources, such as a computing device 218A, in the computing environment 216 for executing the neural network 208 and schedules the neural network 208 to be executed on that computing device 218A at a determined time T1. In other words, the scheduler 214 identifies a computing device 218A which satisfies the computing device criteria and schedules the execution of the neural network 208 on that computing device 218A. To schedule the execution of neural networks in the computing environment 216, the scheduler 214 is communicatively coupled to resources, such as the computing devices 218A to 218C, in the computing environment 216. The scheduler 214 may be responsible for scheduling the execution of a plurality of workloads, including further neural networks, in the computing environment 216. The scheduler 214 may be configured to implement a specific scheduling algorithm which aims to achieve one or more goals when scheduling workloads in the computing environment 216. In some examples, the scheduler 214 may prioritise the execution of the neural network 208. In other cases, the scheduler 214 may be configured to schedule workloads in a balanced manner such that the average latency and performance of the workloads is substantially even.
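The disclosure does not prescribe a particular matching step, but as a minimal sketch a first-fit scheduler could compare the predicted computing device criteria against the devices it knows about and queue the network on the first device that satisfies them. The device records, field names, and processor class encoding below are assumptions made for the example:

```python
def schedule(prediction, devices, queue):
    # Pick the first device that satisfies the predicted computing device criteria
    # and append the neural network to that device's execution queue.
    for device in devices:
        meets_storage = device["free_storage_bytes"] >= prediction["min_storage_bytes"]
        meets_processor = device["processor_class"] >= prediction["min_processor_class"]
        if meets_storage and meets_processor:
            start_time = queue.setdefault(device["name"], 0)
            queue[device["name"]] = start_time + prediction["duration_s"]
            return device["name"], start_time
    raise RuntimeError("no device in the computing environment satisfies the criteria")

devices = [
    {"name": "218B", "free_storage_bytes": 1 * 1024**3, "processor_class": 1},
    {"name": "218A", "free_storage_bytes": 8 * 1024**3, "processor_class": 2},
]
prediction = {"min_storage_bytes": 2 * 1024**3, "min_processor_class": 2, "duration_s": 1000}

queue = {}
print(schedule(prediction, devices, queue))   # ('218A', 0): chosen device and scheduled start time
```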
Generally, it is difficult to determine the expected resources a neural network will use when being executed, and this provides a challenge when trying to schedule their execution efficiently in a computing environment 216. By using the predicted execution characteristics of the neural network 208 to schedule the execution of the neural network 208 in the computing environment 216, it becomes possible to use the computing resources in the computing environment more efficiently. In some cases, it also becomes possible to execute neural networks in the computing environment 216 more quickly.
Returning again to Figure 2, the scheduling of a plurality of neural networks NN1 to NN6 in the computing environment 216 is shown, wherein each of the neural networks NN1 to NN6 is scheduled to be executed on one of the computing devices 218A, 218B, 218C at a respective time T. The scheduler 214 may be implemented as any suitable combination of software and hardware for communicating with the computing environment 216 and scheduling the execution of the neural network 208. The scheduler 214 may be part of the computing environment; for example, one or more computing devices in the computing environment 216 may be instructed to perform the function of the scheduler 214. Alternatively, or additionally, a dedicated computing device such as a management server may be used to schedule the execution of neural networks 208 in the computing environment 216.
The scheduler 214 may be configured to achieve one or more goals when scheduling the execution of neural networks 208 in the computing environment 216. In some cases, the scheduler 214 may be tasked with scheduling the execution of the neural networks 208 in the computing environment efficiently. That is to say, the scheduler 214 may attempt to minimize the total energy expenditure used in the computing environment 216. In other examples, the scheduler 214 may be tasked with achieving a suitable trade-off between energy consumption and total duration for executing the neural networks 208. In other examples, the scheduler 214 may be configured to schedule the neural networks 208 in the computing environment 216 in order to complete the execution of the neural networks 208 as quickly as possible, even if excessive energy is consumed. Illustrative examples of scheduling algorithms which may be implemented by the scheduler 214 include First Come First Serve (FCFS), Shortest-Job-First (SJF), Shortest Remaining Time, Priority Scheduling, Round Robin Scheduling, and Multilevel Queue Scheduling. Directed Acyclic Graph (DAG) task scheduling for heterogeneous systems may also be used, for example Heterogeneous Earliest Finish Time (HEFT), Critical Path on Processor (CPOP), and Performance Effective Task Scheduling (PETS).
Figures 5 and 6 show a computer-implemented method 600 which is similar to the method 300 shown in Figures 2 and 3 but which also comprises obtaining 602 data 502 identifying one or more constraints 504A to 504N associated with the execution of the neural network 208. Where the method 600 includes obtaining the data 502 identifying one or more constraints 504A to 504N, the prediction data 210 represents predictions of the one or more execution characteristics 212A to 212N of the neural network 208 according to the one or more constraints 504A to 504N. The constraints 504A to 504N represent restrictions in the computing environment 216 which are related to one or more of the execution characteristics 212A to 212N of the neural network 208. For example, the one or more constraints 504A to 504N may specify characteristics of the computing environment such as available computing resources in the computing environment 216, available energy in the computing environment 216, and so forth. Providing one or more constraints 504A to 504N to the model 206 when generating predictions of execution characteristics 212A to 212N allows the predictions to be sensitive to any constraints 504A to 504N which may be imposed on the execution of the neural network 208.
In some cases, the constraints 504A to 504N may include selections of one or more execution characteristics 212A to 212N, such as a duration in which the neural network 208 is to be executed, an energy consumption to be used when executing the neural network 208, an amount of storage to be used to execute the neural network 208 and so forth.
By providing a constraint 504A to 504N on one or more of the execution characteristics 212A to 212N of the neural network 208, the prediction of the other execution characteristics 212A to 212N may change. For example, where a first constraint 504A specifies that the neural network 208 is to be executed within 2000 seconds, the predicted computing resources for executing the neural network 208 may specify that a more powerful processor is required to execute the neural network 208 than where the neural network 208 may be executed within 4000 seconds. Similarly, the one or more constraints 504A to 504N may specify a processor type, an amount of storage, and/or an energy or power consumption for executing the neural network 208. The data 502 representing the one or more constraints 504A to 504N may be provided by an administrator, or may be specified when the neural network 208 is sent to be executed in the computing environment 216. For example, a user may manually specify one or more constraints 504A to 504N relating to the execution of the neural network 208. Alternatively, the data 502 representing the one or more constraints 504A to 504N may be determined based on the computing environment 216 in which the neural network 208 is to be executed. In particular, the one or more constraints 504A to 504N may be determined based on any one or more of the computing resources in the computing environment 216, an amount of energy available for use in the computing environment, and scheduling data representing the scheduling of one or more workloads executing or to be executed in the computing environment 216.
Determining the one or more constraints 504A to 504N in this way makes it possible for the predictions of the execution characteristics to be more sensitive to the characteristics of the computing environment 216 and hence provide more useful predictions. That is to say, the predictions of the execution characteristics 212A to 212N may not include execution characteristics 212A to 212N which cannot be satisfied by the computing environment 216. For example, if one or more of the computing devices 218A are not available, the predictions may specify execution characteristics associated with the execution of the neural network 208 on an alternative computing device 218B or 218C.
The available resources and characteristics, such as the available energy, of the computing environment 216 may change over time. As described above, the configuration of the computing environment 216 can change over time as computing devices 218A to 218C are added to and removed from the computing environment 216.
The changing configuration of the computing environment 216 can include the changing of the one or more constraints 504A to 504N. Hence, obtaining data 502 representing the one or more constraints 504A to 504N may allow the prediction of execution characteristics 212A to 212N to be sensitive to a changing state of the computing environment 216 in which the neural network 208 is to be executed.
Where the one or more constraints 504A to 504N are based on the computing environment 216, the scheduling of workloads in the computing environment can influence the constraints 504A to 504N. For example, the scheduling of a workload in the computing environment 216 may cause certain resources to be reserved for that workload for a predetermined period of time, or may cause a reduction in one or more resources in the computing environment 216. One such example is where the computing environment 216 comprises mobile devices 218B, and the available energy supplied in the batteries of these mobile devices 218B changes over time as the batteries of these mobile devices 218B are charged and discharged.
Obtaining 602 the data 502 identifying the one or more constraints 504A to 504N may include monitoring the computing environment 216 to make determinations of the computing resources in the computing environment 216 and/or the amount of energy available for use in the computing environment. Scheduling data is also obtained, and the data 502 identifying the one or more constraints 504A to 504N is then generated using the scheduling data and the determinations. Monitoring the computing environment 216 includes accessing metadata associated with the computing environment 216; for example, a management server of the computing environment 216 may be accessed to obtain metadata including lists of computing resources included in the computing environment 216.
Determining the one or more constraints 504A to 504N may involve predicting one or more of the constraints at a future time when the neural network 208 is likely to be executed. The state of the computing environment 216 at a time when the neural network 208 is likely to be executed, in other words at some future time, may be determined from the computing resources in the computing environment 216, the amount of energy available in the computing environment 216, and the scheduling data. For example, where the computing environment 216 is being used to execute one or more workloads, the energy available for use in the computing environment 216 may change by the time the neural network 208 is to be executed. Hence, a constraint 504B on the energy available for use in the computing environment 216 may be based on a predicted energy available for use when the neural network 208 is to be executed in the computing environment 216, which is predicted from the energy available for use in the computing environment and the scheduling data.
Figures 7 and 8 show a computer-implemented method 800 for training the model 206. The method 800 includes, for a plurality of training neural networks 702A to 702E, executing 802 the training neural network 702A in a computing environment 216. In the example shown in Figure 7, the computing environment 216 in which the training neural network 702A is executed, is the same as the computing environment 216 in which the neural network 208 is to be executed. However, it is to be appreciated that the computing environment used to execute the training neural network 702A when training the model 206 may be a different computing environment to the computing environment 216 in which the neural network 208 is to be executed. The computing environment used for training may include a larger number and/or a larger variety of computing devices than the computing environment 216. In some cases, the computing environment may be a different but similar computing environment to the computing environment 216 in which the neural network 208 is to be executed. For example, the computing environment 216 used when training the model 206 may include similar, or at least some of the same, computing devices as the computing environment 216 which is to be used to execute the neural network 208.
In some cases, a training-specific computing environment, not shown, may be provided. The training-specific computing environment may be configurable to imitate any other computing environment for which the method 300, 600 is to be applied. In this way, a specific model 206 may be trained for each computing environment where the computer-implemented method 300, 600 for scheduling the execution of neural networks 208 is to be performed.
The method 800 for training the model 206 includes monitoring 804 the execution of the training neural network 702A in the computing environment 216 to determine one or more execution characteristics 704A to 704F associated with the said training neural network 702A. For example, the energy usage, power consumption, and computing resources used when executing the neural network 702A are monitored and data representing these characteristics is stored. Training data 710A, 710B, and 710C is then generated 806 which includes an association between the training neural network 702A and the one or more execution characteristics 704A to 704F. This association may also include an association between one or more traits of the training neural network 702A and the one or more execution characteristics 704A to 704F of that training neural network 702A.
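As a hedged sketch of the monitoring step, the snippet below records only the duration and peak Python-level memory of a single execution using the standard time and tracemalloc modules; energy and power would in practice be read from platform-specific counters, which are omitted here, and the run_network callable stands in for executing the training neural network 702A:

```python
import time
import tracemalloc

def monitor_execution(run_network, sample_input):
    # Execute the training neural network once and record simple execution characteristics.
    tracemalloc.start()
    start = time.perf_counter()
    run_network(sample_input)
    duration_s = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"duration_s": duration_s, "peak_memory_bytes": peak_bytes}

# Pair the traits of the training neural network with the measured characteristics.
record = {
    "traits": {"num_layers": 18, "total_parameters": 11_000_000},
    "characteristics": monitor_execution(sum, [1.0, 2.0, 3.0]),  # 'sum' stands in for a real network
}
print(record)
```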
In the example shown in Figure 7, the computing environment 216 is arranged in a plurality of computing devices 218A to 218C. In this case, executing the training neural network 702A in the computing environment 216 includes executing the training neural network 702A on the plurality of computing devices 218A to 218C. The training data 710A to 710C includes training data relating to the execution characteristics 704A to 704F of the training neural network 702A on each of the computing devices 218A to 218C. The training neural network 702A may be executed on the plurality of computing devices 218A to 218C concurrently or at different times. There may be cases where the training neural network 702A is not executed on every computing device 218A to 218C in the computing environment 216, for example where the specifications of one or more of the devices are far below the expected requirements for executing the training neural network 702A or neural networks of a similar type. In this way, computing resources are not wasted in trying to run a training neural network 702A on a computing device which is very unlikely to be able to execute the neural network 702A, thereby increasing efficiency.
Where the computing environment 216 is arranged in a plurality of computing devices 218A to 218C, such as in the example shown in Figure 7, the execution of the training neural network 702A is monitored on the plurality of computing devices 218A to 218C to determine one or more execution characteristics 704A to 704F associated with the training neural network 702A for each of the plurality of devices 218A to 218C on which it is executed. The method 800 is repeated for each of the training neural networks 702B to 702E, such that the training data 710A to 710C represents associations between each of the training neural networks 702A to 702E and their associated execution characteristics 704A to 704F.
The training data 710A to 710C is then used to train 808 the model 206 to predict one or more execution characteristics of neural networks. As described briefly above, the model 206 may include a neural network. Where the model 206 comprises a neural network, training 808 the model 206 using the training data 710A to 710C may include modifying parameters in the neural network of the model 206, such as weight values, to minimise a loss function based on the training data 710A to 710C.
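As one possible sketch of this training step (assuming the model 206 is realised as a small PyTorch network with a multivariate output, which the disclosure permits but does not mandate), the weight values are updated to minimise a mean-squared-error loss over trait/characteristic pairs; the data here is random placeholder data standing in for the training data 710A to 710C:

```python
import torch

# Placeholder data standing in for the training data 710A to 710C:
# six numeric traits per training neural network and three measured characteristics.
X = torch.rand(64, 6)
y = torch.rand(64, 3)

predictor = torch.nn.Sequential(
    torch.nn.Linear(6, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 3),        # multivariate output: e.g. storage, energy, duration
)
optimiser = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for _ in range(200):               # modify the weight values to minimise the loss
    optimiser.zero_grad()
    loss = loss_fn(predictor(X), y)
    loss.backward()
    optimiser.step()

print(loss.item())
```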
The model 206 may include any of a plurality of suitable neural network architectures. In one example, the model 206 includes a generative adversarial neural network architecture which is trained to make predictions of execution characteristics 212A to 212N of a neural network 208 based on one or more traits 204A to 204N of the neural network 208. In this example, the model 206 includes an encoder which is configured to process the one or more traits 204A to 204N provided to an input layer to determine one or more variables representing a lower-dimensional latent space. A decoder is provided to operate on the one or more variables representing the lower-dimensional latent space to determine values in an output space representing one or more execution characteristics 212A to 212N of the neural network 208.
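The following sketch shows only the encoder/decoder shape described above: traits are compressed to a lower-dimensional latent vector, which a decoder maps to predicted execution characteristics. The random weights are placeholders; in practice they would be learned, for example within the adversarial training mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_model(n_traits, n_latent, n_characteristics):
    """Placeholder encoder/decoder weights; a trained model would supply these."""
    return {
        "W_enc": rng.normal(size=(n_traits, n_latent)),
        "W_dec": rng.normal(size=(n_latent, n_characteristics)),
    }

def predict(model, traits):
    latent = np.tanh(np.asarray(traits, dtype=float) @ model["W_enc"])  # encoder -> latent space
    return latent @ model["W_dec"]                                      # decoder -> execution characteristics

model = make_model(n_traits=4, n_latent=2, n_characteristics=3)
print(predict(model, [[0.5, 1.0, 0.2, 3.0]]))
```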
In some examples, the model 206 may comprise a plurality of neural networks.
For example, the model 206 may include a plurality of neural networks, each trained on different training data or to achieve a different goal. A first neural network of the model 206 may be trained to determine a minimum specification for one or more of the execution characteristics of the neural network 208, such as the computing resources. A second neural network of the model 206 may be trained to determine a recommended specification for one or more of the execution characteristics of the neural network 208. The one or more predicted execution characteristics from the model 206 may include outputs from each of the first and second neural networks.
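Illustratively, a model composed of two separately trained predictors could expose both outputs side by side, as in the sketch below; predict_fn stands in for whatever inference routine the underlying networks use (for instance the hypothetical predict helper from the previous sketch).

```python
# Combine a "minimum specification" predictor and a "recommended specification"
# predictor into a single prediction structure (illustrative only).
def predict_specs(min_model, rec_model, traits, predict_fn):
    return {
        "minimum": predict_fn(min_model, traits),
        "recommended": predict_fn(rec_model, traits),
    }
```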
The training data 710A to 710C may include labelled training data which categorises the performance of the execution of the training neural network 702A with respect to each of the computing devices 218A to 218C. For example, the performance of the training neural network 702A when executed on each of the computing devices 218A to 218C may be assigned one or more numerical ratings according to one or more parameters representing the performance of the training neural network 702A. These numerical ratings may be used to categorise the training data 710A to 710C such that the model 206 can be trained to predict execution characteristics of a neural network according to one or more desired ratings. For example, where the computing resources available to a first computing device 218A are insufficient for executing the training neural network 702A, the training data 710A representing the execution characteristics 704A and 704B of the training neural network 702A on this computing device 218A may be assigned a rating of 1. Where the computing resources available to a second computing device 218B are sufficient for executing the training neural network 702A, the training data 710B representing the execution characteristics 704C and 704D of the training neural network 702A on this computing device 218B may be assigned a rating of 2. These ratings may be used to categorise the training data used to train the model 206.
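A minimal sketch of this labelling step, assuming records shaped like those in the earlier monitoring sketch and a simple memory-based sufficiency test, is given below; the two-level rating mirrors the example above and the rule itself is an assumption.

```python
def rate_record(record, device_memory_bytes):
    """Rating 1: device resources insufficient; rating 2: sufficient (illustrative rule)."""
    needed = record["characteristics"]["peak_memory_bytes"]
    record["rating"] = 2 if needed <= device_memory_bytes else 1
    return record
```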
In other examples, the training data 710A to 710C may be segmented such that the neural network in the model 206 is trained on training data 710A to 710C representing an association between the training neural network 702A and computing devices 218A, or resources in the computing environment 216, which provide a desired performance.
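Continuing the same illustration, segmenting the training data might amount to filtering records by rating before training, as sketched below.

```python
def segment_training_data(records, min_rating=2):
    """Keep only records from device/network pairings that met the desired performance."""
    return [r for r in records if r.get("rating", 0) >= min_rating]
```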
The model 206 may be trained to provide a variety of types of predictions of execution characteristics, which depend on the training data 710A to 710C and the method by which the model 206 is trained. In a first example, the model 206 may be trained to predict the execution characteristics 212A to 212N of the neural network 208 which represent a minimum performance of the neural network 208 in the computing environment. That is to say, the predictions determined by the model 206 may specify a minimum required amount of computing resources to execute the neural network 208, and associated predictions of energy usage, power consumption, and duration. In a second example, the model 206 may be trained to predict recommended execution characteristics, for example, representing a recommended amount of computing resources to execute the neural network 208, and associated predictions of energy usage, power consumption, and duration.
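Purely as an illustration of how the two kinds of prediction could feed the scheduling decision, the sketch below prefers a device satisfying the recommended resources, falls back to one meeting the minimum, and otherwise defers; all field names are assumptions introduced here.

```python
def choose_device(devices, predicted):
    """Pick a device for the neural network based on predicted resource needs."""
    def fits(dev, spec):
        return (dev["free_memory_gb"] >= spec["memory_gb"]
                and dev["free_tflops"] >= spec["tflops"])

    for spec_name in ("recommended", "minimum"):
        for dev in devices:
            if fits(dev, predicted[spec_name]):
                return dev["name"], spec_name
    return None, None   # no device currently satisfies even the minimum; defer scheduling

devices = [{"name": "dev-B", "free_memory_gb": 8, "free_tflops": 2.0}]
predicted = {"recommended": {"memory_gb": 12, "tflops": 4.0},
             "minimum": {"memory_gb": 6, "tflops": 1.5}}
print(choose_device(devices, predicted))   # ('dev-B', 'minimum')
```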
Figure 9 shows a computer system 900 comprising at least one processor 902 and at least one storage 904. The at least one storage 904 comprises computer-executable instructions which, when executed by the at least one processor 902, cause the at least one processor 902 to perform a method 300, 600, 800 according to any of the examples described herein. The computer system 900 shown also includes one or more communication modules 906 for communicating with the computing environment 216 and/or other computing devices. The computer system 900 shown also includes a user interface 908, which may be implemented in the form of one or more display devices and/or input devices such as a touch screen, keyboard, mouse, input port, or any other suitable input device. Where the computing environment 216 includes a network of computing devices 218A to 218C remotely located from the computer system 900, the computer system 900 may communicate with the network of computing devices 218A to 218C using the one or more communication modules 906.
In some examples, the computer system 900 is implemented as an individual computing device configured to predict execution characteristics 212A to 212N of the neural network 208 and to schedule the execution of the neural network 208 in the computing environment 216. In other examples, the computer system 900 includes a plurality of distributed computing devices arranged to perform at least one of the computer-implemented methods 300, 600, and/or 800.
In some cases, the computer system 900 may include the computing environment 216 in which the neural network 208 is to be executed. For example, the computer system 900 may include the computing environment 216 and one or more further computing devices for predicting and scheduling the execution of neural networks in the computing environment 216.
Figure 10 shows a non-transitory computer-readable storage medium 1000 comprising computer-executable instructions 1002 which, when executed by one or more processors 1004, cause the processors 1004 to perform any one or more of the computer-implemented methods 300, 600 and 800 described herein.
The above embodiments are to be understood as illustrative examples of the invention. Further embodiments of the invention are envisaged. For example, while the use of a generative adversarial neural network has been described for use in the model 206, it will be appreciated that other deep learning neural networks could also be employed in the model 206. It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments.
Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.

Claims (20)

1. A computer-implemented method for scheduling execution of neural networks in a computing environment, the method comprising: providing a model for predicting one or more execution characteristics of neural networks; obtaining input data representing one or more traits associated with a said neural network, wherein the said neural network is to be executed in a computing environment; processing the input data using the model to determine prediction data including predictions of one or more execution characteristics of the said neural network; and scheduling execution of the said neural network in the computing environment based on the prediction data.
2. The computer-implemented method of claim 1, wherein the one or more execution characteristics of the said neural network include any one or more of: computing resources to be used when executing the said neural network, the computing resources including any one or more of an amount of storage to be used when executing the said neural network or an amount of processing resources to be used when executing the said neural network; a power consumption to be used when executing the said neural network; an energy consumption to be used when executing the said neural network; or a duration of the execution of the said neural network.
3. The computer-implemented method of claim 2, wherein the prediction data includes predictions of two or more execution characteristics of the said neural network and wherein the two or more execution characteristics are interdependent.
4. The computer-implemented method of any previous claim, wherein the method comprises obtaining data identifying one or more constraints associated with the execution of the said neural network, and wherein the prediction data represents predictions of one or more execution characteristics of the said neural network according to the one or more constraints.
5. The computer-implemented method of claim 4, wherein the one or more constraints include selections of one or more execution characteristics.
6. The computer-implemented method of claim 4 or claim 5, wherein the data identifying the one or more constraints is determined based on the computing environment in which the said neural network is to be executed.
7. The computer-implemented method of claim 5, wherein the one or more constraints are determined based on at least one of: computing resources in the computing environment, including any one or more of an amount of storage in the computing environment or an amount of processing resources in the computing environment; an amount of energy available for use in the computing environment; or scheduling data representing the scheduling of one or more workloads executing and/or to be executed in the computing environment.
8. The computer-implemented method of claim 7, wherein obtaining the data identifying the one or more constraints includes: monitoring the computing environment to make determinations of one or more of the computing resources in the computing environment, or the amount of energy available for use in the computing environment; obtaining the scheduling data; and generating the data identifying the one or more constraints using the scheduling data and the determinations.
9. The computer-implemented method of any preceding claim, wherein computing resources in the computing environment are arranged in a plurality of computing devices, each computing device comprising a predetermined amount of storage and processing resources.
10. The computer-implemented method of claim 9, wherein the one or more execution characteristics specify computing device criteria and wherein scheduling the execution of the said neural network in the computing environment includes scheduling the said neural network to be executed on a said computing device in the computing environment which satisfies the computing device criteria.
11. The computer-implemented method of any preceding claim, wherein the model is trained by, for a plurality of training neural networks: executing a said training neural network in the computing environment; monitoring the execution of the said training neural network in the computing environment to determine one or more execution characteristics associated with the said training neural network; generating training data including an association between the said training neural network and the one or more execution characteristics; and training the model to predict one or more execution characteristics of neural networks based on the training data.
12. The computer-implemented method of claim 11, wherein the computing resources in the computing environment are arranged in a plurality of computing devices, each computing device comprising a predetermined amount of storage and processing resources, wherein executing the said training neural network in the computing environment includes executing the said training neural network on the plurality of computing devices, and wherein monitoring the execution of the said neural networks includes monitoring the execution of the said training neural network on the plurality of computing devices to determine one or more execution characteristics associated with the said training neural network for each of the plurality of devices.
13. The computer-implemented method of any preceding claim, wherein the model comprises a neural network, and wherein the model is trained to predict one or more execution characteristics of neural networks by modifying parameters in the neural network based on training data.
14. A computer-implemented method for training a model to predict one or more execution characteristics of neural networks, the computer-implemented method comprising, for a plurality of training neural networks: executing a said training neural network in a computing environment; monitoring the execution of the said training neural network in the computing environment to determine one or more execution characteristics associated with the said training neural network; generating training data including an association between the said training neural network and the one or more execution characteristics; and training the model to predict one or more execution characteristics of neural networks based on the training data.
15. The computer-implemented method of claim 14, wherein the model comprises a neural network and wherein training the model to predict one or more execution characteristics of neural networks based on the training data includes modifying the parameters of the neural network based on the training data.
16. A computer system comprising at least one processor and at least one storage comprising computer-executable instructions which, when executed by the at least one processor, cause the at least one processor to perform a method according to any one or more of claims 1 to 15.
17. The computer system of claim 16, further comprising the computing environment on which the said neural network is to be executed.
18. The computer system of claim 16, further comprising one or more communication modules for communicating with the computing environment.
19. The computer system of claim 18, wherein the computing environment comprises a network of computing devices located remotely from the computer system, and wherein the computer system communicates with the network of computing devices using the one or more communication modules.
20. A non-transitory computer-readable storage medium including computer-executable instructions which, when executed by one or more processors, cause the processors to perform a method according to any one of claims 1 to 15.
GB2113043.0A 2021-08-12 2021-09-13 Method and apparatus for data processing Pending GB2610238A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP21386051 2021-08-12

Publications (2)

Publication Number Publication Date
GB202113043D0 GB202113043D0 (en) 2021-10-27
GB2610238A true GB2610238A (en) 2023-03-01

Family

ID=77519083

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2113043.0A Pending GB2610238A (en) 2021-08-12 2021-09-13 Method and apparatus for data processing

Country Status (1)

Country Link
GB (1) GB2610238A (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180365575A1 (en) * 2017-07-31 2018-12-20 Seematics Systems Ltd System and method for employing inference models based on available processing resources

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GEOFFREY X YU ET AL: "A Runtime-Based Computational Performance Predictor for Deep Neural Network Training", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 June 2021 (2021-06-07), XP081978542 *

