
Managing deployment of an application

Info

Publication number
EP4315055A1
Authority
EP
European Patent Office
Prior art keywords
application
performance parameter
computing
kernel
node
Legal status
Pending
Application number
EP21718201.3A
Other languages
German (de)
French (fr)
Inventor
Mohammad ABU LEBDEH
Dániel GÉHBERGER
Martin Julien
Current Assignee
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Application filed by Telefonaktiebolaget LM Ericsson AB
Publication of EP4315055A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/008 - Reliability or availability analysis
    • G06F 11/30 - Monitoring
    • G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time or of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3404 - Recording or statistical evaluation for parallel or distributed programming
    • G06F 11/3409 - Recording or statistical evaluation for performance assessment
    • G06F 11/3419 - Performance assessment by assessing time
    • G06F 11/3447 - Performance evaluation by modeling
    • G06F 11/3466 - Performance evaluation by tracing or monitoring

Definitions

  • the present disclosure relates to a computer implemented method for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources.
  • the method is performed by a management node.
  • the present disclosure also relates to a management node, and to a computer program product configured, when run on a computer, to carry out a method for managing deployment of an application.
  • Parallel computing is a popular technique enabled by the widespread adoption of multi-core Central Processing Units (CPUs) and accelerators, including for example Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs).
  • Accelerators are special-purpose processing devices designed to speed up parallel and compute intensive sections of applications. Accelerators are an increasingly popular tool for assisting general-purpose processors in running applications, offering the possibility to offload complex and intensive computation functions (or tasks) from CPUs to accelerators.
  • Applications that have computation functions or tasks that can be offloaded to accelerators are known as hardware-accelerated applications. Such applications comprise two main components: the code that runs on the host computer, and one or more functions that can be offloaded to accelerator devices. Functions that can be offloaded generally comprise one or more highly parallel computing tasks and are referred to as kernels.
  • Edge resources are inherently highly heterogeneous (including for example different models of CPUs, GPUs, and FPGAs from multiple different vendors), and are also resource-constrained compared to cloud resources.
  • a hardware-accelerated application may therefore be deployed on a large variety of computing node types over time, implying a need to find an optimal offloading configuration (where to run every kernel of the application) for different node types.
  • The first approach relies on performance metrics collected by executing the applications and their kernels. For example, in S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach, “Accelerating compute-intensive applications with GPUs and FPGAs,” 2008 Symposium on Application Specific Processors, 2008, pp. 101-107, the authors propose multiple schemas to automatically select either a CPU or GPU to run application kernels.
  • the key idea is to run the kernels using a subset of the dataset on both CPU and GPU, measure the execution time, and then select the faster computing resource to run the kernels with the complete dataset.
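  • As an illustration of this trial-based approach, the following minimal Python sketch times two hypothetical kernel implementations on a small sample of the dataset and returns the faster one; the function and argument names are illustrative assumptions, not taken from the cited paper.

```python
import time

def pick_faster(kernel_cpu, kernel_gpu, sample):
    """Time both implementations on a small sample of the dataset and
    return the faster one, to be used for the complete dataset.
    kernel_cpu and kernel_gpu are hypothetical callables."""
    timings = {}
    for name, fn in (("cpu", kernel_cpu), ("gpu", kernel_gpu)):
        start = time.perf_counter()
        fn(sample)                               # run the kernel on the sample
        timings[name] = time.perf_counter() - start
    return kernel_cpu if timings["cpu"] <= timings["gpu"] else kernel_gpu
```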
  • the second approach to automatic selection of a hardware device uses performance prediction to decide where to execute a kernel.
  • the authors use microarchitecture-independent characteristics of an application to predict its performance on a specific computing node. These characteristics are measured by running an instrumented binary of the application, which is generated using an instrumenting tool.
  • the measured characteristics are related to the characteristics of previously profiled programs from a benchmark suite, following which performance prediction may be carried out based on application similarity.
  • the functionalities of the JAVA JIT compiler and runtime capabilities may be extended to automatically select a preferred computing resource, or application code may be analyzed and feature extraction performed, on the basis of which a Machine Learning model may infer where to execute a kernel.
  • Kernel (or application) similarity can play an important role in predicting kernel performance, as well as eliminating redundancy from a set of kernels.
  • One example application similarity detection mechanism relies on architecture-dependent characteristics collected by running the application using a special profiling tool.
  • Another example mechanism uses architecture-independent characteristics to characterize General Purpose computing on GPU (GPGPU) workloads and identify the similarities between GPGPU kernels.
  • a computer implemented method for managing deployment of an application on a computing node of a network wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources.
  • the method, performed by a management node of the network, comprises obtaining measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network.
  • the method further comprises, for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generating predicted values of the performance parameter.
  • the method further comprises selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application, and initiating deployment of the application on the selected candidate computing node.
  • Generating predicted values of the performance parameter comprises, for kernels of applications that have been deployed on computing nodes of the network, obtaining measured values of the performance parameter during execution of the kernels on computing resources of computing nodes, which computing resources are of a first computing resource type, and clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources.
  • Generating predicted values of the performance parameter further comprises, for the kernel cluster containing the kernel of the application, determining a mapping between performance parameter values of different kernels in the kernel cluster, and using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available network computing node for which measured performance parameter values are not available and which is of the first computing resource type.
  • a computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform a method according to any one or more of the aspects or examples of the present disclosure.
  • a management node for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources.
  • the management node comprises processing circuitry configured to cause the management node to obtain measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network.
  • the processing circuitry is further configured to cause the management node to, for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generate predicted values of the performance parameter.
  • the processing circuitry is further configured to cause the management node to select, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application, and initiate deployment of the application on the selected candidate computing node.
  • Generating predicted values of the performance parameter comprises, for kernels of applications that have been deployed on computing nodes of the network, obtaining measured values of the performance parameter during execution of the kernels on computing resources of computing nodes, which computing resources are of a first computing resource type, and clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources.
  • Generating predicted values of the performance parameter further comprises, for the kernel cluster containing the kernel of the application, determining a mapping between performance parameter values of different kernels in the kernel cluster, and using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available network computing node for which measured performance parameter values are not available and which is of the first computing resource type.
  • aspects of the present disclosure also propose a kernel similarity mechanism which may be used to predict the performance of kernels, minimizing management overhead with system evolution by avoiding unnecessary kernel performance benchmarking.
  • the kernel similarity mechanism is language, compiler, and accelerator independent, and may consequently be implemented in a wide range of existing management platforms.
  • Figure 1 is a flow chart illustrating process steps in a method for managing deployment of an application
  • Figures 2a to 2e show flow charts illustrating process steps in another example of a method for managing deployment of an application
  • FIG. 3 is a block diagram illustrating functional modules in a management node
  • FIG. 4 is a block diagram illustrating functional modules in another example of a management node
  • Figure 5 illustrates an example implementation architecture for the methods of Figures 1 to 2e
  • Figures 7 to 12 are sequence diagrams illustrating implementation of the methods of Figures 1 to 2e.

Detailed Description
  • examples of the present disclosure provide a method and nodes that enable selection of a candidate computing node, from a set of potentially heterogeneous computing nodes, for deployment of an application comprising one or more kernels.
  • Some examples of methods disclosed herein additionally generate an offloading configuration suitable for the selected node, so as to achieve performance for the application that is optimized with respect to one or more parameters for the application such as execution time, energy consumption, etc.
  • the methods disclosed herein propose a mechanism for evaluating kernel similarity that is based upon a pattern of variation of measured performance parameter values during execution of kernels on computing resources.
  • Figure 1 is a flow chart illustrating process steps in a method 100 for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources.
  • a computing resource is considered to encompass any processing device in a computing node, including for example a CPU, GPU, FPGA, etc.
  • a kernel comprises a function within an application, which function is operable for deployment on an accelerated processing device, such as a GPU, FPGA or one or more cores of a CPU.
  • the method 100 is performed by a management node, which may comprise a physical or virtual node, and may be implemented in a computing device, server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment.
  • the management node may comprise or be instantiated in any part of a network, for example in a logical core network node of a communication network, network management centre, network operations centre, radio access network node etc.
  • a radio access network node may comprise a base station, eNodeB, gNodeB, or any other current or future implementation of functionality facilitating the exchange of radio network signals between nodes and/or users of a communication network.
  • the method 100 comprises, in a first step 110, obtaining measured values of a performance parameter during execution of the kernel of the application to be deployed on computing resources of available computing nodes in the network.
  • the performance parameter during execution of the kernels may be execution time, energy expenditure, or any other parameter by which performance of the kernel may be measured.
  • measured values of a plurality of performance parameters may be obtained.
  • Values of one or more performance parameters for the kernel may have been measured for example during a previous deployment of the application, and stored for future reference.
  • the method 100 comprises, for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generating predicted values of the performance parameter.
  • the method then comprises selecting a candidate computing node for deployment of the application based on the measured and predicted values of the performance parameter in step 130, and initiating deployment of the application on the selected candidate computing node in step 140.
  • the step 120 of generating predicted values of the performance parameter comprises a plurality of sub-steps.
  • Generating predicted values of the performance parameter first comprises, as illustrated at 120c, performing steps 120a and 120b for kernels of applications that have been deployed on computing nodes of the network.
  • Step 120a comprises obtaining measured values of the performance parameter during execution of the kernels on computing resources of computing nodes, which computing resources are of a first computing resource type.
  • Computing resource type may refer to the hardware, software, configuration and/or manufacturer of the computing resource, and example computing resource types include GPU, CPU, and FPGA.
  • Computing resources of computing nodes in the network may consequently be of many different types.
  • the measured performance parameter values obtained in step 120a are for execution of kernels on resources of the same type, referred to in the method 100 as a first computing resource type.
  • Step 120b comprises clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources.
  • kernels may perform differently on different computing resources, and thus a pattern may be observed of how a performance parameter such as execution time, energy expenditure, etc. varies with execution of the same kernel on different GPUs, CPUs etc.
  • This variation pattern is used as a clustering parameter in step 120b, to group kernels whose performance parameter varies in a similar manner across deployment on different computing resources of the same type.
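  • By way of illustration, the following Python sketch clusters kernels by the pattern of variation of their measured execution times. The data values are hypothetical, and the use of correlation distance with hierarchical clustering is one possible choice (it is invariant to the shifting and scaling correlations mentioned below), not a technique prescribed by the present disclosure.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical data: one row per kernel, one column per computing resource of
# the same type (e.g. four GPU models); values are execution times in seconds.
kernel_names = ["kernel-1", "kernel-2", "kernel-3"]
exec_times = np.array([
    [20.0, 15.0, 30.0, 17.0],   # kernel-1 on GPU-1..GPU-4
    [30.0, 25.0, 40.0, 27.0],   # kernel-2: same shape, shifted by +10
    [12.0, 28.0,  9.0, 33.0],   # kernel-3: a different variation pattern
])

# Correlation distance is invariant to shifting and scaling, so kernels whose
# performance rises and falls coherently across resources cluster together.
distances = pdist(exec_times, metric="correlation")
labels = fcluster(linkage(distances, method="average"), t=0.2,
                  criterion="distance")
print(dict(zip(kernel_names, labels)))  # kernel-1 and kernel-2 share a cluster
```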
  • Generating predicted values of a performance parameter further comprises, in step 120d, determining, for the kernel cluster containing the kernel of the application, a mapping between performance parameter values of different kernels in the kernel cluster.
  • generating predicted values of the performance parameter comprises using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available network computing node for which measured performance parameter values are not available and which is of the first computing resource type.
  • the mapping may take a range of different forms, and in some examples, a plurality of transfer functions between individual kernels of the kernel cluster may be determined, with a combination such as an average used to assemble predictions based on transfer functions from different kernels in the kernel cluster.
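  • A minimal sketch of this prediction step, assuming linear transfer functions between kernels and an unweighted average of the per-kernel predictions; the helper name and data values are hypothetical, and the kernel-1/kernel-2 values anticipate the Figure 6 example discussed below.

```python
import numpy as np

def predict_from_cluster(target, peers, resource):
    """Fit a linear transfer function y = a*x + b from each peer kernel in
    the same cluster to the target kernel, predict the target's value on
    `resource`, and average the per-peer predictions (hypothetical helper).
    target and each peer are dicts mapping resource name -> measured value."""
    predictions = []
    for peer in peers:
        if resource not in peer:
            continue                      # peer has no measurement there
        shared = [r for r in target if r in peer]
        if len(shared) < 2:
            continue                      # not enough points to fit a line
        x = np.array([peer[r] for r in shared])
        y = np.array([target[r] for r in shared])
        a, b = np.polyfit(x, y, 1)        # linear mapping peer -> target
        predictions.append(a * peer[resource] + b)
    return float(np.mean(predictions)) if predictions else None

kernel_1 = {"GPU-1": 20.0, "GPU-2": 15.0, "GPU-3": 30.0, "GPU-4": 17.0,
            "GPU-5": 27.0}
kernel_2 = {"GPU-1": 30.0, "GPU-2": 25.0, "GPU-3": 40.0, "GPU-4": 27.0}
print(predict_from_cluster(kernel_2, [kernel_1], "GPU-5"))  # ~37.0
```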
  • the kernel similarity mechanism of the method 100 and additional detail of how kernel similarity may be used in accordance with the method 100 to predict performance parameter values, is discussed in greater detail below with reference to Figure 6.
  • the application to be deployed may comprise a plurality of kernels, and the steps of the method 100 may be carried out for one, some or all of the kernels of the application. It will also be appreciated that, as discussed above, a range of performance parameters may be of interest for execution of kernels of the application, and the steps of the method 100 may consequently be carried out for a plurality of different performance parameters, and/or for a combination of two or more performance parameters.
  • Figures 2a to 2e show flow charts illustrating process steps in further examples of method 200 for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources.
  • the method 200 provides various examples of how the steps of the method 100 may be implemented and supplemented to achieve the above discussed and additional functionality.
  • the method 200 is performed by a management node, which may be a physical or virtual node, and which may encompass multiple logical entities, as discussed more fully above with reference to Figure 1.
  • the management node performing the method 200 may previously have received or obtained a deployment package for the application to be deployed, which package may include for example the application binaries, one or more test suites, and a manifest file. It is envisaged that a manifest file may for example contain identification information for the application, the name and versions of the one or more kernels that may be offloaded, the types of computing resources to which the kernels can be offloaded, and/or application performance requirements, for example setting upper or lower bounds on one or more performance parameters for the application.
  • a manifest file may for example contain identification information for the application, the name and versions of the one or more kernels that may be offloaded, the types of computing resources to which the kernels can be offloaded, and/or application performance requirements, for example setting upper or lower bounds on one or more performance parameters for the application.
  • the management node obtains measured values of a performance parameter during execution of the one or more kernels of the application to be deployed on computing resources of available computing nodes in the network. These values may for example have been measured during previous deployments of the application and stored in a suitable repository, which may be maintained by or accessible to the management node.
  • measured performance parameter values during execution of the kernel may in many circumstances not be available for all computing resources of all available computing nodes in a network.
  • a particular application may not have been deployed on all computing nodes or types of computing node that are available in the network, and it is likely that all kernels of the application will not have been deployed on all computing resources or types of computing resource that are available at the computing nodes.
  • the management node therefore generates predicted values of the performance parameter for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available.
  • the management node selects a candidate computing node for deployment of the application, based on the measured and predicted values of the performance parameter. As illustrated at 230i, this may comprise selecting as the candidate computing node the available computing node that would optimize a value of a performance parameter for the application if the application were to be deployed on that computing node, based on the measured and predicted values of the performance parameter for the kernel of the application. Optimize may for example comprise maximizing a value of the performance parameter, or minimizing a value of the performance parameter, according to the nature of the performance parameter.
  • the definition of what is a preferred or “best” computing node for deployment of the application may be adjusted to conform to a target or priority of a network manager, network operator, application owner, etc.
  • a target might be to minimize execution time for the application, to minimize energy consumption for execution of the application, etc.
  • Steps that may be carried out as part of the selection of the candidate computing node in step 230 are illustrated in Figure 2c.
  • the management node may first assemble a set of available computing nodes from which to select.
  • the set may comprise computing nodes that have available computing capacity and comprise computing resources that are consistent with requirements of the application. For example, if one or more kernels of the application can only be offloaded to a certain type of computing resource, the assembled set may contain only computing nodes which have available capacity and a computing resource of the required type.
  • the management node uses a mode selection function to determine whether to perform a guided selection or an exploration selection of a candidate computing node.
  • the mode selection function causes the probability of performing a guided selection to increase as a number of computing resource types within the network that satisfy a condition for having unknown characteristics decreases. In this manner, the mode selection function balances exploration of the possibilities of using available computing resources whose characteristics are unknown, against exploiting available knowledge of computing resources whose characteristics are known, in order to obtain an optimal deployment.
  • the condition for a computing resource type to be considered as having unknown characteristics may comprise that kernels that have been deployed on computing resources of the computing resource type appear in a number of kernel clusters that is below a threshold value.
  • the threshold value may be set by a network operator or administrator according to the number of resource types in the network, number of kernels that have been deployed in the network, number of clusters, etc. For example, if the threshold value were to be set at 50%, then if kernels that have been deployed on computing resources of the resource type in question appear in less than 50% of kernel clusters, then the resource type would be considered to have unknown characteristics.
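  • A sketch of this condition, using the 50% threshold example above; the function and variable names are hypothetical assumptions.

```python
def has_unknown_characteristics(resource_type, deployed_kernels, kernel_cluster,
                                threshold=0.5):
    """Return True if the resource type satisfies the condition for having
    unknown characteristics (hypothetical helper; threshold set by operator).

    deployed_kernels maps a resource type to the set of kernels that have
    executed on resources of that type; kernel_cluster maps kernel -> cluster.
    """
    covered = {kernel_cluster[k]
               for k in deployed_kernels.get(resource_type, set())
               if k in kernel_cluster}
    total_clusters = len(set(kernel_cluster.values()))
    return total_clusters == 0 or len(covered) / total_clusters < threshold

# 5 clusters in total; kernels deployed on this resource type fall into only
# 2 of them -> 40% coverage, below the 50% threshold -> unknown.
kernel_cluster = {"k1": 1, "k2": 2, "k3": 3, "k4": 4, "k5": 5}
deployed_kernels = {"vendor-x-gpu": {"k1", "k2"}}
print(has_unknown_characteristics("vendor-x-gpu", deployed_kernels,
                                  kernel_cluster))  # True
```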
  • the management node checks, at step 230c, whether measured or predicted values of the performance parameter (as obtained and generated in steps 210 and 220) are available for all computing resources of at least one available computing node.
  • the management node performs a guided selection in step 230d by selecting as the candidate computing node the available computing node that would optimize a value of a performance parameter for the application if the application were to be deployed on that computing node, based on the measured and predicted values of the performance parameter for the kernel of the application.
  • the management node performs an exploration selection at step 230e by randomly selecting as the candidate computing node an available computing node comprising at least one computing resource for which measured values of the performance parameter during execution of the kernel are not available.
  • the management node determines an offload configuration for deployment of the application on the candidate computing node.
  • the offload configuration comprises a mapping between the one or more kernels of the application and one or more computing resources of the candidate computing node.
  • the offload configuration may further comprise configuration settings for the mapped computing resource. Steps that may be carried out during the process of determining an offload configuration for deployment of the application on the candidate computing node are illustrated in Figures 2d and 2e.
  • the management node initially checks in step 232a whether measured or predicted values of the performance parameter during execution of the one or more kernels are available for all computing resources of the candidate computing node. If so, the management node identifies, in step 232b, the offload configuration that optimizes a value of a performance parameter for the application, based on the measured and predicted values of the performance parameter for the kernel of the application. This may comprise identifying, for example, the mapping between kernels and computing resources, as well as computing resource configurations, that minimizes the overall execution time for the application, or minimizes the overall energy expended during execution of the application. This identification may be implemented using a rules-based procedure, a machine learning model that has been trained on historical network data, etc.
  • If measured or predicted values of the performance parameter during execution of the kernel are not available for all computing resources of the candidate computing node, the management node first identifies, at step 232c, pairs of kernel and computing resource for which measured or predicted values of the performance parameter are not available. The management node then assembles a list of possible offload configurations comprising at least one of the identified pairs in step 232d. In step 232e, the management node performs tests of possible offload configurations from the assembled list. The management node may test all or only some of the configurations, depending upon the number of configurations, time constraints, etc.
  • the configurations may be clustered or categorized into species, and only configurations from a certain cluster or species may be tested, or a representative configuration from each of a plurality of clusters or species may be tested. Selection of configurations for testing is discussed in further detail below with reference to Figures 10 and 11.
  • testing an offload configuration comprises deploying the application on the candidate computing node in accordance with the offload configuration, and running a testing suite for the application.
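  • The following sketch illustrates one way such a test might be run and measured; deploy_and_run and test_suite are hypothetical stand-ins for the platform's deployment and test machinery, and wall-clock execution time is used as the example performance parameter.

```python
import time

def test_offload_configuration(deploy_and_run, offload_config, test_suite):
    """Deploy the application with offload_config and run the test suite,
    measuring execution time per test case (deploy_and_run and test_suite
    are hypothetical stand-ins for the platform's test machinery)."""
    measurements = {}
    for case_name, workload in test_suite.items():
        start = time.perf_counter()
        deploy_and_run(workload, offload_config)   # one realistic workload
        measurements[case_name] = time.perf_counter() - start
    return measurements
```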
  • the management node measures values of the performance parameter for the kernel during the tests in step 232f, and then proceeds to identify the offload configuration that optimizes a value of a performance parameter for the application, based on the measured (previously obtained and from the tests of step 232e) and predicted values of the performance parameter for the kernel of the application.
  • the measured values from the tests may also be stored in a suitable repository for use in a subsequent iteration of the method 200, either for deployment of the same application or for use in clustering for similarity assessment in connection with deployment of a different application.
  • the management node checks in step 232g whether the offload configuration has been identified using at least one predicted value of the performance parameter. If a predicted value of the performance parameter has not been used, the management node proceeds to step 234 of the method, illustrated in Figure 2b.
  • testing an offload configuration comprises deploying the application on the candidate computing node in accordance with the offload configuration and running a testing suite for the application.
  • the management node measures values of the performance parameter for the kernel during the test at step 232i, and stores the measured values in a repository at step 232j. As discussed above, such stored values may be used in subsequent iterations of the method 200.
  • In step 232k, following testing, the management node compares the predicted values of the performance parameter to the values of the performance parameter measured during the test, and the management node may then update the mapping between performance parameter values of different kernels in the kernel cluster containing the kernel of the application on the basis of the comparison, in step 232l.
  • the management node checks whether deployment of the application in accordance with at least one determined offload configuration on the candidate computing node will fulfill a performance requirement for the application. As illustrated at 240i, this may comprise checking that at least one determined offload configuration for deployment of the application on the candidate computing node will fulfil the performance requirement for the application.
  • If no offload configuration has been determined that will enable deployment of the application on the candidate computing node to fulfill a performance requirement for the application, the management node returns to step 230 in order to select a new candidate computing node for deployment of the application, based on the measured and predicted values of the performance parameter. If at least one offload configuration has been determined that will enable deployment of the application on the candidate computing node to fulfill a performance requirement for the application, the management node initiates deployment of the application on the selected candidate computing node in accordance with the determined offload configuration. This initiation may for example comprise providing the offload configuration to the application, for example as an updated configuration file or via communication with the application through an API. A schematic sketch of this overall decision loop follows below.
  • the methods 100 and 200 may be performed by a management node, and the present disclosure provides a management node that is adapted to perform any or all of the steps of the above discussed methods.
  • the management node may be a physical or virtual node, and may for example comprise a virtualised function that is running in a cloud, edge cloud or fog deployment.
  • the management node may for example comprise or be instantiated in any part of a logical core network node, network management centre, network operations centre, radio access node etc. Any such communication network node may itself be divided between several logical and/or physical functions, and any one or more parts of the management node may be instantiated in one or more logical or physical functions of a communication network node.
  • FIG 3 is a block diagram illustrating an example management node 300 which may implement the method 100 and/or 200, as illustrated in Figures 1 and 2a to 2e, according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 350.
  • the management node 300 comprises a processor or processing circuitry 302, and may comprise a memory 304 and interfaces 306.
  • the processing circuitry 302 is operable to perform some or all of the steps of the method 100 and/or 200 as discussed above with reference to Figures 1 and 2a to 2e.
  • the memory 304 may contain instructions executable by the processing circuitry 302 such that the management node 300 is operable to perform some or all of the steps of the method 100 and/or 200, as illustrated in Figures 1 and 2a to 2e.
  • the instructions may also include instructions for executing one or more telecommunications and/or data communications protocols.
  • the instructions may be stored in the form of the computer program 350.
  • the processor or processing circuitry 302 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc.
  • the processor or processing circuitry 302 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), etc.
  • the memory 304 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive, etc.
  • FIG. 4 illustrates functional modules in another example of management node 400 which may execute examples of the methods 100 and/or 200 of the present disclosure, for example according to computer readable instructions received from a computer program. It will be understood that the modules illustrated in Figure 4 are functional modules, and may be realised in any appropriate combination of hardware and/or software. The modules may comprise one or more processors and may be integrated to any degree.
  • the management node 400 is for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources.
  • the management node comprises a receiving module 410 for obtaining measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network.
  • the management node 400 further comprises a prediction module 420 for generating predicted values of the performance parameter for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available.
  • the management node further comprises a selection module 430 for selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application, and a deployment module 440 for initiating deployment of the application on the selected candidate computing node.
  • the prediction module 420 comprises a retrieving module 422 for obtaining measured values of the performance parameter during execution of kernels of applications that have been deployed on computing nodes in the network on computing resources of computing nodes, which computing resources are of a first computing resource type, and a clustering module 424 for clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources.
  • the prediction module 420 further comprises a mapping module 426 for determining a mapping between performance parameter values of different kernels in the kernel cluster containing the kernel of the application, and for using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available network computing node for which measured performance parameter values are not available and which is of the first computing resource type.
  • the management node 400 may further comprise interfaces 450 which may be operable to facilitate communication with computing and/or other network nodes over suitable communication channels.
  • FIGs 1 to 2e discussed above provide an overview of methods which may be performed according to different examples of the present disclosure. These methods may be performed by a management node, as illustrated in Figures 3 and 4, and enable automation of the deployment and configuration of hardware-accelerated applications.
  • This automation includes the selection of the computing node on which the application will be deployed, and may include finding a suitable offloading configuration for the application, while taking into account the capabilities of the computing node.
  • the offloading configuration maps the kernels of a hardware-accelerated application to the computing resources (multi-core CPU, GPU, FPGA, etc.) of the computing node.
  • An offloading configuration can be represented as a list of key-value pairs, in which the key is a kernel name as given by the application owner, and the value is an identification of the computing resource (multi-core CPU, GPU, FPGA, etc.) on which the kernel will be executed in order for the application to achieve optimal performance.
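  • For illustration, such a list of key-value pairs might look as follows; the kernel and resource names are hypothetical.

```python
# Hypothetical offloading configuration: kernel name -> computing resource.
offload_config = {
    "matrix_multiply": "gpu-0",      # offloaded to the first GPU
    "fft_filter":      "fpga-0",     # offloaded to the FPGA
    "preprocess":      "cpu-cores",  # kept on the multi-core CPU
}
```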
  • Figure 5 illustrates an example implementation architecture for the methods 100, 200.
  • Figure 5 shows a heterogeneous setting such as an edge computing environment, which comprises a variety of different types of computing node having different computing resources.
  • one computing node may have a CPU and a GPU, while another computing node may have a CPU, multiple GPUs, and an FPGA.
  • the computing resources may be provided by different vendors and have different amounts of resources available.
  • the resources available at an Internet of Things (IoT) device node are likely to be significantly more limited than those available at a server node in a cloud or edge cloud deployment, and a large proportion of those resources may already be in use, and consequently not available for application deployment.
  • FIG. 5 shows an application management platform 500 which is responsible for the deployment and configuration of hardware-accelerated applications.
  • the platform 500 comprises a management node 510, and three repositories 520, 530 and 540.
  • a user provides the management platform 500 with a deployment package through a standard interface such as an application programming interface (API) or command-line interface (CLI).
  • the deployment package comprises application binaries, one or more test suites, and a manifest file.
  • the manifest file contains information which the management node 510 may use in deployment of the application.
  • the test suites contain test cases that are executed to simulate a realistic workload of the application in order to measure the execution time, or other parameters, of the application and its kernels using a specific offloading configuration on a certain computing node.
  • the management node 510 may use the measured execution time data from running the test suite in selecting the computing node on which the application will be deployed, as well as in determining the best offloading configuration to use.
  • the manifest file may consequently contain:
  • Application identification information such as name and version,
  • Types of accelerator devices (e.g. GPU, FPGA, etc.) that a kernel can be offloaded to, and
  • One or more application performance requirements defined as an upper or lower bound on a performance parameter for the application, such as execution time.
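  • A manifest file with the above content might, purely as an illustration, be represented as follows; the field names are assumptions, as the disclosure does not prescribe a concrete schema.

```python
# Hypothetical manifest content (field names are illustrative assumptions).
manifest = {
    "application": {"name": "video-analytics", "version": "1.2.0"},
    "kernels": [
        {"name": "object_detect", "version": "1.0",
         "offload_targets": ["GPU", "FPGA"]},    # accelerator types allowed
    ],
    "performance_requirements": [
        # upper bound on a performance parameter, e.g. execution time
        {"parameter": "execution_time", "bound": "upper", "seconds": 2.0},
    ],
}
```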
  • management node 510 is responsible for the deployment and configuration of the application; as described with reference to Figures 7 to 12 below, this includes selecting the computing node on which the application will be deployed, determining an offloading configuration, evaluating application performance against any user-provided requirement, and initiating deployment.
  • the three repositories of the management platform 500 are:
  • the applications repository 520 that stores application deployment packages, as well as selected computing nodes and determined offloading configurations for the applications,
  • the monitoring repository 530 that stores the execution time collected from the deployed applications and their kernels during deployment or testing, and
  • the resources repository 540 that contains information about the available computing nodes in the network and their computing resources.
Kernel Similarity (steps 120 and 220 of the methods 100, 200)
  • Examples of the present disclosure use the data collected from previously executed kernels of other applications to estimate the performance of the kernels of the application of interest, thereby avoiding testing the application to measure its performance, and also avoiding the requirements for source code, specific programming languages, specialized profiling tools, etc., which are associated with known approaches to assessing kernel similarity for performance prediction.
  • Examples of the present disclosure consider two kernels to be similar if a measured performance parameter for the kernels (such as execution time) exhibits a coherent variation pattern on a set of computing resources of the same type.
  • Resource types may include GPU, FPGA, multi-core CPU, etc.
  • a coherent pattern means that the performance parameter of the two kernels fluctuates with a similar shape, rising and falling coherently for each kernel when the computing resource used changes.
  • the pattern of performance parameter variation captures the shifting and scaling correlations exhibited by kernels.
  • the management node 510 can use the kernels whose execution time is measured on a certain computing resource to predict the performance of kernels (in the same set) with unknown performance on that resource.
  • Figure 6 shows the measured execution time for three kernels over different models of GPU.
  • the three kernels all have a measured execution time for GPU-1, GPU-2, GPU-3, and GPU-4, whereas only kernel-1 and kernel-3 have a measured execution time for GPU-5.
  • Figure 6 indicates that all kernels have different execution time values for all GPUs.
  • the performance behavior of kernel-1 and kernel-2 shows a coherent pattern, as their execution times over GPU-1, GPU-2, GPU-3, and GPU-4, vary in a similar manner (reducing when executed on GPU-2 when compared to execution on GPU-1, and increasing to a highest level when executed on GPU-3). Kernel-1 and kernel-2 may therefore be considered to be similar, and are assigned to the same kernel similarity cluster, as illustrated in Figure 6.
  • the performance of kernel-1 can therefore be used to predict the performance of kernel-2 when executed on GPU-5.
  • For kernel-1, the execution time is higher on GPU-5 than on GPU-4, so it may be assumed that kernel-2 will also have a higher execution time on GPU-5 compared to GPU-4.
  • shifting the data points of kernel-1 by a constant of +10 results in the data points of kernel-2.
  • the execution time of kernel-2 on GPU-5 may therefore be predicted to be 37 seconds, given that the execution time of kernel-1 on GPU-5 is 27 seconds.
  • kernel similarity concept is presented above using execution time as an example performance parameter, it will be appreciated that other performance parameters may be used to assess similarity. For example, if the energy consumption of two kernels shows a fluctuation of a similar shape (or coherent pattern) over a set of computing resources of the same type, then these two kernels can be considered to be similar in terms of energy consumption. This similarity can then be used to predict the energy consumption of one kernel using the measured energy consumption data for the second kernel.
  • Pattern-based clustering algorithms can be used to identify kernels that produce a similar execution pattern over a variety of devices. These algorithms group the input (kernels) into clusters based on their performance behavior pattern; one example is the pattern-based clustering presented in H. Wang and J. Pei, “Clustering by Pattern Similarity”.
  • Different processes can be used to predict unknown values from kernel clusters.
  • One option is to calculate a transformation function between the measured values for each kernel pair in the cluster (i.e. the transformation that maps the points of kernel-1 to kernel-2 in Figure 6). It may be expected that this transformation will be linear in most cases, meaning the transformation may be relatively simple to identify and use. However, in some cases the relationship between measured values for different kernels may be non-linear, and consequently more computationally intensive techniques may be used to find the mapping between data points for pairs of kernels. These techniques may include Machine Learning models such as Support Vector Machines.
  • Since kernel clusters may contain many kernels, in one implementation it may be appropriate to predict a value for the performance parameter of a target kernel on a specific computing resource on the basis of the transfer function or mapping from all other kernels in the cluster for which a measured value of the performance parameter on the specific computing resource is available.
  • the final prediction for the target kernel may then be a function of all of the predictions, for example an average or weighted average.
  • a Machine Learning model that takes as input all of the available measured performance parameter values from kernels in the cluster and maps these to a single predicted output may be envisaged. It will be appreciated that in circumstances in which there is no measured performance parameter value for any kernel in a cluster when executed on a specific computing resource, it will not be possible to predict performance for any kernel in that cluster when executed on that resource.
  • Figures 7 to 12 are flow charts illustrating an example implementation of the methods 100, 200 using the management platform 500.
Application Deployment

  • Figure 7 illustrates at a relatively high level of abstraction the steps that may be carried out to deploy a hardware-accelerated application using an implementation of the methods 100, 200.
  • the generation of predicted performance parameter values is performed during the step of selecting a computing node, but it will be appreciated that this step of generating similarity clusters of kernels and predicting performance parameter values may be performed before the start of computing node selection, or at any other suitable time during execution of an implementation of the methods 100, 200.
  • the management node selects the computing node on which the application will be deployed.
  • the selected computing node should satisfy the placement constraints (if any) set out in the relevant manifest file for the application to be deployed, including for example particular types of computing resource on which particular kernels of the application should be executed. For example, if one of the kernels in the application can only be executed on an FPGA, then the chosen computing node should be equipped with an FPGA. Computing node selection is described in greater detail below with reference to Figure 8. If no computing node is available that fulfils the application requirements, at step 731, the management node determines that the deployment has failed.
  • the management node determines, in step 732, an offloading configuration which maps the application’s kernels to the computing resources available on the selected node in a manner so as to attain the optimal application performance on that node. This step is discussed in further detail with reference to Figure 9.
  • In step 734, the management node evaluates the performance of the application using the determined offloading configuration in order to ensure that any user-provided performance requirement in the manifest file is fulfilled. If the application performance fails to meet the requirement, then the management node returns to step 730 and tries to select another computing node.
  • the management node informs the application of the generated offloading configuration. This may comprise, as illustrated at step 740a, generating a configuration file that contains the offloading configuration.
  • the management node then deploys and starts the application, and the application applies the offload configuration so that the kernels will be executed on the computing resources assigned by the management node.
  • the application might communicate with the management platform and management node through an API, and may be informed via the API of the offloading configuration.
Steps 130, 230 of the methods 100, 200
  • FIG 8 illustrates in additional detail the step 730 of selecting a computing node (an implementation of steps 130, 230 of the methods 100, 200).
  • the management node first retrieves, in step 830a, a list from the resources repository 540 of nodes that meet the placement constraints of the application to be deployed, including for example resource availability and the type of accelerators installed on the node (if available).
  • the management node determines at step 830b whether to operate in guided mode or exploration mode.
  • the management node chooses between the two modes using a probability function which will converge over time to always choose the guided mode if the infrastructure resources do not change.
  • the probability function defines the probability of selecting the exploration mode as a percentage of computing node types in the infrastructure with unknown performance characteristics. In other words, the probability of selecting guided mode increases as the number of node types in the infrastructure whose performance characteristics are unknown decreases.
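  • A sketch of such a probability function; the predicate is_unknown stands in for the unknown-characteristics condition sketched above, and the names are hypothetical.

```python
import random

def choose_mode(node_types, is_unknown):
    """Step 830b sketch: the probability of choosing exploration mode equals
    the fraction of node types with unknown performance characteristics, so
    guided mode becomes certain once every node type is characterized."""
    unknown = sum(1 for t in node_types if is_unknown(t))
    p_explore = unknown / len(node_types) if node_types else 0.0
    return "exploration" if random.random() < p_explore else "guided"
```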
  • In this way the management node avoids getting trapped in a local optimum: even if the management node knows that a certain application performs well when it is deployed on a specific node type, there may be other node types that have never been tested with the application and which may deliver better performance.
  • In exploration mode, the management node can discover how an application will perform on all distinct node types in the infrastructure, and consequently can select the node type that delivers the best performance and meets the requirements of the user.
  • the performance characteristics of a node type are considered unknown if the percentage of kernel similarity clusters in which kernels of applications which have been deployed on that node type appear is less than a threshold percentage of the total number of kernel similarity clusters.
  • the value of the threshold controls the amount of performance data collected from each node type in the exploration mode, which consequently affects the efficiency of the kernel similarity method.
  • For example, if the threshold is set to 50% and the kernels of all applications in the system can be grouped into 5 distinct similarity clusters, then if all kernels that have been deployed on a certain node type belong to only two similarity clusters, that node type is considered to have unknown performance characteristics, as the two clusters represent 40% of all clusters, which is less than the 50% threshold.
  • the threshold value may be set by an operator, and may for example be set to an initial default value, and be updated in accordance with operator priorities for exploration vs exploitation in the deployment of applications on computing nodes in the network.
  • When operating in exploration mode, the management node randomly chooses a node from the previously retrieved list of nodes at step 830e, ensuring that the selected node type has unknown performance characteristics.
  • When the management node operates in guided mode, it first checks, using the performance data from the monitoring repository 530, whether all kernel-resource pairs for the application have been tested for the node types of all the listed nodes (step 830ci). If all the data is available, the best node is selected at step 830di. If the data does not contain the execution time (or other performance parameter values) for all kernel-resource pairs, then the management node tries to predict the performance of kernel-resource pairs with unknown execution time in step 820, using the kernel similarity method detailed above and illustrated in Figure 1.
  • the kernel similarity method can predict a missing data point if there is at least one similar kernel in the system (i.e. a kernel in the same similarity cluster) for which a measured value of the performance parameter on the relevant computing resource is available.
  • After applying kernel similarity, if the management node has a complete view at step 830cii of the performance of all kernels on all computing resources available for at least one node (through either measured or predicted data), then it selects the node that provides the best overall application performance at step 830dii. If there is no such node in the list, the management node reverts to exploration mode, selecting a node with unknown characteristics at step 830e.
  • FIG. 9 illustrates in additional detail the step 732 of determining an offloading configuration for the selected computing node (an implementation of step 232 of the method 200).
  • the simplest scenario is when the execution time (or other performance parameter values) of all kernels in the application is available for all computing resources on the node, as checked at step 932a.
  • the execution time may be collected from either (or both) measuring the performance of previous deployments of the application on the same node type, or predicting the performance using kernel similarity as discussed above.
  • the management node can map the kernels to computing resources in step 932b using existing task assignment and scheduling algorithms that have the objective of minimizing the application execution time.
  • when the execution time is not known or predicted for one or more kernel-resource pairs, the management node first creates the list of kernel-resource pairs that do not have execution time in the monitoring repository in step 932c. The management node then uses this list to generate the offloading configuration permutations that will be used to test the application and collect the missing performance data in step 932d. In step 932e+f, the management node runs the test for the generated offloading configurations and collects the missing execution time data. The process of steps 932c to 932e+f is illustrated in greater detail in Figures 10 and 11.
  • FIGs 10 and 11 illustrate steps 932c to 932e+f with two worked examples.
  • an example in which an application comprising 2 kernels is to be deployed on a selected candidate computing node having 3 computing resources. No measured or predicted data is available.
  • the management node assembles the list of kernel-computing resource pairs for which no data is available. In the illustrated example, this results in a list of 6 pairs.
  • the management node generates the possible offloading configurations that include these pairs. Testing all of the possible configurations may be inefficient, and so the management node may select a subset of the possible offloading configurations to test. In the illustrated example, the management node has selected the three configurations that are the most different. Following testing of the selected configurations, the management node selects one single configuration as offering optimal performance for the application.
  • in step 932c the management node assembles the list of kernel-computing resource pairs for which no data is available. In the illustrated example, this results in a list of 4 pairs.
  • in step 932d the management node generates the possible offloading configurations that include only the 4 pairs with unknown data.
  • testing all of the possible configurations may still be inefficient or too time consuming, and so the management node may select a subset of the possible offloading configurations to test.
  • the management node has selected the two configurations that are the most different. Following testing of the selected configurations, the management node selects one single configuration as offering optimal performance for the application.
  • the management node checks whether predicted data has been used to select the offloading configuration at step 932g. If predicted performance values have been used, then the management node tests the offloading configuration at step 932h+i. This testing allows the management node to evaluate the accuracy of prediction of performance data by comparing the predicted execution time values with those collected by running a test suite for the application and measuring its performance. This testing also ensures that the selected offloading configuration will deliver the anticipated performance and meet the user's requirements.
  • the management node may test specific offloading configurations in order to collect the execution time of the application and its kernels.
  • Figure 12 illustrates an example implementation of a testing process (steps 232e and 232h of the method 200).
  • the management node installs the application binaries on the selected node, if the application is not already deployed on the node.
  • the management node then generates the configuration file that contains the offloading configuration in step 1252.
  • the application is then ready to be tested.
  • the management node starts collecting performance metrics in step 1253, runs the test suite or suites in step 1254, collecting the execution time metrics, and finally stores the collected metrics in the monitoring repository 530.
  • Examples of the present disclosure thus propose methods for automating the deployment and configuration of hardware-accelerated applications in a network environment such as an edge cloud or fog deployment.
  • the management node selects a computing node to host the application from a set of what may be highly heterogeneous computing nodes.
  • the management node identifies the offloading configuration of the application that enables the application to deliver the optimal performance, as defined by the application owner or network operator, on that computing node.
  • the management node collects the execution time of the application and its kernels and stores them in a monitoring repository.
  • the management node can operate in two modes to select the node which will host the application:
  • Exploration mode, which enables the management node to discover how an application will perform on different node types in the network.
  • the management node can build a global view about the performance characteristics of every node type. This enables the management node to choose the node that provides the best performance for the application of interest.
  • the management node can choose between the two modes using a probability function that ensures that the probability of selecting guided mode increases as the number of node types with unknown performance characteristics in the infrastructure decreases.
  • Kernel similarity is used to group the kernels deployed on the platform into clusters based on their performance pattern. The measured performance of the kernels in a cluster can then be used to predict the performance of kernels in the same cluster with unknown performance characteristics.
  • the example nodes and management platform disclosed herein may be implemented and used within any distributed or centralised network system.
  • the nodes and platform can be implemented in a single module or can be distributed through different nodes in different interconnected modules.
  • the methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein.
  • a computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.

Abstract

A computer implemented method is disclosed for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources. The method comprises obtaining measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network and, for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generating predicted values of the performance parameter. The method further comprises selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application, and initiating deployment of the application on the selected candidate computing node. The process of generating predicted values of the performance parameter is based on a process of clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources.

Description

Managing Deployment of an Application
Technical Field
The present disclosure relates to a computer implemented method for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources. The method is performed by a management node. The present disclosure also relates to a management node, and to a computer program product configured, when run on a computer, to carry out a method for managing deployment of an application.
Background
Parallel computing is a popular technique enabled by the widespread adoption of multi core Central Processing Units (CPUs) and accelerators, including for example Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). Accelerators are special-purpose processing devices designed to speed up parallel and compute intensive sections of applications. Accelerators are an increasingly popular tool for assisting general-purpose processors in running applications, offering the possibility to offload complex and intensive computation functions (or tasks) from CPUs to accelerators. Applications that have computation functions or tasks that can be offloaded to accelerators are known as hardware-accelerated applications. Such applications comprise two main components: the code that runs on the host computer, and one or more functions that can be offloaded to accelerator devices. Functions that can be offloaded generally comprise one or more highly parallel computing tasks and are referred to as kernels.
Different hardware-accelerated applications can place different demands on accelerators, and accelerators themselves can vary widely, being designed by different vendors and comprising different hardware architecture, middleware support, and programming models. Consequently, performance of hardware-accelerated applications, and individual kernels of such applications, may vary according to the accelerator on which they are executed. One difficulty for hardware-accelerated applications is how to appropriately select one of the available computing resources on a computing node to run a given kernel in an application so that the full capability of the computing node is exploited. When an application is deployed on a certain computing node, a decision must be taken for every kernel in that application whether it will be executed on the CPU, or will be offloaded to an available accelerator on the node, and if so, to which accelerator it will be offloaded, as a computing node may comprise multiple accelerators. This task becomes more challenging when an application is to be deployed on infrastructure with heterogeneous resources, such as an edge computing infrastructure. Edge resources are inherently highly heterogeneous (including for example different models of CPU, GPU, and FPGAs from multiple different vendors), and are also resource-constrained compared to cloud resources. A hardware-accelerated application may therefore be deployed on a large variety of computing node types over time, implying a need to find an optimal offloading configuration (where to run every kernel of the application) for different node types.
The simplest approach to finding an offloading configuration for an application is the manual method in which a user decides where to run each kernel and then configures the application accordingly. This static approach has obvious disadvantages for deployment in architectures for Edge and Fog computing, in which a developer may not know in advance the hardware on which the application will be deployed, as well as an inability to scale.
Automatic selection of an available hardware device for execution of a kernel has generally followed one of two main approaches. The first approach relies on performance metrics collected by executing the applications and their kernels. For example, in S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach, “Accelerating compute-intensive applications with GPUs and FPGAs,” 2008 Symposium on Application Specific Processors, 2008, pp. 101-107, the authors propose multiple schemas to automatically select either a CPU or GPU to run application kernels. The key idea is to run the kernels using a subset of the dataset on both CPU and GPU, measure the execution time, and then select the faster computing resource to run the kernels with the complete dataset. This requirement to run each kernel in the application on every computing resource available in a node leads to high management overhead and inefficiency in resource use. The second approach to automatic selection of a hardware device uses performance prediction to decide where to execute a kernel. For example, in K. Hoste, A. Phansalkar, L. Eeckhout, A. Georges, L. K. John, and K. De Bosschere, “Performance prediction based on inherent program similarity,” 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT), Sep. 2006, pp. 114-122, the authors use microarchitecture-independent characteristics of an application to predict its performance on a specific computing node. These characteristics are measured by running an instrumented binary of the application, which is generated using an instrumenting tool. The measured characteristics are related to the characteristics of previously profiled programs from a benchmark suite, following which performance prediction may be carried out based on application similarity. In other examples, the functionalities of the JAVA JIT compiler and runtime capabilities may be extended to automatically select a preferred computing resource, or application code may be analyzed and feature extraction performed, on the basis of which a Machine Learning model may infer where to execute a kernel.
Kernel (or application) similarity can play an important role in predicting kernel performance, as well as eliminating redundancy from a set of kernels. One example application similarity detection mechanism relies on architecture-dependent characteristics collected by running the application using a special profiling tool. Another example mechanism uses architecture-independent characteristics to characterize General Purpose computing on GPU (GPGPU) workloads and identify the similarities between GPGPU kernels.
The above discussed examples of the performance prediction based automated approach to application deployment all suffer from one or more disadvantages. For example, some examples are programming language dependent, relying on compile time application characteristics. Several examples require provision of application source code to the management platform, which may not be practical in all domains. In addition, examples which use architecture-dependent or architecture-independent runtime characteristics require a specialized profiling tool, updates to source code, or an extended runtime execution environment in order to collect these characteristics.

Summary
It is an aim of the present disclosure to provide methods, a management node and a computer readable medium which at least partially address one or more of the challenges mentioned above. It is a further aim of the present disclosure to provide methods, a management node and a computer readable medium which cooperate to facilitate selection of a computing node for deployment of an application comprising one or more kernels.
According to a first aspect of the present disclosure, there is provided a computer implemented method for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources. The method, performed by a management node of the network, comprises obtaining measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network. The method further comprises, for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generating predicted values of the performance parameter. The method further comprises selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application, and initiating deployment of the application on the selected candidate computing node. Generating predicted values of the performance parameter comprises, for kernels of applications that have been deployed on computing nodes of the network, obtaining measured values of the performance parameter during execution of the kernels on computing resources of computing nodes, which computing resources are of a first computing resource type, and clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources. Generating predicted values of the performance parameter further comprises, for the kernel cluster containing the kernel of the application, determining a mapping between performance parameter values of different kernels in the kernel cluster, and using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available network computing node for which measured performance parameter values are not available and which is of the first computing resource type.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform a method according to any one or more of the aspects or examples of the present disclosure.
According to another aspect of the present disclosure, there is provided a management node for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources. The management node comprises processing circuitry configured to cause the management node to obtain measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network. The processing circuitry is further configured to cause the management node to, for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generate predicted values of the performance parameter. The processing circuitry is further configured to cause the management node to select, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application, and initiate deployment of the application on the selected candidate computing node. Generating predicted values of the performance parameter comprises, for kernels of applications that have been deployed on computing nodes of the network, obtaining measured values of the performance parameter during execution of the kernels on computing resources of computing nodes, which computing resources are of a first computing resource type, and clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources. Generating predicted values of the performance parameter further comprises, for the kernel cluster containing the kernel of the application, determining a mapping between performance parameter values of different kernels in the kernel cluster, and using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available network computing node for which measured performance parameter values are not available and which is of the first computing resource type.

Aspects of the present disclosure thus provide a method and nodes that facilitate automated deployment and configuration of hardware-accelerated applications, for example in an edge environment of a network. Example methods according to the present disclosure can select a candidate computing node, for example from a set of heterogeneous computing nodes, based on measured and predicted performance for one or more kernels of the application. Aspects of the present disclosure also propose a kernel similarity mechanism which may be used to predict the performance of kernels, minimizing management overhead with system evolution by avoiding unnecessary kernel performance benchmarking. The kernel similarity mechanism is language, compiler, and accelerator independent, and may consequently be implemented in a wide range of existing management platforms.
Brief Description of the Drawings
For a better understanding of the present disclosure, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the following drawings in which:
Figure 1 is a flow chart illustrating process steps in a method for managing deployment of an application;
Figures 2a to 2e show flow charts illustrating process steps in another example of a method for managing deployment of an application;
Figure 3 is a block diagram illustrating functional modules in a management node;
Figure 4 is a block diagram illustrating functional modules in another example of a management node;
Figure 5 illustrates an example implementation architecture for the methods of Figures 1 to 2e;
Figure 6 illustrates kernel similarity; and
Figures 7 to 12 are sequence diagrams illustrating implementation of the methods of Figures 1 to 2e.

Detailed Description
As discussed above, examples of the present disclosure provide a method and nodes that enable selection of a candidate computing node, from a set of potentially heterogeneous computing nodes, for deployment of an application comprising one or more kernels. Some examples of methods disclosed herein additionally generate an offloading configuration suitable for the selected node, so as to achieve performance for the application that is optimized with respect to one or more parameters for the application such as execution time, energy consumption, etc. The methods disclosed herein propose a mechanism for evaluating kernel similarity that is based upon a pattern of variation of measured performance parameter values during execution of kernels on computing resources.
Figure 1 is a flow chart illustrating process steps in a method 100 for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources. For the purpose of the present specification, a computing resource is considered to encompass any processing device in a computing node, including for example a CPU, GPU, FPGA, etc. A kernel comprises a function within an application, which function is operable for deployment on an accelerated processing device, such as a GPU, FPGA or one or more cores of a CPU.
The method 100 is performed by a management node, which may comprise a physical or virtual node, and may be implemented in a computing device, server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. The management node may comprise or be instantiated in any part of a network, for example in a logical core network node of a communication network, network management centre, network operations centre, radio access network node etc. A radio access network node may comprise a base station, eNodeB, gNodeB, or any other current or future implementation of functionality facilitating the exchange of radio network signals between nodes and/or users of a communication network. Any communication network node may itself be divided between several logical and/or physical functions, and any one or more parts of the management node may be instantiated in one or more logical or physical functions of a communication network node. The management node may therefore encompass multiple logical entities, as discussed in greater detail below.

Referring to Figure 1, the method 100 comprises, in a first step 110, obtaining measured values of a performance parameter during execution of the kernel of the application to be deployed on computing resources of available computing nodes in the network. The performance parameter during execution of the kernels may be execution time, energy expenditure, or any other parameter by which performance of the kernel may be measured. In some examples of step 110, measured values of a plurality of performance parameters may be obtained. Values of one or more performance parameters for the kernel may have been measured for example during a previous deployment of the application, and stored for future reference. In step 120, the method 100 comprises, for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generating predicted values of the performance parameter. The method then comprises selecting a candidate computing node for deployment of the application based on the measured and predicted values of the performance parameter in step 130, and initiating deployment of the application on the selected candidate computing node in step 140.
As illustrated in Figure 1, the step 120 of generating predicted values of the performance parameter comprises a plurality of sub-steps. Generating predicted values of the performance parameter first comprises, as illustrated at 120c, performing steps 120a and 120b for kernels of applications that have been deployed on computing nodes of the network. Step 120a comprises obtaining measured values of the performance parameter during execution of the kernels on computing resources of computing nodes, which computing resources are of a first computing resource type. Computing resource type may refer to the hardware, software, configuration and/or manufacturer of the computing resource, and example computing resource types include GPU, CPU, and FPGA. Computing resources of computing nodes in the network may consequently be of many different types. The measured performance parameter values obtained in step 120a are for execution of kernels on resources of the same type, referred to in the method 100 as a first computing resource type. Step 120b comprises clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources. As discussed above, kernels may perform differently on different computing resources, and thus a pattern may be observed of how a performance parameter such as execution time, energy expenditure, etc. varies with execution of the same kernel on different GPUs, CPUs etc. This variation pattern is used as a clustering parameter in step 120b, to group kernels whose performance parameter varies in a similar manner across deployment on different computing resources of the same type. Generating predicted values of a performance parameter further comprises, in step 120d, determining, for the kernel cluster containing the kernel of the application, a mapping between performance parameter values of different kernels in the kernel cluster. Finally, generating predicted values of the performance parameter comprises using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available network computing node for which measured performance parameter values are not available and which is of the first computing resource type. It will be appreciated that the mapping may take a range of different forms, and in some examples, a plurality of transfer functions between individual kernels of the kernel cluster may be determined, with a combination such as an average used to assemble predictions based on transfer functions from different kernels in the kernel cluster. The kernel similarity mechanism of the method 100, and additional detail of how kernel similarity may be used in accordance with the method 100 to predict performance parameter values, is discussed in greater detail below with reference to Figure 6.
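As a purely illustrative sketch of the clustering of step 120b, the following Python fragment groups kernels whose performance parameter rises and falls coherently across resources of one type. The kernel names, execution time values, and the use of a sign pattern of successive differences as the clustering key are assumptions made for illustration only; the present disclosure does not prescribe a particular clustering algorithm.

    from collections import defaultdict

    # Hypothetical execution times (seconds) per kernel on four computing
    # resources of a single type (e.g. four GPU models).
    measured = {
        "kernel-1": [20.0, 15.0, 35.0, 17.0],
        "kernel-2": [30.0, 25.0, 45.0, 27.0],
        "kernel-3": [12.0, 22.0, 14.0, 30.0],
    }

    def variation_pattern(values):
        # Clustering key: the sequence of rises (+1) and falls (-1) of the
        # performance parameter as the kernel moves between resources.
        return tuple((b > a) - (b < a) for a, b in zip(values, values[1:]))

    clusters = defaultdict(list)
    for kernel, values in measured.items():
        clusters[variation_pattern(values)].append(kernel)

    print(dict(clusters))
    # {(-1, 1, -1): ['kernel-1', 'kernel-2'], (1, -1, 1): ['kernel-3']}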
It will be appreciated that while the method 100 makes reference to a single kernel in an application, the application to be deployed may comprise a plurality of kernels, and the steps of the method 100 may be carried out for one, some or all of the kernels of the application. It will also be appreciated that, as discussed above, a range of performance parameters may be of interest for execution of kernels of the application, and the steps of the method 100 may consequently be carried out for a plurality of different performance parameters, and/or for a combination of two or more performance parameters.
As discussed above, for situations in which measured performance data is not available for execution of a kernel on all available computing resources, existing approaches seek to compensate for this lacking information either through extensive testing that is highly resource intensive and inefficient, or through performance prediction. The various existing performance prediction methods all seek to understand in some way what the kernel is doing, and all suffer from drawbacks including requiring source code, requiring specialized profiling tools, extended runtime execution environments, etc. In contrast, the method 100 exploits observed historical behavior to cluster similarly behaving kernels together, and uses observations from similar kernels to predict behavior of a kernel of interest. This observation-based approach, in addition to avoiding the disadvantages of previous methods, is both dynamic and easily scalable, meaning it can be used for the deployment of large numbers of applications on a highly heterogeneous and evolving infrastructure, such as may be found in an Edge cloud or Fog deployment.
Figures 2a to 2e show flow charts illustrating process steps in further examples of method 200 for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources. The method 200 provides various examples of how the steps of the method 100 may be implemented and supplemented to achieve the above discussed and additional functionality. As for the method 100, the method 200 is performed by a management node, which may be a physical or virtual node, and which may encompass multiple logical entities, as discussed more fully above with reference to Figure 1.
The management node performing the method 200 may previously have received or obtained a deployment package for the application to be deployed, which package may include for example the application binaries, one or more test suites, and a manifest file. It is envisaged that a manifest file may for example contain identification information for the application, the name and versions of the one or more kernels that may be offloaded, the types of computing resources to which the kernels can be offloaded, and/or application performance requirements, for example setting upper or lower bounds on one or more performance parameters for the application.
Referring to Figure 2a, in a first step 210, the management node obtains measured values of a performance parameter during execution of the one or more kernels of the application to be deployed on computing resources of available computing nodes in the network. These values may for example have been measured during previous deployments of the application and stored in a suitable repository, which may be maintained by or accessible to the management node.
It will be appreciated that measured performance parameter values during execution of the kernel may in many circumstances not be available for all computing resources of all available computing nodes in a network. Particularly in the case of heterogeneous and evolving network architectures, a particular application may not have been deployed on all computing nodes or types of computing node that are available in the network, and it is likely that all kernels of the application will not have been deployed on all computing resources or types of computing resource that are available at the computing nodes. In step 220, the management node therefore generates predicted values of the performance parameter for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available.
The process of generating predicted values of the performance parameter is illustrated in Figure 1, and discussed in greater detail below with reference to Figure 6.
Having generated predicted values of the performance parameter for execution of the one or more kernels of the application on computing resources of available computing nodes for which measured performance parameter values were not available, the management node then, in step 230, selects a candidate computing node for deployment of the application, based on the measured and predicted values of the performance parameter. As illustrated at 230i, this may comprise selecting as the candidate computing node the available computing node that would optimize a value of a performance parameter for the application if the application were to be deployed on that computing node, based on the measured and predicted values of the performance parameter for the kernel of the application. Optimize may for example comprise maximizing a value of the performance parameter, or minimizing a value of the performance parameter, according to the nature of the performance parameter. Through selection of the performance parameter, and the definition of how that performance parameter is to be optimized for the application as a whole, the definition of what is a preferred or “best” computing node for deployment of the application may be adjusted to conform to a target or priority of a network manager, network operator, application owner, etc. Such a target might be to minimize execution time for the application, to minimize energy consumption for execution of the application, etc.
Steps that may be carried out as part of the selection of the candidate computing node in step 230 are illustrated in Figure 2c.
Referring now to Figure 2c, in order to select a candidate computing node, the management node may first assemble a set of available computing nodes from which to select. As illustrated at 230ai and 230aii, the set may comprise computing nodes that have available computing capacity and comprise computing resources that are consistent with requirements of the application. For example, if one or more kernels of the application can only be offloaded to a certain type of computing resource, the assembled set may contain only computing nodes which have available capacity and a computing resource of the required type.
In step 230b, the management node uses a mode selection function to determine whether to perform a guided selection or an exploration selection of a candidate computing node. As illustrated at 230bi, the mode selection function causes the probability of performing a guided selection to increase as a number of computing resource types within the network that satisfy a condition for having unknown characteristics decreases. In this manner, the mode selection function balances exploration of the possibilities of using available computing resources whose characteristics are unknown, against exploiting available knowledge of computing resources whose characteristics are known, in order to obtain an optimal deployment.
In some examples, the condition for a computing resource type to be considered as having unknown characteristics may comprise that kernels that have been deployed on computing resources of the computing resource type appear in a number of kernel clusters that is below a threshold value. The threshold value may be set by a network operator or administrator according to the number of resource types in the network, number of kernels that have been deployed in the network, number of clusters, etc. For example, if the threshold value were to be set at 50%, and kernels that have been deployed on computing resources of the resource type in question appear in less than 50% of kernel clusters, then the resource type would be considered to have unknown characteristics. It will be appreciated that using the number of clusters in which kernels that have been deployed on resources of a particular type appear ensures that a resource type is considered to have known characteristics only when performance of kernels that conform to a wide range of performance behavior profiles has been observed on resources of that type. For example, if only kernels that belong to a single cluster, and which consequently all exhibit similar patterns of performance variation, have been deployed on a particular resource type, it is possible that the characteristics of that resource type may be misrepresented by the available data, and that kernels that have a different behavior pattern for performance variation will perform quite differently on resources of that resource type.
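A minimal sketch of such a mode selection function, assuming cluster coverage with a 50% threshold as the unknown-characteristics condition, is given below. The function and field names are hypothetical, and the disclosure does not mandate this particular probability function beyond the behavior described above.

    import random

    def has_unknown_characteristics(clusters_covered, total_clusters, threshold=0.5):
        # A node/resource type is "unknown" when kernels deployed on it appear
        # in fewer than `threshold` of all kernel similarity clusters.
        return clusters_covered / total_clusters < threshold

    def choose_mode(node_types, rng=random.random):
        # Probability of exploration = fraction of node types still unknown,
        # so guided selection becomes more likely as that fraction shrinks.
        unknown = sum(1 for t in node_types if t["unknown"])
        return "exploration" if rng() < unknown / len(node_types) else "guided"

    node_types = [
        {"name": "type-a", "unknown": has_unknown_characteristics(2, 5)},  # 40% -> unknown
        {"name": "type-b", "unknown": has_unknown_characteristics(4, 5)},  # 80% -> known
    ]
    print(choose_mode(node_types))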
Referring still to Figure 2c, if the mode selection function indicates a guided selection, the management node then checks, at step 230c, whether measured or predicted values of the performance parameter (as obtained and generated in steps 210 and 220) are available for all computing resources of at least one available computing node.
If measured or predicted values of the performance parameter are available for all computing resources of at least one available computing node, the management node performs a guided selection in step 230d by selecting as the candidate computing node the available computing node that would optimize a value of a performance parameter for the application if the application were to be deployed on that computing node, based on the measured and predicted values of the performance parameter for the kernel of the application.
If measured or predicted values of the performance parameter are not available for all computing resources of at least one available computing node, or if the mode selection function indicates an exploration selection, the management node performs an exploration selection at step 230e by randomly selecting as the candidate computing node an available computing node comprising at least one computing resource for which measured values of the performance parameter during execution of the kernel are not available.
Referring now to Figure 2b, following selection of a candidate computing node for deployment of the application, the management node then, in step 232, determines an offload configuration for deployment of the application on the candidate computing node. The offload configuration comprises a mapping between the one or more kernels of the application and one or more computing resources of the candidate computing node. As illustrated at 232i, the offload configuration may further comprise configuration settings for the mapped computing resource. Steps that may be carried out during the process of determining an offload configuration for deployment of the application on the candidate computing node are illustrated in Figures 2d and 2e.
Referring initially to Figure 2d, the management node initially checks in step 232a whether measured or predicted values of the performance parameter during execution of the one or more kernels are available for all computing resources of the candidate computing node. If measured or predicted values of the performance parameter during execution of the one or more kernels are available for all computing resources of the candidate computing node, the management node then identifies, in step 232b, the offload configuration that optimizes a value of a performance parameter for the application, based on the measured and predicted values of the performance parameter for the kernel of the application. This may comprise identifying for example the mapping between kernels and computing resources, as well as computing resource configurations, that, for instance, minimizes the overall execution time for the application, or minimizes the overall energy expended during execution of the application. This identification may be implemented using a rules-based procedure, a machine learning model that has been trained on historical network data, etc.
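The simplest such identification is an exhaustive search over kernel-to-resource mappings, sketched below under the assumption that minimizing the sum of kernel execution times is the optimization objective. The data values are hypothetical, and a real implementation could equally use the rules-based or machine-learning approaches mentioned above.

    from itertools import product

    # Hypothetical execution times (seconds): a complete set of measured or
    # predicted values for every kernel on every resource of the node.
    exec_time = {
        ("kernel-1", "cpu"): 9.0, ("kernel-1", "gpu"): 4.0, ("kernel-1", "fpga"): 6.0,
        ("kernel-2", "cpu"): 3.0, ("kernel-2", "gpu"): 7.0, ("kernel-2", "fpga"): 2.0,
    }
    kernels = ["kernel-1", "kernel-2"]
    resources = ["cpu", "gpu", "fpga"]

    def best_offload_configuration():
        # Enumerate every kernel -> resource assignment and keep the one that
        # minimizes the summed execution time of all kernels.
        best = min(product(resources, repeat=len(kernels)),
                   key=lambda assign: sum(exec_time[k, r]
                                          for k, r in zip(kernels, assign)))
        return dict(zip(kernels, best))

    print(best_offload_configuration())  # {'kernel-1': 'gpu', 'kernel-2': 'fpga'}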
If measured or predicted values of the performance parameter during execution of the kernel are not available for all computing resources of the candidate computing node, the management node first identifies, at step 232c, pairs of kernel and computing resource for which measured or predicted values of the performance parameter are not available. The management node then assembles a list of possible offload configurations comprising at least one of the identified pairs in step 232d. In step 232e, the management node performs tests of possible offload configurations from the assembled list. The management node may test all or only some of the configurations, depending upon the number of configurations, time constraints etc. In some examples, the configurations may be clustered or categorized into species, and only configurations from a certain cluster or species may be tested, or a representative configuration from each of a plurality of clusters or species may be tested. Selection of configurations for testing is discussed in further detail below with reference to Figures 10 and 11.
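The following sketch illustrates steps 232c and 232d together with one possible notion of "most different" configurations, namely greedily maximizing the Hamming distance between the selected configurations. The pair data and the distance-based selection rule are illustrative assumptions only.

    from itertools import product

    kernels = ["kernel-1", "kernel-2"]
    resources = ["res-1", "res-2", "res-3"]
    # Step 232c: kernel-resource pairs with no measured or predicted value.
    unknown = {("kernel-1", "res-2"), ("kernel-2", "res-1"), ("kernel-2", "res-3")}

    # Step 232d: configurations exercising at least one unknown pair.
    candidates = [dict(zip(kernels, assign))
                  for assign in product(resources, repeat=len(kernels))
                  if any((k, r) in unknown for k, r in zip(kernels, assign))]

    def distance(c1, c2):
        # Hamming distance: number of kernels mapped to different resources.
        return sum(c1[k] != c2[k] for k in kernels)

    def pick_most_different(configs, n):
        # Greedy selection: repeatedly add the configuration farthest in
        # total distance from those already chosen.
        chosen = [configs[0]]
        while len(chosen) < n:
            chosen.append(max((c for c in configs if c not in chosen),
                              key=lambda c: sum(distance(c, s) for s in chosen)))
        return chosen

    for config in pick_most_different(candidates, 2):
        print(config)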
As illustrated at step 232e, testing an offload configuration comprises deploying the application on the candidate computing node in accordance with the offload configuration, and running a testing suite for the application.
The management node measures values of the performance parameter for the kernel during the tests in step 232f, and then proceeds to identify the offload configuration that optimizes a value of a performance parameter for the application, based on the measured (previously obtained and from the tests of step 232e) and predicted values of the performance parameter for the kernel of the application. The measured values from the tests may also be stored in a suitable repository for use in a subsequent iteration of the method 200, either for deployment of the same application or for use in clustering for similarity assessment in connection with deployment of a different application. Referring still to Figure 2d, the management node then checks in step 232g whether the offload configuration has been identified using at least one predicted value of the performance parameter. If a predicted value of the performance parameter has not been used, the management node proceeds to step 234 of the method, illustrated in Figure 2b.
Referring now to Figure 2e, if the offload configuration has been identified using at least one predicted value of the performance parameter, the management node then performs a test of the identified offload configuration on the candidate computing node in step 232h. As discussed above, testing an offload configuration comprises deploying the application on the candidate computing node in accordance with the offload configuration and running a testing suite for the application. The management node measures values of the performance parameter for the kernel during the test at step 232i, and stores the measured values in a repository at step 232j. As discussed above, such stored values may be used in subsequent iterations of the method 200.
In step 232k, following testing, the management node compares the predicted values of the performance parameter to the measured values of the performance parameter during the test, and the management node may then update the mapping between performance parameter values of different kernels in the kernel cluster containing the kernel of the application on the basis of the comparison in step 232l.
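As an illustration of the update of step 232l, assuming the mapping between similar kernels is a constant shift (as in the example discussed below with reference to Figure 6), the comparison of predicted and measured values can be folded back into the mapping by refitting the shift. All values below are hypothetical.

    def refit_shift(donor_vals, target_vals):
        # Re-estimate the constant shift between two similar kernels; for a
        # pure shift mapping, the least-squares estimate is the mean of the
        # pairwise differences over all commonly measured resources.
        pairs = [(d, t) for d, t in zip(donor_vals, target_vals)
                 if d is not None and t is not None]
        return sum(t - d for d, t in pairs) / len(pairs)

    donor = [20.0, 15.0, 35.0, 17.0, 27.0]
    target = [30.0, 25.0, 45.0, 27.0, None]
    print(refit_shift(donor, target))   # 10.0 using measurements known so far
    target[4] = 38.0                    # newly measured during the test of step 232h
    print(refit_shift(donor, target))   # 10.2 after folding in the new point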
Referring again to Figure 2b, having determined an offloading configuration for the one or more kernels of the application, the management node then, in step 234, checks whether deployment of the application in accordance with at least one determined offload configuration on the candidate computing node will fulfill a performance requirement for the application. As illustrated at 240i, this may comprise checking that at least one determined offload configuration for deployment of the application on the candidate computing node will fulfill the performance requirement for the application.
If no offload configuration has been determined that will enable deployment of the application on the candidate computing node to fulfill a performance requirement for the application, the management node then returns to step 230 in order to select a new candidate computing node for deployment of the application, based on the measured and predicted values of the performance parameter. If at least one offload configuration has been determined that will enable deployment of the application on the candidate computing node to fulfill a performance requirement for the application, the management node then initiates deployment of the application on the selected candidate computing node in accordance with the determined offload configuration. This initiation may for example comprise providing the offload configuration to the application, for example as an updated configuration file or via communication with the application through an API.
As discussed above, the methods 100 and 200 may be performed by a management node, and the present disclosure provides a management node that is adapted to perform any or all of the steps of the above discussed methods. The management node may be a physical or virtual node, and may for example comprise a virtualised function that is running in a cloud, edge cloud or fog deployment. The management node may for example comprise or be instantiated in any part of a logical core network node, network management centre, network operations centre, radio access node etc. Any such communication network node may itself be divided between several logical and/or physical functions, and any one or more parts of the management node may be instantiated in one or more logical or physical functions of a communication network node.
Figure 3 is a block diagram illustrating an example management node 300 which may implement the method 100 and/or 200, as illustrated in Figures 1 and 2a to 2e, according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 350. Referring to Figure 3, the management node 300 comprises a processor or processing circuitry 302, and may comprise a memory 304 and interfaces 306. The processing circuitry 302 is operable to perform some or all of the steps of the method 100 and/or 200 as discussed above with reference to Figures 1 and 2a to 2e. The memory 304 may contain instructions executable by the processing circuitry 302 such that the management node 300 is operable to perform some or all of the steps of the method 100 and/or 200, as illustrated in Figures 1 and 2a to 2e. The instructions may also include instructions for executing one or more telecommunications and/or data communications protocols. The instructions may be stored in the form of the computer program 350. In some examples, the processor or processing circuitry 302 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc. The processor or processing circuitry 302 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), etc. The memory 304 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive, etc.
Figure 4 illustrates functional modules in another example of management node 400 which may execute examples of the methods 100 and/or 200 of the present disclosure, for example according to computer readable instructions received from a computer program. It will be understood that the modules illustrated in Figure 4 are functional modules, and may be realised in any appropriate combination of hardware and/or software. The modules may comprise one or more processors and may be integrated to any degree.
Referring to Figure 4, the management node 400 is for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources. The management node comprises a receiving module 410 for obtaining measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network. The management node 400 further comprises a prediction module 420 for generating predicted values of the performance parameter for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available. The management node further comprises a selection module 430 for selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application, and a deployment module 440 for initiating deployment of the application on the selected candidate computing node. The prediction module 420 comprises a retrieving module 422 for obtaining measured values of the performance parameter during execution of kernels of applications that have been deployed on computing nodes in the network on computing resources of computing nodes, which computing resources are of a first computing resource type, and a clustering module 424 for clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources. The prediction module 420 further comprises a mapping module 426 for determining a mapping between performance parameter values of different kernels in the kernel cluster containing the kernel of the application, and for using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available network computing node for which measured performance parameter values are not available and which is of the first computing resource type. The management node 400 may further comprise interfaces 450 which may be operable to facilitate communication with computing and/or other network nodes over suitable communication channels.
Figures 1 to 2e discussed above provide an overview of methods which may be performed according to different examples of the present disclosure. These methods may be performed by a management node, as illustrated in Figures 3 and 4, and enable automation of the deployment and configuration of hardware-accelerated applications. This automation includes the selection of the computing node on which the application will be deployed, and may include finding a suitable offloading configuration for the application, while taking into account the capabilities of the computing node. As discussed above, the offloading configuration maps the kernels of a hardware-accelerated application to the computing resources (multi-core CPU, GPU, FPGA, etc.) of the computing node. An offloading configuration can be represented as a list of key-value pairs, in which the key is a kernel name as given by the application owner, and the value is an identification of the computing resource (multi-core CPU, GPU, FPGA etc.) on which the kernel will be executed in order for the application to achieve optimal performance.
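By way of illustration only (the kernel names and resource identifiers below are hypothetical), such a list of key-value pairs might look as follows:

    # Illustrative offloading configuration: kernel name -> computing resource.
    offloading_configuration = {
        "video-decode": "gpu-0",      # offload to the node's first GPU
        "feature-extract": "fpga-0",  # offload to the FPGA
        "postprocess": "cpu",         # execute on the host CPU
    }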
There now follows a detailed discussion of how different process steps illustrated in Figures 1 to 2e and discussed above may be implemented, for example by management nodes 300 or 400.
Figure 5 illustrates an example implementation architecture for the methods 100, 200. Figure 5 shows a heterogeneous setting such as an edge computing environment, which comprises a variety of different types of computing node having different computing resources. For example, one computing node may have a CPU and a GPU, while another computing node may have a CPU, multiple GPUs, and an FPGA. The computing resources may be provided by different vendors and have different amounts of resources available. Referring to the nodes illustrated in the Figure, the resources available at an Internet of Things (IoT) device node are likely to be significantly more limited than those available at a server node in a cloud or edge cloud deployment, and a large proportion of those resources may already be in use, and consequently not available for application deployment.
Figure 5 shows an application management platform 500 which is responsible for the deployment and configuration of hardware-accelerated applications. The platform 500 comprises a management node 510, and three repositories 520, 530 and 540.
In order to deploy an application, a user provides the management platform 500 with a deployment package through a standard interface such as the application programming interface (API) or command-line interface (CLI). The deployment package comprises application binaries, one or more test suites, and a manifest file. The manifest file contains information which the management node 510 may use in deployment of the application. The test suites contain test cases that are executed to simulate a realistic workload of the application in order to measure the execution time, or other parameters, of the application and its kernels using a specific offloading configuration on a certain computing node. The management node 510 may use the measured execution time data from running the test suite in selecting the computing node on which the application will be deployed, as well as in determining the best offloading configuration to use.
The manifest file may consequently contain:
Application identification information such as name and version,
Name and version of the kernels in the application that can be offloaded,
Types of accelerator devices (e.g., GPU, FPGA, ...) that a kernel can be offloaded to, and
One or more application performance requirements, defined as an upper or lower bound on a performance parameter for the application, such as execution time.
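An illustrative manifest reflecting the fields listed above might be structured as follows; the field names and values are assumptions for illustration, as the present disclosure does not define a manifest schema:

    manifest = {
        "application": {"name": "object-tracker", "version": "1.2.0"},
        "kernels": [
            {"name": "video-decode", "version": "1.0", "offload_to": ["GPU"]},
            {"name": "feature-extract", "version": "2.1", "offload_to": ["GPU", "FPGA"]},
        ],
        # An upper bound on a performance parameter for the application.
        "performance_requirements": {"max_execution_time_s": 2.0},
    }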
As discussed above, the management node 510 is responsible for the deployment and configuration of the application, which may include the following tasks:
Select the computing node on which the application will be deployed.
Determine the offloading configuration for the application and selected node, and ensure that the application performance meets requirements.
Deploy the application and configure it.
Perform kernel similarity assessment to find similar kernels from a set of all kernels managed by the platform, to enable prediction of performance of kernels of interest.
Collect the execution time of the kernels.
Run test suites to evaluate the performance of the application using a certain offloading configuration.
The three repositories of the management platform 500 are:
The applications repository 520 that stores application deployment packages, as well as selected computing nodes and determined offloading configurations for the applications,
The monitoring repository 530 that stores the execution time collected from the deployed applications and their kernels during deployment or testing, and
The resources repository 540 that contains information about the available computing nodes in the network and their computing resources.
Kernel Similarity (steps 120 and 220 of the methods 100, 200)
In order to determine the computing node that yields the best performance for a given hardware-accelerated application, one option would be to test the application on all nodes using all possible offloading configuration permutations. However, as discussed above, in a large-scale and heterogeneous environment, there are many node types for which the performance needs to be measured before making a choice, and extensive testing is highly inefficient and incurs a high management overhead. Performance prediction is consequently a preferred option. Existing approaches to performance prediction seek to understand, at some level of abstraction, what the kernel is doing, and predict performance based on the performance of kernels that seek to perform similar tasks at the chosen level of abstraction. In contrast, examples of the present disclosure base kernel similarity assessment on observed data of historical kernel performance on different computing resources of a given type. Examples of the present disclosure use the data collected from previously executed kernels of other applications to estimate the performance of the kernels of the application of interest, thereby avoiding testing the application to measure its performance, and also avoiding the requirements for source code, specific programming languages, specialized profiling tools etc., which are associated with known approaches to assessing kernel similarity for performance prediction.
Examples of the present disclosure consider two kernels to be similar if a measured performance parameter for the kernels (such as execution time) exhibits a coherent variation pattern on a set of computing resources of the same type. Resource types may include GPU, FPGA, multi-core CPU, etc. A coherent pattern means that the performance parameter of the two kernels exhibits fluctuation of a similar shape: it rises and falls coherently for both kernels as the computing resource used changes. The pattern of performance parameter variation captures the shifting and scaling correlations exhibited by kernels. When a set of kernels is identified as having similar performance behavior, the management node 510 can use the kernels whose execution time is measured on a certain computing resource to predict the performance of kernels (in the same set) with unknown performance on that resource.
As an example, Figure 6 shows the measured execution time for three kernels over different models of GPU. The three kernels all have a measured execution time for GPU-1, GPU-2, GPU-3, and GPU-4, whereas only kernel-1 and kernel-3 have a measured execution time for GPU-5. Figure 6 indicates that all kernels have different execution time values for all GPUs. However, the performance behavior of kernel-1 and kernel-2 shows a coherent pattern, as their execution times over GPU-1, GPU-2, GPU-3, and GPU-4 vary in a similar manner (reducing when executed on GPU-2 when compared to execution on GPU-1, and increasing to a highest level when executed on GPU-3). Kernel-1 and kernel-2 may therefore be considered to be similar, and are assigned to the same kernel similarity cluster, as illustrated in Figure 6. The performance of kernel-1 can therefore be used to predict the performance of kernel-2 when executed on GPU-5. For kernel-1, the execution time is higher on GPU-5 than it is on GPU-4, so it may be assumed that kernel-2 will also have a higher execution time on GPU-5 compared to GPU-4. By analyzing the data points of kernel-1 and kernel-2, it may be inferred that shifting the data points of kernel-1 by a constant of +10 results in the data points of kernel-2. The execution time of kernel-2 on GPU-5 may therefore be predicted to be 37 seconds, given that the execution time of kernel-1 is 27 seconds.
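The constant-shift inference of this worked example can be condensed into a few lines of Python; a minimal sketch, assuming hypothetical execution time values consistent with Figure 6 (only the 27-second GPU-5 value for kernel-1 and the +10 shift are given in the text).

# Hypothetical execution times (seconds) per GPU; kernel-2 on GPU-5 is unknown.
kernel_1 = {"GPU-1": 20, "GPU-2": 15, "GPU-3": 30, "GPU-4": 22, "GPU-5": 27}
kernel_2 = {"GPU-1": 30, "GPU-2": 25, "GPU-3": 40, "GPU-4": 32}

# Estimate the constant shift from the resources on which both were measured.
common = kernel_1.keys() & kernel_2.keys()
shift = sum(kernel_2[g] - kernel_1[g] for g in common) / len(common)  # +10 here

# Predict kernel-2 on GPU-5 from kernel-1's measurement: 27 + 10 = 37 seconds.
predicted = kernel_1["GPU-5"] + shift
print(predicted)  # 37.0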
Although the kernel similarity concept is presented above using execution time as an example performance parameter, it will be appreciated that other performance parameters may be used to assess similarity. For example, if the energy consumption of two kernels shows a fluctuation of a similar shape (or coherent pattern) over a set of computing resources of the same type, then these two kernels can be considered to be similar in terms of energy consumption. This similarity can then be used to predict the energy consumption of one kernel using the measured energy consumption data for the second kernel. Pattern-based clustering algorithms can be used to identify kernels that produce a similar execution pattern over a variety of devices. These algorithms group the input (kernels) into clusters based on their performance behavior pattern. The pattern-based clustering presented in H. Wang and J. Pei, “Clustering by Pattern Similarity,” J. Comput. Sci. Technol., vol. 23, no. 4, pp. 481–496, Jul. 2008, doi: 10.1007/s11390-008-9148-5, is an example algorithm that can be used in an implementation of examples of the present disclosure to create clusters of similar kernels.
Different processes can be used to predict unknown values from kernel clusters. One option is to calculate a transformation function between the measured values for each kernel pair in the cluster (i.e. the transformation that maps the points of kernel-1 to kernel-2 in Figure 6). It may be expected that this transformation will be linear in most cases, meaning the transformation may be relatively simple to identify and use. However, in some cases the relationship between measured values for different kernels may be non-linear, and consequently more computationally intensive techniques may be used to find the mapping between data points for pairs of kernels. These techniques may include Machine Learning models such as Support Vector Machines. As kernel clusters may contain many kernels, in one implementation it may be appropriate to predict a value for the performance parameter of a target kernel on a specific computing resource on the basis of the transfer function or mapping from all other kernels in the cluster for which a measured value of the performance parameter on the specific computing resource is available. The final prediction for the target kernel may then be a function of all of the predictions, for example an average or weighted average. In other examples, a Machine Learning model that takes as input all of the available measured performance parameter values from kernels in the cluster and maps these to a single predicted output may be envisaged. It will be appreciated that in circumstances in which there is no measured performance parameter value for any kernel in a cluster when executed on a specific computing resource, it will not be possible to predict performance for any kernel in that cluster when executed on that resource.
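A minimal sketch of the linear-transformation option follows, assuming NumPy and hypothetical measurement dictionaries of the form used above; a real implementation might substitute a learned model such as a Support Vector Machine for the linear fit, and might weight rather than simply average the per-donor predictions.

import numpy as np

def predict_from_cluster(cluster, target, resource):
    """Predict the target kernel's performance parameter on `resource` by
    fitting a linear map from each measured cluster member to the target,
    then averaging the individual predictions."""
    predictions = []
    for name, values in cluster.items():
        if name == target or resource not in values:
            continue  # a donor kernel must be measured on the missing resource
        # Fit y = a*x + b on resources where donor and target were both measured.
        common = [r for r in values if r in cluster[target] and r != resource]
        if len(common) < 2:
            continue
        x = np.array([values[r] for r in common])
        y = np.array([cluster[target][r] for r in common])
        a, b = np.polyfit(x, y, 1)
        predictions.append(a * values[resource] + b)
    return float(np.mean(predictions)) if predictions else None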
Figures 7 to 12 are flow charts illustrating an example implementation of the methods 100, 200 using the management platform 500.
Application Deployment

Figure 7 illustrates at a relatively high level of abstraction the steps that may be carried out to deploy a hardware-accelerated application using an implementation of the methods 100, 200. In the implementation of Figures 7 to 12, the generation of predicted performance parameter values is performed during the step of selecting a computing node, but it will be appreciated that this step of generating similarity clusters of kernels and predicting performance parameter values may be performed before the start of computing node selection, or at any other suitable time during execution of an implementation of the methods 100, 200.
Referring to Figure 7, in a first step 730, the management node selects the computing node on which the application will be deployed. The selected computing node should satisfy the placement constraints (if any) set out in the relevant manifest file for the application to be deployed, including, for example, particular types of computing resource on which particular kernels of the application should be executed. For example, if one of the kernels in the application can only be executed on an FPGA, then the chosen computing node should be equipped with an FPGA. Computing node selection is described in greater detail below with reference to Figure 8. If no computing node is available that fulfils the application requirements, at step 731, the management node determines that the deployment has failed.
Following computing node selection, the management node determines, in step 732, an offloading configuration which maps the application’s kernels to the computing resources available on the selected node in a manner so as to attain the optimal application performance on that node. This step is discussed in further detail with reference to Figure 9.
In step 734, the management node evaluates the performance of the application using the determined offloading configuration in order to ensure that any user provided performance requirement in the manifest file is fulfilled. If the application performance fails to meet the requirement, then the management node returns to step 730 and tries to select another computing node.
If the selected computing node and determined offloading configuration pass the performance evaluation at step 734, the management node informs the application of the generated offloading configuration. This may comprise, as illustrated at step 740a, generating a configuration file that contains the offloading configuration. The management node then deploys and starts the application, and the application applies the offload configuration so that the kernels will be executed on the computing resources assigned by the management node. In other examples, the application might communicate with the management platform and management node through an API, and may be informed via the API of the offloading configuration.
Computing Node Selection (Steps 130, 230 of the methods 100, 200)
Figure 8 illustrates in additional detail the step 730 of selecting a computing node (an implementation of steps 130, 230 of the methods 100, 200). The management node first retrieves, in step 830a, a list from the resources repository 540 of nodes that meet the placement constraints of the application to be deployed, including for example resource availability and the type of accelerators installed on the node (if available).
Providing at least one node is present in the list, the management node then determines at step 830b whether to operate in guided mode or exploration mode. The management node chooses between the two modes using a probability function which will converge over time to always choose the guided mode if the infrastructure resources do not change. The probability function defines the probability of selecting the exploration mode as the percentage of computing node types in the infrastructure with unknown performance characteristics. In other words, the probability of selecting guided mode increases as the number of node types in the infrastructure whose performance characteristics are unknown decreases. In this manner, the management node avoids getting trapped in a local optimum, as even if the management node knows that a certain application performs well when it is deployed on a specific node type, there may be other node types that have never been tested with the application and which may deliver better performance. By using exploration mode, the management node can discover how an application will perform on all distinct node types in the infrastructure, and consequently can select the node type that delivers the best performance and meets the requirements of the user.
The performance characteristics of a node type are considered unknown if the percentage of kernel similarity clusters in which kernels of applications which have been deployed on that node type appear is less than a threshold percentage of the total number of kernel similarity clusters. The value of the threshold controls the amount of performance data collected from each node type in the exploration mode, which consequently affects the efficiency of the kernel similarity method. Considering an example in which the threshold is set to 50%, and the kernels of all applications in the system can be grouped into 5 distinct similarity clusters, if all kernels that have been deployed on a certain node type belong to only two similarity clusters, then this node type is considered to have unknown performance characteristics, as the two clusters represent 40% of all clusters, which is less than the 50% threshold. This approach allows the management node, through the exploration mode, to collect sufficient performance data from each node type to ensure that the kernel similarity method is efficient. The threshold value may be set by an operator, and may for example be set to an initial default value, and be updated in accordance with operator priorities for exploration vs exploitation in the deployment of applications on computing nodes in the network.
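The mode choice and the unknown-characteristics test described above can be sketched as follows; a minimal sketch under the stated assumptions, in which the 50% threshold and the cluster counts mirror the worked example in the text.

import random

def has_unknown_characteristics(clusters_seen_on_node_type, total_clusters,
                                threshold=0.5):
    # A node type is "unknown" if kernels deployed on it appear in fewer than
    # `threshold` of all kernel similarity clusters (e.g. 2/5 = 40% < 50%).
    return clusters_seen_on_node_type / total_clusters < threshold

def choose_mode(all_node_types, unknown_node_types):
    # Probability of exploration = fraction of node types still unknown, so
    # guided mode becomes ever more likely as performance data accumulates.
    p_explore = len(unknown_node_types) / len(all_node_types)
    return "exploration" if random.random() < p_explore else "guided"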
As illustrated in Figure 8, when the exploration mode is selected, the management node will randomly choose a node from the previously retrieved list of nodes at step 830e, ensuring that the selected node type has unknown performance characteristics.
When the management node operates in guided mode, it first checks, using the performance data from the monitoring repository 530, whether all kernel-resource pairs for the application have been tested for the node types of all the listed nodes (step 830ci). If all the data is available, the best node is selected at step 830di. If the data does not contain the execution time (or other performance parameter values) for all kernel-resource pairs, then the management node tries to predict the performance of kernel-resource pairs with unknown execution time in step 820, using the kernel similarity method detailed above and illustrated in Figure 1. The kernel similarity method can predict a missing data point if there is at least one similar kernel in the system (i.e. if the kernel similarity cluster for the kernel has at least two kernels), and if that similar kernel has a measurement for the resource in question. After applying kernel similarity, if the management node has a complete view at step 830cii of the performance of all kernels on all computing resources available for at least one node (through either measured or predicted data), then the management node will select the node that provides the best overall application performance at step 830dii. If there is no such node in the list, the management node reverts to exploration mode, selecting a node with unknown characteristics at step 830e.
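A hedged sketch of the guided path is given below; the completeness check and the overall-performance estimate (each kernel on its fastest resource, summed) are simplifications for illustration, not the prescribed selection logic, and all names are hypothetical.

def select_node_guided(candidates, perf, kernels):
    """candidates maps node -> list of its computing resources, and
    perf[node][(kernel, resource)] holds measured or predicted execution
    times. Returns the node with the lowest estimated application time among
    nodes with a complete view, or None to fall back to exploration mode."""
    best_node, best_time = None, float("inf")
    for node, resources in candidates.items():
        data = perf.get(node, {})
        # Guided selection needs a complete view: every kernel-resource pair.
        if any((k, r) not in data for k in kernels for r in resources):
            continue
        # Simple estimate: each kernel on its fastest resource, summed.
        total = sum(min(data[(k, r)] for r in resources) for k in kernels)
        if total < best_time:
            best_node, best_time = node, total
    return best_node  # None -> revert to exploration mode (step 830e)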
Offloading Configuration (step 232 of the method 200)

Figure 9 illustrates in additional detail the step 732 of determining an offloading configuration for the selected computing node (an implementation of step 232 of the method 200). The simplest scenario is when the execution time (or other performance parameter values) of all kernels in the application is available for all computing resources on the node, as checked at step 932a. The execution time may be collected from either (or both) measuring the performance of previous deployments of the application on the same node type, or predicting the performance using kernel similarity as discussed above. In this scenario, the management node can map the kernels to computing resources in step 932b using existing task assignment and scheduling algorithms that have the objective of minimizing the application execution time.
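To illustrate the idea, the sketch below exhaustively enumerates kernel-to-resource mappings and keeps the one with the lowest summed execution time; this brute-force search stands in for the existing task assignment and scheduling algorithms contemplated by the disclosure, and is feasible only for small numbers of kernels and resources.

from itertools import product

def best_offloading_configuration(kernels, resources, exec_time):
    """exec_time[(kernel, resource)] -> known (measured or predicted) time.
    Returns the kernel -> resource mapping minimizing total execution time."""
    best_cfg, best_total = None, float("inf")
    for assignment in product(resources, repeat=len(kernels)):
        cfg = dict(zip(kernels, assignment))
        total = sum(exec_time[(k, r)] for k, r in cfg.items())
        if total < best_total:
            best_cfg, best_total = cfg, total
    return best_cfg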
When the execution time is not known or predicted for one or more kernel-resource pairs, the management node first creates, in step 932c, the list of kernel-resource pairs that do not have execution time data in the monitoring repository. The management node then uses this list to generate the offloading configuration permutations that will be used to test the application and collect the missing performance data in step 932d. In step 932e+f, the management node runs the tests for the generated offloading configurations and collects the missing execution time data. The process of steps 932c to 932e+f is illustrated in greater detail in Figures 10 and 11.
Figures 10 and 11 illustrate steps 932c to 932e+f with two worked examples. Figure 10 shows an example in which an application comprising 2 kernels is to be deployed on a selected candidate computing node having 3 computing resources, and no measured or predicted data is available. In step 932c, the management node assembles the list of kernel-computing resource pairs for which no data is available. In the illustrated example, this results in a list of 6 pairs. In step 932d, the management node generates the possible offloading configurations that include these pairs. Testing all of the possible configurations may be inefficient, and so the management node may select a subset of the possible offloading configurations to test. In the illustrated example, the management node has selected the three configurations that are the most different. Following testing of the selected configurations, the management node selects one single configuration as offering optimal performance for the application.
Referring now to Figure 11, the same example of an application comprising 2 kernels to be deployed on a selected candidate computing node having 3 computing resources is illustrated; however, in Figure 11, partial data is available. It is assumed that the candidate computing node is Node A, having GPU1, GPU2 and GPU3, and that testing data of similar kernels is available for a different node of the same type, Node B, having GPU1. Predicted data is therefore available for kernel 1 and kernel 2 on GPU1 of Node A. In step 932c, the management node assembles the list of kernel-computing resource pairs for which no data is available. In the illustrated example, this results in a list of 4 pairs. In step 932d, the management node generates the possible offloading configurations that include only the 4 pairs with unknown data. Testing all of the possible configurations may still be inefficient or too time consuming, and so the management node may select a subset of the possible offloading configurations to test. In the illustrated example, the management node has selected the two configurations that are the most different. Following testing of the selected configurations, the management node selects one single configuration as offering optimal performance for the application.
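One simple way to cover the unknown pairs with few test runs is a greedy covering, sketched below; the disclosure does not prescribe a particular selection algorithm, and "most different" is interpreted here as covering as many untested pairs as possible per configuration.

def configurations_covering_unknown_pairs(kernels, resources, unknown_pairs):
    """Greedily build offloading configurations until every kernel-resource
    pair lacking data appears in at least one configuration to be tested."""
    remaining = set(unknown_pairs)
    configs = []
    while remaining:
        cfg = {}
        for k in kernels:
            # Prefer a resource forming a still-uncovered pair for this kernel.
            uncovered = [r for r in resources if (k, r) in remaining]
            cfg[k] = uncovered[0] if uncovered else resources[0]
            remaining.discard((k, cfg[k]))
        configs.append(cfg)
    return configs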
Referring again to Figure 9, having selected an offloading configuration, the management node checks whether predicted data has been used to select the offloading configuration at step 932g. If predicted performance values have been used, then the management node tests the offloading configuration at step 932h+i. This testing allows the management node to evaluate the accuracy of the prediction of performance data, by comparing the predicted execution time values with those collected by running a test suite for the application and measuring its performance. This testing also ensures that the selected offloading configuration will deliver the anticipated performance and meet the user's requirement.
As discussed above, the management node may test specific offloading configurations in order to collect the execution time of the application and its kernels. Figure 12 illustrates an example implementation of a testing process (steps 232e and 232h of the method 200). In step 1251, the management node installs the application binaries on the selected node, if the application is not already deployed on the node. The management node then generates the configuration file that contains the offloading configuration in step 1252. The application is then ready to be tested. The management node starts collecting performance metrics in step 1253, runs the test suite or suites in step 1254, collecting the execution time metrics, and finally stores the collected metrics in the monitoring repository 530.
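The testing flow of Figure 12 can be summarized in a short orchestration routine; a hedged sketch in which the node, application, metrics collector, and repository objects and their methods are hypothetical stand-ins for the platform's actual interfaces.

def test_offloading_configuration(node, app, config, metrics, repository):
    """Sketch of the Figure 12 flow: install if needed, write the offloading
    configuration, collect metrics while the test suites run, then persist."""
    if not node.has_deployed(app):                   # step 1251
        node.install(app.binaries)
    node.write_config_file(config)                   # step 1252
    metrics.start_collection(app)                    # step 1253
    for suite in app.test_suites:                    # step 1254
        suite.run(node)
    repository.store(metrics.stop_collection(app))   # monitoring repository 530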
Examples of the present disclosure thus propose methods for automating the deployment and configuration of hardware-accelerated applications in a network environment such as an edge cloud or fog deployment. When an application is submitted for deployment, the management node selects a computing node to host the application from a set of what may be highly heterogeneous computing nodes. The management node identifies the offloading configuration of the application that enables the application to deliver the optimal performance, as defined by the application owner or network operator, on that computing node. The management node collects the execution time of the application and its kernels and stores them in a monitoring repository.
The management node can operate in two modes to select the node which will host the application:
Exploration mode, which enables the management node to discover how an application will perform on different node types in the network. Thus, the management node can build a global view of the performance characteristics of every node type. This enables the management node to choose the node that provides the best performance for the application of interest.
Guided mode, which allows the management node to perform deterministic node selection, when possible, based on the execution time that is collected by either (or both of) measuring the performance of previous deployments of the application on the same node type, or predicting the performance using kernel similarity.
The management node can choose between the two modes using a probability function that ensures that the probability of selecting guided mode increases as the number of node types with unknown performance characteristics in the infrastructure decreases.
Kernel similarity is used to group the kernels deployed on the platform into clusters based on their performance pattern. The measured performance of the kernels in a cluster can then be used to predict the performance of kernels in the same cluster with unknown performance characteristics.
The example nodes and management platform disclosed herein may be implemented and used within any distributed or centralised network system. The nodes and platform can be implemented in a single module or can be distributed through different nodes in different interconnected modules.
The methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.
It should be noted that the above-mentioned examples illustrate rather than limit the disclosure, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims or numbered embodiments. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim or embodiment, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims or numbered embodiments. Any reference signs in the claims or numbered embodiments shall not be construed so as to limit their scope.

Claims

1. A computer implemented method (100) for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources, the method, performed by a management node of the network, comprising: obtaining measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network (110); for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generating predicted values of the performance parameter (120); selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application (130); and initiating deployment of the application on the selected candidate computing node (140); wherein generating predicted values of the performance parameter comprises: for kernels of applications that have been deployed on computing nodes of the network (120c): obtaining measured values of the performance parameter during execution of the kernels on computing resources of computing nodes, which computing resources are of a first computing resource type (120a); and clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources (120b); and for the kernel cluster containing the kernel of the application (120e), determining a mapping between performance parameter values of different kernels in the kernel cluster (120d); and using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available computing node of the network for which measured performance parameter values are not available and which is of the first computing resource type (120f).
2. The method of claim 1, wherein selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application comprises: selecting as the candidate computing node the available computing node that would optimize a value of a performance parameter for the application if the application were to be deployed on that computing node, based on the measured and predicted values of the performance parameter for the kernel of the application (230i).
3. The method of claim 1 or 2, wherein selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application comprises: assembling a set of available computing nodes from which to select (230a), wherein the set comprises computing nodes that: have available computing capacity (230ai); and comprise computing resources that are consistent with requirements of the application (230aii).
4. The method of any one of claims 1 to 3, wherein selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application comprises: performing at least one of a guided selection or an exploration selection, wherein a guided selection comprises: selecting as the candidate computing node the available computing node that would optimize a value of a performance parameter for the application if the application were to be deployed on that computing node, based on the measured and predicted values of the performance parameter for the kernel of the application (230d).
5. The method of claim 4, wherein an exploration selection comprises: randomly selecting as the candidate computing node an available computing node comprising at least one computing resource for which measured values of the performance parameter during execution of the kernel are not available (230e).
6. The method of claim 4 or 5, wherein selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application further comprises: using a mode selection function to determine whether to perform a guided selection or an exploration selection (230b), wherein the mode selection function causes the probability of performing a guided selection to increase as a number of computing resource types within the network that satisfy a condition for having unknown characteristics decreases (230bi).
7. The method of claim 6, wherein, for a computing resource type, the condition for having unknown characteristics comprises: kernels that have been deployed on computing resources of the computing resource type appearing in a number of kernel clusters that is below a threshold value.
8. The method of any one of claims 4 to 7, wherein selecting, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application further comprises: if measured or predicted values of the performance parameter are not available for all computing resources of at least one available computing node (230c), performing an exploration selection (230e).
9. The method of any one of claims 1 to 8, further comprising: determining an offload configuration for deployment of the application on the candidate computing node, the offload configuration comprising a mapping between the kernel of the application and a computing resource of the candidate computing node (232); and wherein initiating deployment of the application on the selected candidate computing node comprises initiating deployment in accordance with the determined offload configuration (240).
10. The method of claim 9, wherein the offload configuration further comprises configuration settings for the mapped computing resource (232i).
11. The method of claim 9 or 10, wherein determining an offload configuration for deployment of the application on the candidate computing node comprises: if measured or predicted values of the performance parameter during execution of the kernel are available for all computing resources of the candidate computing node (232a), identifying the offload configuration that optimizes a value of a performance parameter for the application, based on the measured and predicted values of the performance parameter for the kernel of the application (232b).
12. The method of claim 11, wherein determining an offload configuration for deployment of the application on the candidate computing node further comprises, if the offload configuration has been identified using at least one predicted value of the performance parameter (232g): performing a test of the identified offload configuration on the candidate computing node (232h); measuring values of the performance parameter for the kernel during the test (232i); and storing the measured values in a repository (232j).
13. The method of claim 12, further comprising: comparing the predicted values of the performance parameter to the measured values of the performance parameter during the test (232k); and updating the mapping between performance parameter values of different kernels in the kernel cluster containing the kernel of the application on the basis of the comparison (232l).
14. The method of any one of claims 9 to 13, wherein determining an offload configuration for deployment of the application on the candidate computing node comprises: if measured or predicted values of the performance parameter during execution of the kernel are not available for all computing resources of the candidate computing node (232a): identifying pairs of kernel and computing resource for which measured or predicted values of the performance parameter are not available (232c); assembling a list of possible offload configurations comprising at least one of the identified pairs (232d); performing tests of possible offload configurations from the assembled list (232e); measuring values of the performance parameter for the kernel during the tests (232f); and identifying the offload configuration that optimizes a value of a performance parameter for the application, based on the measured and predicted values of the performance parameter for the kernel of the application (232b).
15. The method of any one of claims 12 to 14, wherein testing an offload configuration comprises: deploying the application on the candidate computing node in accordance with the offload configuration; and running a testing suite for the application (232h).
16. The method of any one of claims 1 to 15, further comprising: checking whether deployment of the application on the candidate computing node will fulfill a performance requirement for the application (234); and if deployment of the application on the candidate computing node will not fulfill a performance requirement for the application, selecting a new candidate computing node for deployment of the application, based on the measured and predicted values of the performance parameter (230).
17. The method of claim 16, when dependent on any one of claims 9 to 15, wherein checking whether deployment of the application on the candidate computing node will fulfill a performance requirement for the application comprises checking whether at least one determined offload configuration for deployment of the application on the candidate computing node will fulfill the performance requirement for the application (234i).
18. A computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform a method as claimed in any one of claims 1 to 17.
19. A management node (300) for managing deployment of an application on a computing node of a network, wherein the application comprises at least one kernel and the computing node comprises a plurality of computing resources, the management node comprising processing circuitry (302) configured to cause the management node to: obtain measured values of a performance parameter during execution of the kernel on computing resources of available computing nodes in the network; for computing resources of available computing nodes for which measured performance parameter values during execution of the kernel are not available, generate predicted values of the performance parameter; select, based on the measured and predicted values of the performance parameter, a candidate computing node for deployment of the application; and initiate deployment of the application on the selected candidate computing node; wherein the processing circuitry is further configured to cause the management node to generate predicted values of the performance parameter by: for kernels of applications that have been deployed on computing nodes of the network: obtaining measured values of the performance parameter during execution of the kernels on computing resources of computing nodes, which computing resources are of a first computing resource type; and clustering the kernels according to a pattern of variation of the measured performance parameter values for each kernel with execution on different computing resources; and for the kernel cluster containing the kernel of the application, determining a mapping between performance parameter values of different kernels in the kernel cluster; and using the determined mapping to predict a value of the performance parameter during execution of the kernel of the application on a computing resource of an available computing node of the network for which measured performance parameter values are not available and which is of the first computing resource type.
20. The management node as claimed in claim 19, wherein the processing circuitry is further configured to cause the management node to carry out the steps of any one or more of claims 2 to 17.