WO2017028930A1 - Methods and apparatus for running an analytics function - Google Patents

Methods and apparatus for running an analytics function

Info

Publication number
WO2017028930A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
analytics function
network
nodes
cost
Prior art date
Application number
PCT/EP2015/069173
Other languages
French (fr)
Inventor
Tony Larsson
Nicolas Seyvet
Ignacio MULAS VIELA
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2015/069173
Publication of WO2017028930A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/502 Proximity


Abstract

A method in a distributed network for determining an optimal node on which to run an analytics function is disclosed. The method comprises determining a candidate list of potential nodes on which to run the analytics function. The method further comprises, for each node in the candidate list, calculating at least one parameter for the node relating to the cost of running the analytics function and determining the optimal node based on the at least one calculated parameter.

Description

METHODS AND APPARATUS FOR RUNNING AN ANALYTICS FUNCTION
Technical field
This disclosure relates to a method and apparatus for running an analytics function. In particular, it relates to determining an optimal node in a distributed network on which to run an analytics function.
Background
Vast amounts of data are generated daily by people and their devices. One of the challenges of modern computing is how to deal with this data. In particular, when analytics clients (ACs) try to analyse data, the required data sources may be stored in different locations and this poses a logistical problem in distributed networks. Existing solutions to perform analytics in a cloud environment (e.g. a distributed network) tend to be centrally located in data centres with unlimited resources, using frameworks such as Hadoop™ and Spark™. Alternatively, they may be statically provisioned/deployed in specific nodes. However, such frameworks can increase network overheads as data has to be copied from one or more storage nodes and moved through the network to a central node for processing.
Summary
According to a first aspect, there is provided a method in a distributed network for determining an optimal node on which to run an analytics function. In a first step, the method comprises determining a candidate list of potential nodes on which to run the analytics function. The method further comprises, for each node in the candidate list, calculating at least one parameter for the node relating to the cost of running the analytics function, and determining the optimal node based on the at least one calculated parameter. Such a method has an advantage of minimizing the network overhead associated with processing analytics functions, by dynamically choosing which node in the network to run an analytics function on. Thus, instead of executing analytics functions at a designated or central node each time, this method allows the analytics function to be run on an optimal node in the network, where the optimal node can be chosen in order to reduce the network overhead associated with issues such as database latency, bottlenecks, actuation and data location.
According to an embodiment, the step of determining a candidate list of potential nodes comprises determining a preliminary list of potential nodes based on the location of one or more data sources used in the analytics function. According to a second embodiment, the candidate list is determined from the preliminary list using graph analysis methods. The graph analysis method may comprise, for example, shortest path calculations.
According to a third embodiment, the analytics function requires data stored on at least a first node and a second node, and the shortest path calculations determine the shortest path between the first node and the second node. In this embodiment, the candidate list of potential nodes contains nodes along the shortest path between the first node and the second node.
According to a further embodiment, the analytics function further requires data stored on a third node, and the shortest path calculations further determine the shortest path between the first node and the third node and the shortest path between the second node and the third node. In this embodiment, the candidate list of potential nodes also contains nodes along the shortest path between the third node and the first node and/or the third node and the second node.
In some embodiments, the shortest path calculations are weighted according to one or more negotiated link costs between nodes in the network.
In some embodiments, the method further comprises calculating a total cost of computing the analytics function on each of the nodes in the candidate list, wherein the total cost is a combination of the values of one or more
parameters calculated for the node and/or one or more neighbouring nodes.
In some embodiments, the total cost is a weighted combination of the values of the one or more parameters, and the step of determining the optimal node is based on the weighted combination. In such embodiments, the weighting may reflect the frequency that data will need to be accessed on the one or more neighbouring nodes in order to compute the analytics function.
In some embodiments, the total cost is an average combination of the values of the one or more parameters calculated for the node and/or one or more neighbouring nodes, and the step of determining the optimal node is based on the average combination.
In further embodiments, the step of determining an optimal node comprises determining a prioritized list of nodes by ranking each candidate node according to its total cost and selecting an optimal node from the prioritized list.
In any of the preceding embodiments, the at least one parameter can relate to at least one of:
- a node cost;
- a link cost;
- an uplink link cost;
- a downlink link cost;
- a measure of network utilization;
- the availability of computing resources at a node;
- the latency of one or more databases on a node;
- the speed at which data can be written to and/or read from one or more databases on a node; and
- the number of times a particular data source has to be accessed to run the analytics function.
According to a second aspect, there is provided a network node in a
distributed network. The network node comprises an analytics function location module for determining an optimal node in the distributed network on which to run an analytics function. The analytics function location module is configured to determine a candidate list of potential nodes on which to run the analytics function. The analytics function location module is further configured, for each node in the candidate list, to obtain at least one parameter relating to the cost of running the analytics function and to determine the optimal node based on the at least one parameter. According to an embodiment, the analytics function location module is configured to store and maintain a list of the locations of data sources in the network. The analytics function location module may be configured to receive one or more updates on the location of data sources in the network and update the list of locations of data sources accordingly.
In some embodiments, the analytics function location module is further configured to receive one or more parameters relating to the cost of running the analytics function. In some embodiments, the analytics function location module is configured to receive the one or more parameters through gossip communication. According to a third aspect, there is provided, a network node in a distributed network, the network node comprising a processor and a memory, said memory containing instructions executable by said processor. The network node is operative to determine a candidate list of potential nodes on which to run the analytics function. The network node is further operative to calculate, for each node in the candidate list, at least one parameter for the node relating to the cost of running the analytics function and determine the optimal node based on the at least one calculated parameter.
According to a fourth aspect there is provided a network node in a distributed network, the network node comprising a first module for determining a candidate list of potential nodes on which to run the analytics function; a second module for calculating, for each node in the candidate list, at least one parameter for the node relating to the cost of running the analytics function; and a third module for determining the optimal node based on the at least one calculated parameter.
According to a fifth aspect, there is provided a computer program which, when run on a computer, causes the computer to carry out a method according to any of the embodiments listed above.
According to a sixth aspect, there is provided a computer program product comprising a computer readable storage medium and a computer program according to the fifth aspect, stored on the computer readable storage medium.
Brief Description of the Drawings
Features, objects and advantages of the presently disclosed techniques will become apparent to those skilled in the art by reading the following detailed description where references will be made to the appended figures in which:
Figure 1 illustrates an example of a method of determining an optimal node on which to run an analytics function, according to an embodiment;
Figure 2 shows an example distributed network;
Figure 3 shows an example network node according to an embodiment;
Figure 4 shows an example network node according to another embodiment; and
Figure 5 shows an example network node according to a further embodiment.
Detailed Description
Conventionally, in a distributed network, analytics functions that require data from disparate (and perhaps globally distributed) sources have tended to copy the data required for the calculation from each of the sources to a central node to run the analytics function. As the number of requests for performing analytics functions increases over time, the network overheads in copying data to and from a central node in this way also increase. In time this overhead is projected to reach unsustainable levels. New methods are therefore required to minimize the overhead associated with processing analytics functions.
Figure 1 shows a method according to an embodiment, for determining an optimal node on which to run an analytics function. The method comprises determining a candidate list of potential nodes on which to run the analytics function, step 102. The method further comprises, for each node in the candidate list, calculating at least one parameter for the node relating to the cost of running the analytics function, step 104. In step 106, an optimal node is determined based on the at least one calculated parameter. In this way, instead of running an analytics function on a centrally located node, the method provides a mechanism for dynamically choosing which node to run the analytics function on. Executing an analytics function at the "optimal" place in the network is crucial for performance reasons, for example to reduce the network overhead associated with database latency, bottlenecks, actuation and data location.
It will therefore be appreciated that some of the problems mentioned in the background section may be addressed in a distributed network through the use of a method 100 as described in Figure 1.
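As a rough illustration of this flow, the selection steps can be sketched as follows; the function and variable names are ours rather than the patent's, and the candidate list and costs are deliberately left abstract here (later sections show how both would actually be derived).

    def choose_optimal_node(candidates, cost_of):
        # Step 104: calculate a cost-related parameter for every candidate node.
        costs = {node: cost_of(node) for node in candidates}
        # Step 106: the optimal node is the one with the lowest calculated cost.
        return min(costs, key=costs.get)

    # Illustrative usage; the candidate list (step 102) and the costs are made up
    # here, but the later sections show how both would actually be derived.
    example_costs = {"x": 0.83, "b": 1.17, "a": 2.5}
    print(choose_optimal_node(example_costs, example_costs.get))   # -> x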
In general, a distributed network consists of a number of nodes connected in various ways with a number of links. Such a distributed network can be illustrated as a graph, as shown in Figure 2. The circle labelled AC in Figure 2 represents an analytics client (AC). The other circles in Figure 2 represent nodes (or computational resources) that can potentially also run analytics clients or analytics functions. Each straight line represents a network link (or connection) between two nodes. Each node is labelled with a letter (a-y), according to the data stored on it, or streamed through it (e.g. the node labelled a has data source 'a' on it, or streamed through it). Each node and each link is also labelled with a number representing the cost of using the respective nodes and links when running the analytics function. In an Internet of Things (IoT) type scenario, such as the one in Figure 2, resources can be limited in some parts of the distributed network and this is reflected in the costs.
The costs of each node and link are dynamically allocated and can be calculated based on utilization of bandwidth, cpu, ram etc. The costs are dynamically updated to reflect the current status of the nodes and the network. The 'cost' in this sense can be a dimensionless number based on a scale agreed between the nodes in the network. Alternatively, the cost can be the actual monetary cost (e.g. in euros or dollars) of using the node, or another dimensional number, such as the CPU utilisation.
The following processes and services continuously run on the network in the background:
Nodes: All nodes run an internal process that, based on resource utilization (cpu, ram, i/o), calculates a cost for the node. This cost indicates how utilized the node is at the moment. A low cost means low utilization and a high cost means high utilization. The cost can be relative; for example, the utilization can be ranked according to an agreed scale such as 1 to 10. The node and link costs are dynamically updated at each node. This can be done periodically, or based on threshold values for certain metrics (for example cpu, i/o, ram, network, etc.).
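A minimal sketch of such a node cost calculation is given below; the equal weighting of cpu, ram and i/o utilisation, the rounding, and the mapping onto a 1 to 10 scale are illustrative assumptions rather than anything specified above.

    def node_cost(cpu_util, ram_util, io_util, scale_max=10):
        # Map resource utilisation figures (each between 0.0 and 1.0) onto an
        # agreed relative scale, e.g. 1 (idle) to 10 (fully utilised).
        utilisation = (cpu_util + ram_util + io_util) / 3.0
        return max(1, round(utilisation * scale_max))

    # Example: a node at 40% CPU, 20% RAM and 10% I/O utilisation.
    print(node_cost(0.4, 0.2, 0.1))   # -> 2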
Once calculated, the costs are shared throughout the network and continuously updated by means of data transfer between neighbouring nodes. This can be done, for example, with a simple heartbeat protocol that periodically sends out a heartbeat with the node's identifier. All neighbouring nodes are cached in the nodes. Also, there are available networking implementations that keep track of this, for example gossip communication protocols. These algorithms are used in peer-to-peer (P2P) networks, which makes them a perfect fit for distributed scenarios. Gossip communication protocols are a way for computers to share information across a network using a form of random "peer selection". At specified time intervals, each node in the network picks another node at random and shares any new data or updates, e.g. in this case, any updates on calculated node costs. The fact that the chosen nodes are random means that some nodes will receive the data more than once from two or more different nodes. However, gossip communication protocols are an efficient way of sharing data quickly around a distributed network.
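A much-simplified, push-only sketch of one gossip round is shown below; the node names and costs are illustrative, and the last-write-wins merge is an assumption (a real implementation would version or timestamp updates).

    import random

    def gossip_round(views):
        # One push-only gossip round: every node picks a random peer and shares
        # the node costs it currently knows about; the peer merges that view.
        nodes = list(views)
        for node in nodes:
            peer = random.choice([n for n in nodes if n != node])
            views[peer].update(views[node])

    # Three nodes that initially know only their own cost; after a few rounds
    # every node's view converges towards the full cost map.
    views = {"a": {"a": 3}, "b": {"b": 0}, "x": {"x": 0}}
    for _ in range(3):
        gossip_round(views)
    print(views)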
Link costs: Nodes that are directly connected and share a link continuously calculate and agree on a cost for that particular link. The link cost is directional, such that there are separate link costs for the uplink and downlink. This is done based on the network metrics available, such as bandwidth or latency. If the nodes are the vertices of the graph shown in Figure 2, then the link cost is the value attached to the edge formed between these two vertices. Link costs can be calculated using a Gather Apply Scatter model:
o Gather: Receive information about adjacent vertices.
o Apply: Take computation from the gather phase.
o Scatter: Update data on adjacent edges.
With the above mentioned processes running in the background, when an analytics client has an analytics function "f" that it wants to execute, using data sources a, b, c, for example f(a, b, c), the method 100 as illustrated in Figure 1 can run in the network as follows. The method 100 may run on a node in the network that is configured to run an Analytics Function Location Service (AFLS). In order to determine an optimal node in the network for running an analytics function, the AFLS has access to one or more data sources; for example, the AFLS may have access to one or more of (i) a list of nodes in the network, (ii) the costs of running the analytics function on each node, (iii) a list of link costs for transferring data between each pair of nodes, and (iv) a list of databases and/or the data sources on each node. The AFLS may keep track of the data sources listed above and may update the data sources (for example, what data is stored on each node) based on updates from the network received, for example, through the gossip
communication protocol described above. In some embodiments, the step of determining a candidate list of potential nodes on which to run the analytics function (step 102) comprises determining a preliminary list of potential nodes based on the location of one or more data sources used in the analytics function. For example, for the analytics function f(a,b,c), the AFLS may generate a preliminary list of nodes based on the location of the data sources a, b and c. For example, the preliminary list of nodes may contain nodes that have a, b or c stored on them, or the candidate list may contain nodes that can accommodate the function "f" and have access to a, b and/or c. In the example shown in Figure 2, for clarity, data source 'a' can be obtained from node a, data source 'b' can be obtained from node b and data source 'c' can be obtained from node c. In these examples, the data can be obtained from the respective nodes because, for example, the data is stored on the node, or because the data is being generated and streamed from the node. Therefore, in this example, the preliminary list contains nodes a, b and c.
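As a sketch of the bookkeeping such a service might keep, and of how the preliminary list could be looked up from it, consider the following; the class, method and field names are illustrative and not taken from the patent, and the costs shown reuse values from the worked example later in the text.

    from dataclasses import dataclass, field

    @dataclass
    class AnalyticsFunctionLocationService:
        # Sketch of the state an AFLS node might maintain.
        node_costs: dict = field(default_factory=dict)      # node -> node cost
        link_costs: dict = field(default_factory=dict)      # (node, node) -> link cost
        data_locations: dict = field(default_factory=dict)  # data source -> nodes holding it

        def on_update(self, node, cost=None, data_sources=()):
            # Merge an update received, e.g. through the gossip protocol.
            if cost is not None:
                self.node_costs[node] = cost
            for source in data_sources:
                self.data_locations.setdefault(source, set()).add(node)

        def preliminary_list(self, required_sources):
            # Nodes that hold (or stream) any of the required data sources.
            return set().union(*(self.data_locations.get(s, set()) for s in required_sources))

    afls = AnalyticsFunctionLocationService()
    afls.on_update("a", cost=3, data_sources=["a"])
    afls.on_update("b", cost=0, data_sources=["b"])
    afls.on_update("c", cost=5, data_sources=["c"])
    print(afls.preliminary_list(["a", "b", "c"]))   # -> {'a', 'b', 'c'}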
According to some embodiments, the AFLS will then determine a candidate list from the preliminary list using graph analysis methods. For example, the graph analysis method may comprise shortest path (SP) calculations for all combinations of dependent sources, e.g. calculating all of the shortest paths between the different locations of the data sources a, b and c (a-b, a-c, b-c). In general, if the analytics function requires data stored on a first node, a second node and a third node, then the shortest path calculations determine the shortest path between the first node and the second node, the shortest path between the first node and the third node and the shortest path between the second node and the third node.
In the case that there is more than one node in the network that stores a particular data source (e.g. the data is repeated), then this may also involve calculating the shortest paths for each repeated data source; for example, if there were a second source of data a, a', then the shortest path calculations may determine the shortest paths between (a-b, a-c, b-c, a'-b, a'-c).
In some embodiments, the shortest path calculations are performed by the nodes of the network. For example, shortest path calculations can be initiated by the AFLS by sending ComputeSP multicast messages to the first nodes in each unique combination:
i. message to a: get the shortest paths for a-b and a-c
ii. message to b: get the shortest path for b-c
Once a node receives a ComputeSP message, it will return an OK message, store the AFLS return address and start a shortest path calculation. In an alternative embodiment, the AFLS may calculate the shortest paths itself, based on information available to it about the network and network costs. In this embodiment, all information is sent to one node that collects a global view of the network. The information might be received at the node through a dedicated protocol that sends all updates to the central node, or alternatively, through the gossip protocol described above.
Shortest path computations can be performed using any existing algorithm, such as Dijkstra's algorithm or the A* (A-star) search algorithm.
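As an illustration, a compact Dijkstra implementation over negotiated link costs might look like the sketch below; the example graph and its link costs are made up (Figure 2 is not reproduced here), chosen only so that the returned path has the same shape as the paths quoted below.

    import heapq

    def shortest_path(graph, start, goal):
        # Dijkstra's algorithm; graph maps each node to {neighbour: link_cost}.
        # Returns (total link cost, list of nodes along the path).
        queue = [(0, start, [start])]
        visited = set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == goal:
                return cost, path
            if node in visited:
                continue
            visited.add(node)
            for neighbour, link_cost in graph[node].items():
                if neighbour not in visited:
                    heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
        return float("inf"), []

    # Illustrative graph with made-up link costs.
    graph = {
        "a": {"x": 1},
        "b": {"x": 2},
        "c": {"y": 1},
        "x": {"a": 1, "b": 2, "y": 1},
        "y": {"x": 1, "c": 1},
    }
    print(shortest_path(graph, "a", "c"))   # -> (3, ['a', 'x', 'y', 'c'])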
In some embodiments, the shortest path calculations calculate the shortest physical network path between the two nodes, for example, the path(s) that contains the fewest intermediary nodes. In alternative embodiments, the basic shortest path calculations can be modified such that the algorithm is effectively weighted according to the negotiated link costs between the nodes. For example, whilst the path (x->d->e) in Figure 2 may physically be the same length as the path (x->y->e), the link cost between nodes d and e is higher than the link cost between nodes y and e. These link costs may be used as weightings in the shortest path calculations, such that the shortest path calculation would consider (x->y->e) to be 'shorter' than, or preferential to, the path (x->d->e). The results of the shortest path calculation are collected in a format that shows the entire path, including node and link costs. Referring again to Figure 2, the shortest paths for the combinations (a-b, a-c, b-c) are:
a->x->b (costs for links and nodes are also included)
a->x->y->c (costs for links and nodes are also included)
b->x->y->c (costs for links and nodes are also included)
In embodiments where the shortest paths are calculated by the nodes themselves, the results of the shortest path calculations are then sent to the AFLS using a message such as a ResultSP message. The ResultSP message contains a list of all of the nodes along the shortest path between the two nodes (including the end nodes themselves). The AFLS collects all of the nodes along all of the shortest paths and this forms the candidate list of potential nodes on which to run the analytics function. The candidate list determined in step 102 may therefore comprise a list of nodes that appear along a shortest path between two nodes that store data needed for the analytics function. The candidate list is filtered so that each node appears only once. According to Figure 2, (a, x, y, b, c) are the unique nodes from the shortest paths between the nodes (a, b, c).
The candidate list above is then sorted, using one or more parameters. This may comprise calculating a total cost of computing the analytics function on each of the nodes in the candidate list, wherein the total cost is a combination of the values of one or more parameters calculated for the node and/or one or more neighbouring nodes.
As an example, the two parameters 'link cost' and 'node cost' can be used to put the candidates in a prioritized order.
First, sort using link costs: For each candidate (a, x, y, b, c) extract the link cost to each data source (a, b, c) and weigh them together, by averaging them to compute a 'total cost'. Thus, according to the example from Figure 2:
• a: (link cost to a + link cost to b + link cost to c)/3 = v1 = (0+3+3)/3 = 6/3
• b: (link cost to a + link cost to b + link cost to c)/3 = v2 = (3+0+4)/3 = 7/3
• c: (link cost to a + link cost to b + link cost to c)/3 = v3 = (3+4+0)/3 = 7/3
• x: (link cost to a + link cost to b + link cost to c)/3 = v4 = (1+2+2)/3 = 5/3 (lowest)
• y: (link cost to a + link cost to b + link cost to c)/3 = v5 = (2+3+1)/3 = 6/3
The end result can then be used to sort the candidates according to their total cost: x (5/3), a (6/3), y (6/3), b (7/3), c (7/3).
The sorted list above ranks the nodes in order of their link costs (the node with the lowest link cost being ranked the highest). The available resources on each node (i.e. the node cost) can then be added to the calculations, by weighing the node cost together with the results from the previous step. This calculation can be a simple average operation such as:
• a: (6/3 + 3)/2 = 15/6
• b: (7/3 + 0)/2 = 7/6
• c: (7/3 + 5)/2 = 22/6
• x: (5/3 + 0)/2 = 5/6
• y: (6/3 + 4)/2 = 9/3
Sorting this list of total costs (in the order of lowest cost to highest cost) gives: x, b, a, y, c. Thus in this example, based on optimising according to link costs and node costs, the optimal node on which to run the analytics function is node x and the second most optimal node is node b.
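For reference, the averaging and ranking above can be reproduced with a short sketch; the function name is ours, and the link and node costs are those from the worked example.

    def total_cost(link_costs_to_sources, node_cost):
        # Average the link costs to each required data source, then average
        # that value with the node's own cost, as in the worked example above.
        avg_link = sum(link_costs_to_sources) / len(link_costs_to_sources)
        return (avg_link + node_cost) / 2

    # (link costs to a, b, c), node cost -- values from the worked example above.
    candidates = {
        "a": ([0, 3, 3], 3),
        "b": ([3, 0, 4], 0),
        "c": ([3, 4, 0], 5),
        "x": ([1, 2, 2], 0),
        "y": ([2, 3, 1], 4),
    }
    ranking = sorted(candidates, key=lambda n: total_cost(*candidates[n]))
    print(ranking)   # -> ['x', 'b', 'a', 'y', 'c']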
The analytics client can then execute the analytics function according to the prioritised list of nodes. In other examples, the total cost can also be a weighted combination of the values of the one or more parameters. For example, the link and node costs can be weighted according to how important they are for a particular network or analytics function. The formula to calculate the total cost of running the analytics function on a node a, Cost_total,a, would then be:
Cost_total,a = weight_node * cost_node,a + weight_link * cost_link,a
where weight_node and weight_link are the weightings to be applied to the node and link costs respectively. This enables a higher cost to be put on nodes that are already overutilized, by making weight_node > weight_link.
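As a small worked illustration of this formula (the 2.0/1.0 weights are illustrative, not values from the text):

    def weighted_total_cost(cost_node, cost_link, weight_node=2.0, weight_link=1.0):
        # cost_total = weight_node * cost_node + weight_link * cost_link.
        # Making weight_node > weight_link (as here) penalises nodes that are
        # already heavily utilised.
        return weight_node * cost_node + weight_link * cost_link

    # For a node with node cost 3 and average link cost 2:
    print(weighted_total_cost(3, 2))   # -> 8.0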
The examples given above take node costs and link costs into consideration when ranking each node and determining an optimal node on which to calculate the analytics function. More generally, other parameters may also be used, for example, parameters relating to utilisation, such as a node cost or link cost or a measure of network utilization. Alternatively, the one or more parameters may relate to latency, for example, the availability of computing resources, the latency of one or more databases on a node or the speed at which data can be written to and/or read from one or more databases on a node. Costs for the links may be directional (i.e. different costs in different directions depending on load) and thus there may be different parameters describing uplink link cost and downlink link cost for each pair of nodes. Furthermore, there may be differences between the costs of reading and writing data, as the cost of reading might depend on the delay of a stream (network capacity), while the cost of writing might depend on the disk type of the machine (e.g. an SSD).
The choice of which parameters to use may depend on a number of factors, for example, the parameters may be chosen on a network level, such that the AFLS uses the same parameters and cost calculations to determine the optimal nodes for all analytics functions running on the network. For example, if the nodes in a network are generally over-utilised, the equations and calculations above might be modified so as to reduce node utilisation. For instance, the AFLS may calculate the optimal node for each analytics function based solely on the node cost.
In some examples, the parameters may be determined on a network level (i.e. the same parameters may be used for all analytics functions as above) but the parameters chosen may be changed over time, for example, if the AFLS has been calculating the optimal node for each analytics function based solely on the node cost and, over time, node utilisation drops, it may then be preferable for the AFLS to stop calculating the optimal node based on node cost and to use another parameter instead, for example, link utilisation.
In alternative embodiments still, the parameter(s) may be chosen on a case by case basis for each analytics function. As such, the method may further comprise choosing one or more parameters for the analytics function to use to determine the optimal node on which to run the analytics function. For instance, if a first analytics function requires a node to frequently write to its database, it may be important to choose a node with a low database latency and thus database latency may be used to determine the optimal node.
However, if a second analytics function requires particularly resource intensive calculations, it may be more important to choose a node with sufficient memory, and thus a parameter based on CPU may be used to determine the optimal node.
In general, different types of costs may be used, depending on the needs of the applications and agreed service level agreements (SLAs). For example, one application might have requirements on maximum latency and therefore a cost model that captures this aspect is needed. At the same time there might be another application that mainly cares about high availability, or a third application that mainly cares about the cost (in dollars) to run the application. All these requirements can be aggregated into different cost values at each node. It is then a matter of selecting the right cost value when executing the method.
In some examples, the chosen parameters relate to potential bottlenecks resulting from the specific tasks that need to be performed in the analytics function. This might include the number of times a particular data source has to be accessed to run the analytics function. The algorithm is also applicable to write access, where the results need to be written in different storage parts. For example, a node might need to frequently update (write) its data and thus the optimal node will reduce the time it takes to write the data to file. As noted above, there may be different costs associated with reading and writing and thus the correct cost needs to be selected depending on whether the analytics function is predominantly reading or writing data.
Frequency of data access can also be incorporated into the algorithm by putting different weights for different data sources (small weight on high- frequency data sources, and higher weight on low-frequency data sources). This can then be used when doing the link cost calculations, in the sense that all link costs to a certain data source get an additional weight. The effect of this is that the analytics location protocol will try to give high-frequency data sources higher priority, and the optimal node to run the analytics function will be closer to those data sources.
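One way such a frequency weighting could be applied to link costs is sketched below; the linear scaling and the 0.5 to 1.0 weight range are assumptions for illustration only.

    def frequency_weighted_link_cost(link_cost, access_frequency, max_frequency):
        # Scale a link cost so that links towards frequently accessed data
        # sources look cheaper; the linear 0.5-1.0 scaling is an assumption.
        weight = 1.0 - 0.5 * (access_frequency / max_frequency)
        return link_cost * weight

    # A source read 100 times per run versus one read only 50 times per run:
    print(frequency_weighted_link_cost(3, access_frequency=100, max_frequency=100))  # -> 1.5
    print(frequency_weighted_link_cost(3, access_frequency=50, max_frequency=100))   # -> 2.25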
Figure 3 shows a network node 300 in a distributed network, according to another embodiment. The network node 300 comprises an analytics function location module 302 for determining an optimal node in the distributed network on which to run an analytics function. The analytics function location module is configured to determine a candidate list of potential nodes on which to run the analytics function; for each node in the candidate list, obtain at least one parameter relating to the cost of running the analytics function; and determine the optimal node based on the at least one parameter.
In one embodiment of the network node, the analytics function location module 302 is configured to store and maintain a list of locations of data sources in the network. The analytics function location module may then be configured to receive one or more updates of the location of data sources in the network, for instance via the gossip communication protocol referred to above, and to update its list of locations of data sources according to the received updates. The analytics function location module may also receive, from a second node in the network, one or more parameters relating to the cost of running the analytics function on that node.
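By way of illustration only, the following sketch shows one way an analytics function location module might maintain its list of data source locations from gossiped updates and store cost parameters reported by other nodes. The message format and the timestamp-based merge rule are assumptions made for this example, not the gossip protocol itself.

```python
# Hypothetical sketch of the state kept by an analytics function location module.
import time
from typing import Dict, Optional, Tuple

class AnalyticsFunctionLocationModule:
    """Keeps a view of where data sources live and what other nodes cost."""

    def __init__(self) -> None:
        # data source id -> (node id hosting it, timestamp of last update)
        self.data_source_locations: Dict[str, Tuple[str, float]] = {}
        # node id -> latest cost parameters reported by that node
        self.node_costs: Dict[str, dict] = {}

    def on_gossip_update(self, source_id: str, node_id: str,
                         ts: Optional[float] = None) -> None:
        """Merge a gossiped (data source -> node) mapping; newest update wins."""
        ts = time.time() if ts is None else ts
        current = self.data_source_locations.get(source_id)
        if current is None or ts > current[1]:
            self.data_source_locations[source_id] = (node_id, ts)

    def on_cost_report(self, node_id: str, costs: dict) -> None:
        """Store cost parameters received from a second node in the network."""
        self.node_costs[node_id] = costs
```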
Figure 4 shows a network node 400 in a distributed network, according to another embodiment. The network node comprises a processor 402 and a memory 404, said memory containing instructions executable by said processor. The network node is operative to determine a candidate list of potential nodes on which to run the analytics function. The network node is further operative to calculate, for each node in the candidate list, at least one parameter for the node relating to the cost of running the analytics function, and determine the optimal node based on the at least one calculated parameter. The network node may be further operative to carry out any optional steps of method 100.
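By way of illustration only, the following sketch ties the three operations together for a single node: building the candidate list from shortest paths between the nodes holding the required data, computing a total cost per candidate, and selecting the minimum. The helper functions, the cost combination and the use of networkx are assumptions made for this example, not the disclosed implementation.

```python
# Hypothetical end-to-end sketch of candidate selection and cost-based ranking.
import networkx as nx

def candidate_nodes(G: nx.Graph, data_nodes: list) -> set:
    """Candidates = the data nodes plus every node on a shortest path between each pair."""
    candidates = set(data_nodes)
    for i, a in enumerate(data_nodes):
        for b in data_nodes[i + 1:]:
            candidates.update(nx.shortest_path(G, a, b, weight="weight"))
    return candidates

def optimal_node(G: nx.Graph, data_nodes: list, node_cost: dict) -> str:
    """Rank each candidate by its own node cost plus the link cost to every data node."""
    def total(n):
        link = sum(nx.shortest_path_length(G, n, d, weight="weight")
                   for d in data_nodes)
        return node_cost.get(n, 0.0) + link
    return min(candidate_nodes(G, data_nodes), key=total)

# Example: data for the analytics function lives on nodes "A" and "D".
G = nx.Graph()
G.add_weighted_edges_from([("A", "B", 1), ("B", "C", 2),
                           ("C", "D", 1), ("A", "C", 4)])
print(optimal_node(G, ["A", "D"], {"A": 5.0, "B": 0.5, "C": 1.0, "D": 5.0}))  # -> 'B'
```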
Figure 5 illustrates functional units in another embodiment of a network node in a distributed network which may execute the method 100 of the present invention, for example according to computer readable instructions received from a computer program. It will be understood that the units illustrated in Figure 5 are software-implemented functional units, and may be realised in any appropriate combination of software modules.
The network node comprises a first module 502 for determining a candidate list of potential nodes on which to run the analytics function. The network node further comprises a second module 504 for calculating, for each node in the candidate list, at least one parameter for the node relating to the cost of running the analytics function, and a third module 506 for determining the optimal node based on the at least one calculated parameter.
The network node 500 may comprise a fourth module for storing and maintaining a list of the locations of data sources in the network. The network node 500 may further comprise a fifth module for receiving one or more updates on the location of data sources in the network and updating the list of locations of data sources accordingly.
In some embodiments, the network node 500 may further comprise a sixth module for receiving one or more parameters relating to the cost of running the analytics function. Furthermore, the network node 500 may comprise a seventh module for receiving the one or more parameters through gossip communication.
According to another embodiment, there is provided a computer program which, when run on a computer, causes the computer to carry out a method according to any of the embodiments listed above. According to another embodiment, there is provided a computer program product comprising a computer readable storage medium and a computer program according to the above, stored on the computer readable storage medium.
The embodiments described herein have the advantage of minimizing the network overhead associated with processing analytics functions, by dynamically choosing which node in the network to run an analytics function on. This allows the analytics function to be run on an optimal node in the network, where the optimal node can be chosen in order to reduce the network overhead associated with issues such as database latency, bottlenecks, actuation, data location and so on.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim, "a" or "an" does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.

Claims

1. A method in a distributed network for determining an optimal node on which to run an analytics function, the method comprising: determining a candidate list of potential nodes on which to run the analytics function (step 102);
for each node in the candidate list, calculating at least one parameter for the node relating to the cost of running the analytics function (step 104); and
determining the optimal node based on the at least one calculated parameter (step 106).
2. A method as claimed in claim 1, wherein the step of determining a candidate list of potential nodes comprises determining a preliminary list of potential nodes based on the location of one or more data sources used in the analytics function.
3. A method as claimed in claim 2, wherein the candidate list is determined from the preliminary list using graph analysis methods.
4. A method as claimed in claim 3, wherein the graph analysis method comprises shortest path calculations.
5. A method as claimed in claim 4, wherein the analytics function requires data stored on at least a first node and a second node and the shortest path calculations determine the shortest path between the first node and the second node.
6. A method as claimed in claim 5, wherein the candidate list of potential nodes contains nodes along the shortest path between the first node and the second node.
7. A method as claimed in claim 5, wherein the analytics function further requires data stored on a third node and the shortest path calculations further determine the shortest path between the first node and the third node and the shortest path between the second node and the third node.
8. A method as claimed in claim 7, wherein the candidate list of potential nodes contains nodes along the shortest path between the third node and the first node and/or the third node and the second node.
9. A method as claimed in any one of claims 4 to 8, wherein the shortest path calculations are weighted according to one or more negotiated link costs between nodes in the network.
10. A method as claimed in any one of the preceding claims, further comprising calculating a total cost of computing the analytics function on each of the nodes in the candidate list, wherein the total cost is a combination of the values of one or more parameters calculated for the node and/or one or more neighbouring nodes.
11. A method as claimed in claim 10, wherein the total cost is a weighted combination of the values of the one or more parameters, and wherein the step of determining the optimal node is based on the weighted combination.
12. A method as claimed in claim 11, wherein the weighting reflects the frequency that data will need to be accessed on the one or more neighbouring nodes in order to compute the analytics function.
13. A method as claimed in claim 10 wherein the total cost is an average combination of the values of the one or more parameters calculated for the node and/or one or more neighbouring nodes, and wherein the step of determining the optimal node is based on the average combination.
14. A method as claimed in any one of claims 10 to 13, wherein the step of determining an optimal node comprises determining a prioritized list of nodes by ranking each candidate node according to its total cost and selecting an optimal node from the prioritized list.
15. A method as claimed in any one of the preceding claims, wherein the at least one parameter relates to at least one of:
- a node cost;
- a link cost;
- an uplink link cost;
- a downlink link cost;
- a measure of network utilization;
- the availability of computing resources at a node;
- the latency of one or more databases on a node;
- the speed at which data can be written to and/or read from one or more databases on a node; and
- the number of times a particular data source has to be accessed to run the analytics function.
16. A network node (300) in a distributed network, the network node comprising:
an analytics function location module (302) for determining an optimal node in the distributed network on which to run an analytics function, wherein the analytics function location module (302) is configured to:
- determine a candidate list of potential nodes on which to run the analytics function;
- for each node in the candidate list, obtain at least one parameter relating to the cost of running the analytics function; and
- determine the optimal node based on the at least one parameter.
17. A network node as claimed in claim 16, wherein the analytics function location module is configured to store and maintain a list of the locations of data sources in the network.
18. A network node as claimed in claim 17, wherein the analytics function location module is configured to receive one or more updates on the location of data sources in the network and update the list of locations of data sources accordingly.
19. A network node as in any one of claims 16, 17 and 18, wherein the analytics function location module is further configured to receive one or more parameters relating to the cost of running the analytics function.
20. A network node as in claim 19, wherein the analytics function location module is configured to receive the one or more parameters through gossip communication.
21. A network node (400) in a distributed network, the network node comprising a processor (402) and a memory (404), said memory (404) containing instructions executable by said processor (402), whereby said network node (400) is operative to:
- determine a candidate list of potential nodes on which to run the analytics function;
- calculate, for each node in the candidate list, at least one parameter for the node relating to the cost of running the analytics function; and
- determine the optimal node based on the at least one calculated parameter.
22. A network node (500) in a distributed network, the network node comprising:
a first module (502) for determining a candidate list of potential nodes on which to run the analytics function;
a second module (504) for calculating, for each node in the candidate list, at least one parameter for the node relating to the cost of running the analytics function; and
a third module (506) for determining the optimal node based on the at least one calculated parameter.
23. A computer program which, when run on a computer, causes the computer to carry out a method according to any one of claims 1 to 15.
24. A computer program product comprising a computer readable storage medium and a computer program according to claim 23 stored on the computer readable storage medium.