WO2017032422A1 - Method and system for scaling of big data analytics

Info

Publication number: WO2017032422A1
Application number: PCT/EP2015/069648
Authority: WO
Inventors: András VERES; Péter MÁTRAY; Gabor Nemeth
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2015-08-27
Filing date: 2015-08-27
Publication date: 2017-03-02

Abstract

Disclosed are a network node (70; 80) and a system (40; 50; 60) configured to enable scaling of distributed applications in an analytics task, and methods performed therein. Based on patterns of data access obtained from monitored data accesses between said applications, a graph model is created (S45; S340) illustrating read and write dependences of the analytics task. Based on whether a first application out of two or more applications is in backlog, it is decided whether to initiate (S47; S340) upscaling of an application that neighbors the first application along the graph model and causes the backlog, or to initiate (S48; S340) downscaling of said first application if the first application does not cause backlog in any other neighboring application along the graph model. An optimized scaling is thus initiated, for which multiple applications are considered simultaneously. The amount of resources needed is minimized, considering backlog in the applications in question.

Description

METHOD AND SYSTEM FOR SCALING OF BIG DATA ANALYTICS

TECHNICAL FIELD

This disclosure relates to big data analytics. More particularly, it relates to a network node and a system for scaling of distributed applications performing an analytics task, and to methods performed therein.

BACKGROUND

Systems within Big Data are capable of processing huge amounts of data. Numerous techniques exist to split data and split processing into a large number of pieces and run applications on computers in parallel. Examples thereof include Hadoop map-reduce, Spark, Storm, etc.

In particular, there are systems providing an important capability to deploy applications as jobs performed in distributed containers. Examples of techniques using containers comprise YARN, Mesos, Kubernetes, and Open Flow.

Moreover, workflow management systems also exist to manage jobs depending on the output of each other, of which Chronos is one example.

As of today, allocating containers and/or other resources to applications is to a large extent a trial-and-error task. There are techniques which function for some simple environments. However, for more realistic environments, they fail.

For instance, existing techniques work if data is already distributed and jobs or applications can be run independently of each other. For example, scaling of map-reduce jobs or applications is relatively easy, since required input is read from disks and generated output is written on disks before any other job or application can process the generated output. Scaling in such an environment is a task of finding the degree of parallelism of the map and reduce phases, and minimizing the time between the start and stop of the job or application.

Workflow systems typically orchestrate jobs to start running when a previous job has finished. However, most data analytics of today is done more in parallel, i.e. jobs and applications are running at the same time, exchanging data while being active.

Sequence IQ Periscope targets non-batch applications, and monitors statistics of applications. If the statistics fall below a target, it assigns additional containers. This allows for scaling. However, it considers one application in isolation. It is possible to scale a Redis cluster to perform, for instance, 1 million operations per second, but there is no reliable way to tell whether 1 million operations per second is a desired target or not.

To the best of our knowledge, applications or jobs are considered in isolation in all existing techniques. It is thus possible to scale a certain application to reach a certain performance using existing techniques.

However, in reality there are multiple applications/jobs running together performing a larger, overall task. The performance of this task is thus dependent on the performance of said multiple applications/jobs. The challenge is how to use resources for these multiple applications/jobs to optimize the performance of the overall task.

There is a need for a solution addressing the issues discussed above.

SUMMARY

It is an object of exemplary embodiments herein to address at least some of the issues outlined above and enable scaling of multiple applications of an analytics task. This object and others are achieved by a network node and a system, and methods performed therein, according to the appended independent claims, and by the exemplary embodiments according to the dependent claims.

According to an aspect, the exemplary embodiments provide a method for scaling of distributed applications of a network analytics task, where the method is performed in a network node. The method comprises creating a graph model of said applications based on patterns of data access between said applications, where the graph model defines inter-dependences of said applications. The method also comprises, for each of two or more of the applications, determining whether a first application of said two or more applications is in backlog. Also, the method comprises, when the first application is in backlog, initiating upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. In addition, the method comprises, when the first application is neither in backlog nor causes backlog in any other application(s) neighboring to the first application along the graph model, initiating downscaling of the first application.

According to another aspect, the exemplary embodiments provide a method for scaling of applications of a network analytics task performed in a distributed network, wherein the method is performed in system. The method comprises monitoring, by a data access module, data accesses between said applications. The method comprises extracting, by the data access module, patterns of data access based on the monitored data accesses. The method also comprises creating, by an orchestration module, a graph model of said applications based on the extracted patterns of data access, where the graph model defines inter-dependences of said applications. The method further comprises, for each of two or more of the applications, determining, by the orchestration module, whether a first application of said two or more applications is in backlog. Also, the method comprises, when the first application is in backlog, initiating, by the orchestration module, upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. In addition, the method comprises, when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiating, by the orchestration module, downscaling of the first application.

According to another aspect, the exemplary embodiments provide a network node capable of causing scaling of distributed applications of a network analytics task. The network node is connectable to a data access module adapted to monitor data accesses between said applications. The network node is configured to create a graph model of said applications based on the patterns of data access, where the graph model defines inter-dependences of said applications. The network node is also configured to, for each of two or more of the applications, determine whether a first application of said applications is in backlog. Also, the network node is configured to, when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. In addition, the network node is configured to, when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

According to yet another aspect, the exemplary embodiments provide a system capable of causing scaling of distributed applications of a network analytics task. The system is capable of causing scaling of applications of a network analytics task performed in a distributed network. The system comprises a data access module that is adapted to interconnect the applications, and an orchestration module connectable to said data access module. The data access module is configured to monitor data accesses between said applications. The data access module is configured to extract patterns of data access based on the monitored data accesses. The orchestration module is configured to create a graph model of said applications based on the extracted patterns of data access, where the graph model defines inter-dependences of said applications. The orchestration module is configured to, for each of two or more of the applications, determine whether a first application of said applications is in backlog. Also, the orchestration module is configured to, when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. In addition, the orchestration module is configured to, when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

According to further aspects, the object is also achieved by computer programs and computer readable storage media corresponding to the aspects above.

Some advantages of the present exemplary embodiments are the following:

By initiating scaling of applications while considering multiple applications, allocation of resources can be optimized for said multiple applications.

This disclosure may also reduce costs of operation of large clusters by eliminating manual dimensioning of applications.

The amount of resources necessary to run a cluster having distributed applications is reduced, because resources are allocated only at the level necessary to avoid backlog.

It is further an advantage that this disclosure is applicable for real-time analytics.

Other objects, advantages and features of embodiments will be explained in the following detailed description when considered in conjunction with the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments will now be described in more detail, and with reference to the accompanying drawings, in which:

- Figure 1 schematically illustrates monitoring of data accesses between distributed applications of an analytics task, in a system capable of causing scaling of said applications;

- Figure 2 illustrates a graph model of read/write dependences of inter-connected applications;

- Figure 3 presents a signaling diagram of signaling between applications A-D, a data access module and an orchestration module;

- Figure 4 illustrates a combined flow-chart and system presentation according to some embodiments;

- Figures 5 and 6 schematically present a system according to different embodiments; and

- Figures 7 and 8 schematically present a network node according to different embodiments.

DETAILED DESCRIPTION

In the following description, exemplary embodiments will be described in more detail, with reference to accompanying drawings. For the purpose of explanation and not limitation, specific details are set forth, such as particular examples and techniques in order to provide a thorough understanding.

This disclosure relates to a scenario in which applications in a cluster of network nodes are performing an analytics task, where the applications are distributed in containers. The applications, and hence also the containers, communicate with one another using data model operation methods provided by a system as presented herein. The system monitors data accesses between applications, and extracts a graph model of read and write dependences of inter-connected applications based on the monitored data accesses. Once a backlogged application has been determined, this graph model is a tool for determining which applications may be allocated additional resources and for deciding whether an application can be allocated fewer resources.

Applications in a cluster of nodes may be implemented as containers running on multiple computing devices.

Figure 1 schematically illustrates interconnection of distributed applications of an analytics task by a data access module for monitoring of data accesses between said applications. In this example, Application A, 102, is distributed and implemented as container 1, 104, and container 2, 106. Application B, 108, is in this example distributed and implemented as container 3, 110, and container 4, 112. Application A, 102, is thus connected to Application B, 108, via a data access module 114, which monitors data access between applications. An orchestration module 116 is further connected to the data access module 114.

Each container may be a single process, or a set of processes managed together and sharing a set of resources within an application.

Figure 2 presents one example of a simple graph model, illustrating read and write dependences between applications. Data is accessed between application A, 202, and application C, 206, as well as between application B, 204, and application C, 206. Data may be accessed either by reading data from an application or by writing data to an application. An arrow from application A, 202, to application C, 206, may represent that data is accessed by writing data from application A, 202, to application C, 206, or by reading data from application A, 202, by application C, 206. Each application may thus manipulate data either by reading the data or writing the data.
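For illustration only, the graph model of Fig. 2 can be held in a small directed-graph structure. The following is a minimal sketch, assuming that an edge points from the application that supplies data to the application that receives it; the class name `DependencyGraph` and its methods are hypothetical and are not part of this disclosure.

```python
from collections import defaultdict


class DependencyGraph:
    """Directed graph; an edge (src, dst) means data flows from src to dst."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, src, dst):
        # Record the dependence in both directions for cheap lookups.
        self._downstream[src].add(dst)
        self._upstream[dst].add(src)

    def upstream(self, app):
        # Applications that `app` receives data from.
        return set(self._upstream.get(app, set()))

    def downstream(self, app):
        # Applications that receive data from `app`.
        return set(self._downstream.get(app, set()))

    def neighbors(self, app):
        # Applications next to `app` along the graph, in either direction.
        return self.upstream(app) | self.downstream(app)


# The example of Fig. 2: applications A and B both feed application C.
fig2 = DependencyGraph()
fig2.add_edge("A", "C")
fig2.add_edge("B", "C")
```

Keeping both directions makes it equally cheap to ask which applications may cause backlog in a given application and which applications a given application may cause backlog in.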

Figure 3 presents a signaling diagram of signaling between entities involving application A, 302, application B, 304, application C, 308, application D, 310, a data access module 306 and an orchestration module 312.

The signaling between the entities as described in Fig. 3 corresponds to the graph model of interconnected applications as presented in Fig. 2. A data access module may however also monitor data access or intercept data being accessed between a number of other applications, forming other graph models.

The data access module 306 may be configured to provide protocols and a communication interface for application A, 302, application B, 304, application C, 308, and application D, 310, to exchange data, and to monitor data access between the applications.

In S316, data may be accessed from application A, 302, monitored or intercepted by data access module 306, in S318, and received by application C, 308, in S320.

Similarly, in S322, data may be accessed from application B, 304, monitored or intercepted by data access module 306, in S324, and received by application C, 308, in S326.

Application C, 308, may then perform processing in S328 based on data accessed in S320 and/or S326.

In S330, data from processing S328 may be accessed from application C, 308, monitored or intercepted by data access module 306, in S332, and received by application D, 310, in S334.

Based on the data access being monitored in S318, S324 and S332, the data access module 306 may then extract or create one or more patterns of data access.

The data access module 306 may monitor or intercept any data access conveyed via said data access module. The data access module 306 may also provide primitives for modelling of patterns of data access, where said primitives may allow patterns of access such as:

- Key-value operations accessed between applications

- Stream operations accessed between applications

- File access operations between applications

Based on provided primitives for modelling of patterns of data access the above cited operations may be used to extract or create patterns of data access in S336.
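As a hedged illustration of how patterns might be extracted from monitored accesses, the sketch below aggregates individual access events into counts and data volumes per pair of applications and per kind of operation. The record type `AccessEvent`, its fields, and the function `extract_patterns` are assumptions made for this example; the disclosure does not prescribe a record format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    writer: str  # application the data comes from
    reader: str  # application the data goes to
    kind: str    # "key-value", "stream" or "file"
    size: int    # amount of data accessed (assumed statistic)


def extract_patterns(events):
    """Aggregate events into per-(writer, reader, kind) counts and volumes."""
    counts = Counter()
    volume = Counter()
    for event in events:
        key = (event.writer, event.reader, event.kind)
        counts[key] += 1
        volume[key] += event.size
    return {key: {"count": counts[key], "bytes": volume[key]} for key in counts}


events = [
    AccessEvent("A", "C", "stream", 10),
    AccessEvent("A", "C", "stream", 20),
    AccessEvent("B", "C", "key-value", 5),
]
patterns = extract_patterns(events)
```

Each key of `patterns` corresponds to one read or write dependence between two applications, so the keys alone are sufficient to derive the edges of a graph model.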

In S338, the data access module 306 may report patterns of data access and statistics of accessed data towards the orchestration module 312.

As mentioned above, each application may be distributed and employed as containers running in multiple processing circuits and/or network nodes/hosts. Hence, for scalability reasons the data access module 306 may be run in many instances distributed among multiple hosts.

Also in S338, the orchestration module 312 collects patterns of data access from the data access module 306.
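Since the data access module may run in many instances, the orchestration module would have to combine several reports. A minimal sketch of one way to do so is given below, assuming that each instance reports a mapping from a (supplying application, receiving application) pair to a number of observed accesses; the function names `merge_reports` and `edges_from` and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def merge_reports(reports):
    """Sum per-edge access counts reported by several module instances."""
    merged = Counter()
    for report in reports:
        merged.update(report)  # Counter.update adds counts, it does not replace
    return dict(merged)


def edges_from(merged, min_count=1):
    """Keep the edges observed at least `min_count` times, in sorted order."""
    return sorted(edge for edge, count in merged.items() if count >= min_count)


host1 = {("A", "C"): 5, ("B", "C"): 2}
host2 = {("A", "C"): 3, ("C", "D"): 7}
merged = merge_reports([host1, host2])
edges = edges_from(merged)
```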

In S340, the orchestration module 312 may subsequently perform method steps enabling scaling of applications based on the patterns of data access.

The orchestration module 312 will be further described below.

Figure 4 illustrates a system 40 capable of causing scaling of applications of a network analytics task performed in a distributed network, and modules comprised therein, as well as flow charts of method steps according to some embodiments of this disclosure.

The system 40 comprises a data access module 41 and an orchestration module 42. The data access module 41 is adapted to interconnect the applications, and may be distributed in many instances.

The orchestration module 42 is connectable to the data access module 41, and may be localized within a network node.

The data access module 41 of the system 40 is configured to:

- monitor data accesses between said applications, and

- extract patterns of data access based on the monitored data accesses.

The orchestration module 42 of the system 40 is configured to:

- create a graph model of said applications based on the extracted patterns of data access, where the graph model defines inter-dependences of said applications;

- for each of two or more of the applications determine whether a first application of said applications is in backlog;

- when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model; and

- when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

The data access module 41 of the system 40 may further be configured to monitor data accesses between said applications, wherein the data accesses comprise key-value operations, stream operations and/or file access operations, between said applications.

With reference to the flow chart of Fig. 4, a method for scaling of applications of a network analytics task performed in a distributed network is now described. The method is performed in a system 40 and comprises the following actions.

Action S43: Monitoring, by a data access module, data accesses between said applications.

Action S44: Extracting, by the data access module, patterns of data access based on the monitored data accesses.

Action S45: Creating, by an orchestration module, a graph model of said applications based on the extracted patterns of data access, where the graph model defines inter-dependences of said applications.

Action S46: For each of two or more of the applications, determining, by the orchestration module, whether a first application of said two or more applications is in backlog.

Action S47: When the first application is in backlog, initiating, by the orchestration module, upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model.

Action S48: When the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiating, by the orchestration module, downscaling of the first application.
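The decision logic of actions S46 to S48 can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the graph is given as (supplier, receiver) edges, backlog information is given as a mapping `waiting_on` from each backlogged application to the applications it is waiting for, and only supplying neighbors are treated as possible causes of backlog. The function name `plan_scaling` is hypothetical.

```python
def plan_scaling(edges, waiting_on):
    """Return (upscale, downscale) sets of applications.

    edges:      iterable of (supplier, receiver) pairs of the graph model
    waiting_on: dict mapping a backlogged application to the set of
                applications it is currently waiting for
    """
    edges = list(edges)
    apps = {app for edge in edges for app in edge}

    upscale = set()
    for first in apps:  # S46: examine each application in turn
        culprits = waiting_on.get(first, set())
        if not culprits:
            continue  # `first` is not in backlog
        # S47: only neighbors along the graph model are considered.
        suppliers = {src for src, dst in edges if dst == first}
        upscale |= culprits & suppliers

    # S48: neither in backlog nor causing backlog in a neighbor.
    downscale = {
        app for app in apps
        if not waiting_on.get(app) and app not in upscale
    }
    return upscale, downscale


# Applications A and B feed C, and C feeds D; C is waiting for data from A.
edges = [("A", "C"), ("B", "C"), ("C", "D")]
upscale, downscale = plan_scaling(edges, {"C": {"A"}})
```

In this example application A is selected for upscaling, application C is left unchanged because it is in backlog, and applications B and D become candidates for downscaling.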

Action S43, monitoring of data accesses between said applications, may comprise monitoring key-value operations, stream operations and/or file access operations, between said applications.

Action S43, monitoring of data accesses between said applications, may as well comprise monitoring data access between containers, since the applications are typically distributed in containers.

Action S46 may be performed for a large number of applications, and even for all applications running within a cluster of network nodes performing an analytics task.

It is clarified that "a first application" of said two or more applications, denotes any application of said two or more applications.

An application being "in backlog" herein denotes an application that is waiting for input, such as data, from another application.

Neighboring applications denote applications that are next to each other along a graph model. Using the graph model from Fig. 2, which may have been created in S45, determining in S46 whether the first application of said two or more applications is in backlog may for instance comprise determining whether application C, out of application C and application D, is in backlog.

Then, having determined that application C is in backlog, S47 may comprise initiating, by the orchestration module, upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. Based on the received data and patterns of data access, it can be determined which application causes the backlog in application C. In this example, it may be application A or application B that causes the backlog in application C. Hence upscaling will be initiated for application A or application B.

Alternatively, both application A and application B may cause the backlog in application C. If it is determined that more than one application is causing backlog in the first application, herein exemplified as application C, one or more votes for upscaling may be assigned to the applications that cause the backlog in application C. The assignment of votes, or the number of votes assigned, may be based on an amount of lagging data between said application and application C, the lagging data being the data that application C is awaiting and whose absence hence causes the backlog.
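One possible way of basing the number of votes on the amount of lagging data is sketched below. The rule of one vote per started `unit` of lagging data, the function name `assign_votes`, and the example figures are assumptions made for illustration only.

```python
import math


def assign_votes(lagging_data, unit=100):
    """Map each supplying application to a number of upscaling votes.

    lagging_data: dict mapping a supplying application to the amount of
                  data the backlogged application is still awaiting from it
    unit:         amount of lagging data that is worth one vote (assumed)
    """
    votes = {}
    for supplier, lag in lagging_data.items():
        if lag > 0:
            # One vote per started unit of lagging data.
            votes[supplier] = math.ceil(lag / unit)
    return votes


# Application C awaits 250 units of data from A and 40 units from B.
votes = assign_votes({"A": 250, "B": 40})
```

With these figures application A receives more votes than application B, reflecting that it contributes more to the backlog in application C.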

A relative measure of applications causing backlog in application C may also be envisaged. For instance, two or more applications may be given different priorities, and initiating upscaling may be performed based on these priorities.

By using votes or relative measures of the extent to which an application causes another application to be in backlog, the system may pinpoint the weakest application or link in the cluster and initiate allocation of additional resources to the application causing the backlog.

Applying the graph model from Fig. 2, application C is one example of the first application in action S48. When application C is neither in backlog nor causes backlog in any application(s) neighboring to it along the graph model, i.e. application C is not causing backlog in application D, or, rephrased, application D is not waiting for data or input from application C, downscaling of application C may be initiated by the orchestration module. Thus, if no application is waiting for data from application C, and application C is not in backlog, downscaling of application C may be initiated.

The present disclosure also comprises a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method, executable in the orchestration module, for scaling of applications of a network analytics task performed in a distributed network. The present disclosure also comprises a computer-readable storage medium, having stored thereon a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method as above, for scaling of applications of a network analytics task performed in a distributed network.

In an alternative way to describe the system, Fig. 5 presents a system 50 capable of causing scaling of applications of a network analytics task performed in a distributed network.

The system 50 comprises a monitoring unit 52, an extracting unit 54, a creating unit 56, a determining unit 58, and an initiating unit 59. Although expressed as units, for instance, the monitoring unit 52 and the extracting unit 54 may be distributed rather than localized. The monitoring unit 52 is configured to monitor data accesses between applications. The extracting unit 54 is configured to extract patterns of data access based on the monitored data accesses. The creating unit 56 is configured to create a graph model of said applications based on the extracted patterns of data access, where the graph model defines inter-dependences of said applications. The determining unit 58 is configured to, for each of two or more of the applications, determine whether a first application of said two or more applications is in backlog. The initiating unit 59 is configured to, when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. The initiating unit 59 is also configured to, when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

The monitoring unit 52 of the system 50 may further be configured to monitor key-value operations, stream operations and/or file access operations, between said applications.

With reference to Fig. 6, an alternative embodiment of the system is presented. The system 60 capable of causing scaling of applications of a network analytics task performed in a distributed network, comprises a processing circuit 62, a memory 64 and a communication interface 66, according to embodiments of this disclosure. The communication interface 66 may be configured to connect to applications from, or to, which data access may be monitored.

The memory 64 may contain instructions executable by said processing circuit 62, whereby the system 60 may be operative to monitor data accesses between said applications, and to extract patterns of data access based on the monitored data accesses. The system 60 is also operative to create a graph model of said applications based on the extracted patterns of data access, where the graph model defines inter-dependences of said applications. Also, the system 60 is operative to, for each of two or more of the applications, determine whether a first application of said two or more applications is in backlog. In addition, the system 60 is operative to, when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. Also, the system 60 is operative to when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

Returning to Fig. 4, a method for scaling of applications of a network analytics task performed in a distributed network, wherein the method is performed in an orchestration module 42, is now described. The method comprises the following actions.

Action S45: Creating a graph model of said applications based on the patterns of data access between applications, where the graph model defines inter-dependences of said applications.

Action S46: For each of two or more of the applications, determining whether a first application of said two or more applications is in backlog.

Action S47: When the first application is in backlog, initiating upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model.

Action S48: When the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiating downscaling of the first application.

The present disclosure also comprises a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method as above, for scaling of applications of a network analytics task performed in a distributed network.

The present disclosure also comprises a computer-readable storage medium, having stored thereon a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method as above, for scaling of applications of a network analytics task performed in a distributed network.

The method may further comprise receiving said patterns of data access, from one or more instances of a data access module interconnecting applications, wherein the creating the graph model is based on said received patterns of data access.

The receiving of said patterns of data access may comprise receiving of said patterns extracted based on monitored data accesses.

Action S47 of initiating upscaling of the at least one second application, may further comprise assigning to said at least one second application, a vote for upscaling said at least one second application. Action S47 of initiating upscaling may further comprise enabling upscaling of the at least one second application based on one or more votes for upscaling, assigned to said at least one second application.

Action S48 of initiating downscaling of the first application may further comprise assigning to said first application a vote for downscaling said first application.

Action S48 of initiating downscaling may further comprise enabling downscaling of the first application based on one or more votes for downscaling, assigned to said first application.
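The vote-based enabling of upscaling and downscaling can be illustrated with a small tally. The sketch below assumes that scaling of an application is enabled once its accumulated votes reach a threshold, and that an application with enough upscaling votes is never downscaled at the same time; the class name `VoteTally`, the threshold, and this precedence rule are assumptions not stated in this disclosure.

```python
from collections import Counter


class VoteTally:
    """Accumulates votes for upscaling and downscaling per application."""

    def __init__(self, threshold):
        self.threshold = threshold  # votes needed to enable scaling (assumed)
        self.up = Counter()
        self.down = Counter()

    def vote_up(self, app, votes=1):
        self.up[app] += votes

    def vote_down(self, app, votes=1):
        self.down[app] += votes

    def enabled(self):
        """Return (upscale, downscale) sets that reached the threshold."""
        upscale = {a for a, v in self.up.items() if v >= self.threshold}
        downscale = {
            a for a, v in self.down.items()
            if v >= self.threshold and a not in upscale
        }
        return upscale, downscale


tally = VoteTally(threshold=3)
tally.vote_up("A", 2)
tally.vote_up("A")
tally.vote_down("C")
result = tally.enabled()
```

Requiring several votes before scaling is enabled is one way to avoid reacting to a single, possibly transient, observation of backlog.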

This disclosure also comprises a network node 70; 80 that is capable of causing scaling of distributed applications of a network analytics task. The network node is connectable to a data access module adapted to monitor data accesses between said applications. The network node is configured to create a graph model of said applications based on the patterns of data access, where the graph model defines inter-dependences of said applications. The network node is also configured to, for each of two or more of the applications, determine whether a first application of said applications is in backlog. In addition, the network node is configured to, when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. The network node is also configured to, when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

The network node 70; 80 may further be configured to assign to said at least one second application, a vote for upscaling said at least one second application.

The network node 70; 80 may further be configured to enable upscaling of the at least one second application based on one or more votes for upscaling assigned to said at least one second application.

The network node 70; 80 may further be configured to assign to said first application a vote for downscaling said first application.

The network node 70; 80 may further be configured to enable downscaling of the first application based on one or more votes for downscaling assigned to said first application.

The network node 70; 80 may further be an orchestration module 42; 116; 312.

In an alternative way to describe the network node, Fig. 7 presents a network node 70 capable of causing scaling of applications of a network analytics task performed in a distributed network. The network node 70 comprises a creating unit 72, a determining unit 74, and an initiating unit 76. The creating unit 72 is configured to create a graph model of said applications based on patterns of data access, where the graph model defines inter-dependences of said applications. The determining unit 74 is configured to, for each of two or more of the applications, determine whether a first application of said two or more applications is in backlog. The initiating unit 76 is configured to, when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. The initiating unit 76 is also configured to, when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.
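The division of work between the creating unit 72, the determining unit 74 and the initiating unit 76 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: patterns of data access are represented as hypothetical (writer, reader) pairs, the set of applications in backlog is supplied by the caller, and the hypothetical mapping `caused_by` states which applications cause the backlog in a given application. How backlog and its cause are detected is outside the scope of the sketch.

```python
from collections import defaultdict


def create_graph(access_patterns):
    """Creating unit: build an undirected neighbor map from access patterns.

    access_patterns -- iterable of (writer, reader) pairs (assumed format)
    """
    graph = defaultdict(set)
    for writer, reader in access_patterns:
        graph[writer].add(reader)
        graph[reader].add(writer)
    return dict(graph)


def decide_scaling(graph, in_backlog, caused_by):
    """Determining and initiating units: return (upscale, downscale) sets.

    in_backlog -- set of applications currently in backlog
    caused_by  -- {application: set of applications causing its backlog}
                  (hypothetical input; detection is not shown here)
    """
    upscale, downscale = set(), set()
    for app, neighbors in graph.items():
        if app in in_backlog:
            # Upscale only culprits that are neighbors along the graph model.
            upscale |= caused_by.get(app, set()) & neighbors
        else:
            causes_backlog = any(
                app in caused_by.get(n, set())
                for n in neighbors
                if n in in_backlog
            )
            if not causes_backlog:
                downscale.add(app)
    return upscale, downscale


graph = create_graph([("A", "B"), ("B", "C")])
print(decide_scaling(graph, in_backlog={"B"}, caused_by={"B": {"A"}}))
```

In the example, application B is in backlog because of its neighbor A, so A is selected for upscaling; application C is neither in backlog nor causing backlog in a neighbor, so C is selected for downscaling.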

The determining unit 74 of the network node 70 may be configured to assign to said at least one second application, a vote for upscaling said at least one second application.

The initiating unit 76 of the network node 70 may be configured to enable upscaling of the at least one second application based on one or more votes for upscaling assigned to said at least one second application.

The determining unit 74 of the network node 70 may be configured to assign to said first application, a vote for downscaling said first application.

The initiating unit 76 of the network node 70 may be configured to enable downscaling of the first application based on one or more votes for downscaling assigned to said first application.

Fig. 8 presents an alternative embodiment of the network node. The network node 80, which is capable of causing scaling of applications of a network analytics task performed in a distributed network, comprises a processing circuit 82, a memory 84 and a communication interface 86, according to embodiments of this disclosure. The communication interface 86 may be configured to connect to a data access module, from which patterns of data access may be received.

The memory 84 may contain instructions executable by said processing circuit 82, whereby the network node 80 may be operative to create a graph model of said applications based on the patterns of data access, where the graph model defines inter-dependences of said applications. Also, the network node 80 is operative to, for each of two or more of the applications, determine whether a first application of said two or more applications is in backlog. In addition, the network node 80 is operative to, when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model. Also, the network node 80 is operative to, when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

The modules described above are functional units which may be implemented in software, firmware or any combination thereof. In one embodiment, the modules are implemented as a computer program running on a processor.

The present exemplary embodiments have the following advantages:

The amount of resources needed to run a cluster having distributed applications is reduced, because resources are allocated only at the level necessary to avoid backlog.

It may be further noted that the above described embodiments are only given as examples and should not be limiting to the present exemplary embodiments, since other solutions, uses, objectives, and functions are apparent within the scope of the embodiments as claimed in the accompanying patent claims.

Claims

1. A method for scaling of distributed applications of a network analytics task, the method being performed in an orchestration module, the method comprising:

creating (S45) a graph model of said applications based on patterns of data access between said applications, where the graph model defines inter-dependences of said applications;

for each of two or more of the applications, determining (S46) whether a first application of said two or more applications is in backlog;

- when the first application is in backlog, initiating (S47) upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model, and

- when the first application is neither in backlog nor causes backlog in any other application(s) neighboring to the first application along the graph model, initiating (S48) downscaling of the first application.

2. The method according to claim 1, wherein initiating (S47) upscaling of the at least one second application comprises assigning to said at least one second application a vote for upscaling said at least one second application.

3. The method according to claim 1, wherein initiating (S48) downscaling of the first application comprises assigning to said first application a vote for downscaling said first application.

4. The method according to claim 2, further comprising enabling upscaling, by the orchestration module, of the at least one second application based on one or more votes for upscaling assigned to said at least one second application.

5. The method according to claim 3, further comprising enabling downscaling of the first application based on one or more votes for downscaling assigned to said first application.

6. A method for scaling of applications of a network analytics task performed in a distributed network, the method being performed in a system and comprising:

monitoring (S43; S318; S324; S332), by a data access module, data accesses between said applications;

extracting (S44; S336), by the data access module, patterns of data access based on the monitored data accesses;

creating (S45; S340), by an orchestration module, a graph model of said applications based on the extracted patterns of data access, where the graph model defines inter-dependences of said applications;

for each of two or more of the applications, determining (S46; S340), by the orchestration module, whether a first application of said two or more applications is in backlog;

- when the first application is in backlog, initiating (S47; S340), by the orchestration module, upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model; and

- when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiating (S48; S340), by the orchestration module, downscaling of the first application.

7. The method according to claim 6, wherein monitoring (S43), by the data access module, data accesses between said applications comprises monitoring key-value operations, stream operations and/or file access operations between said applications.

8. A network node (70; 80) capable of causing scaling of distributed applications of a network analytics task, the network node being connectable to a data access module adapted to monitor data accesses between said applications, the network node being configured to:

create a graph model of said applications based on the patterns of data access, where the graph model defines inter-dependences of said applications;

for each of two or more of the applications, determine whether a first application of said applications is in backlog;

- when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model, and

- when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

9. The network node (70; 80) according to claim 8, further being configured to assign to said at least one second application a vote for upscaling said at least one second application.

10. The network node (70; 80) according to claim 8, being configured to assign to said first application a vote for downscaling said first application.

11. The network node (70; 80) according to claim 9, further being configured to enable upscaling of the at least one second application based on one or more votes for upscaling assigned to said at least one second application.

12. The network node (70; 80) according to claim 10, further being configured to enable downscaling of the first application based on one or more votes for downscaling assigned to said first application.

13. The network node (70; 80) according to any one of claims 8 to 12, wherein the network node is an orchestration module (42; 116; 312).

14. A system (40; 50; 60) capable of causing scaling of applications of a network analytics task performed in a distributed network, the system comprising:

a data access module (41; 114; 306) being adapted to interconnect the applications, and

an orchestration module (42; 116; 312) connectable to said data access module,

wherein the data access module (41; 114; 306) is configured to:

- monitor data accesses between said applications, and

- extract patterns of data access based on the monitored data accesses; and

wherein the orchestration module (42; 116; 312) is configured to:

- when the first application is in backlog, initiate upscaling of at least one second application that causes the backlog in the first application, where the at least one second application is a neighbor to the first application along the graph model; and

- when the first application is neither in backlog nor causes backlog in any application(s) neighboring to the first application along the graph model, initiate downscaling of the first application.

15. The system (40; 50; 60) according to claim 14, wherein the data access module is further configured to monitor data accesses between said applications, wherein the data accesses comprise key-value operations, stream operations and/or file access operations between said applications.

16. A computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to any one of claims 1 to 5.

17. A computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to claim 6 or 7.

18. A computer-readable storage medium, having stored thereon a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to any one of claims 1 to 5.

19. A computer-readable storage medium, having stored thereon a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to claim 6 or 7.