WO2019006649A1 - Method and device for network function capacity and scaling management - Google Patents


Info

Publication number
WO2019006649A1
Authority
WO
WIPO (PCT)
Prior art keywords
scaling
data
network function
time period
capacity
Application number
PCT/CN2017/091603
Other languages
French (fr)
Inventor
Huoming DONG
Wei Huang
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/CN2017/091603 priority Critical patent/WO2019006649A1/en
Publication of WO2019006649A1 publication Critical patent/WO2019006649A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14 Network analysis or design
    • H04L 41/147 Network analysis or design for predicting network behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0896 Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities
    • H04L 41/0897 Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities by horizontal or vertical scaling of resources, or by migrating entities, e.g. virtual resources or entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Definitions

  • Embodiments of the present disclosure generally relate to the field of communications, and more particularly, to a method and device for network function capacity and scaling management.
  • NFs: network functions
  • These NFs may communicate with surrounding peers to process and transport control signaling and/or user plane data through standardized communication and network protocols.
  • NFV: network function virtualization
  • VNF: virtualized network function
  • scalability of a native NF or VNF is a key feature for service providers.
  • scaling in/out is considered a basic feature and an enabler for quickly modifying the capacity of products/services on demand in order to meet increasing business needs.
  • WO2017/020298A1 describes a method to automatically manage peer connections in the cloud, which can reduce manual work for operators and avoid human error.
  • currently, VNF capacity scaling uses only reactive based approaches, e.g., depending on threshold alarms and key performance indicators (KPIs).
  • the reactive based approach has disadvantages: it has a time lag and is not a real-time solution, because by the time a scaling indicator is triggered and sent to a VNF manager, and/or a scaling operation is started by the VNF manager, the indicated system situation (such as overload or underload) may have persisted for a long time or may already be over.
  • this time lag could heavily impact system characteristics (when overloaded) or system resources (when underloaded), and this scenario can happen frequently in a dynamic cloud environment.
  • the time delay arises from the analysis and synthesis of system alarms at different levels and from the processing time of collecting system KPIs (such as counters and statistics) from the network; for example, it amounts to 5-15 minutes in most systems.
  • to compensate, VNF scaling out may be performed when the system load reaches 70% instead of a real overload indication like 90%. This might help on one side, but on the other hand it also leads to VNF overprovisioning and resource/energy inefficiency.
  • embodiments of the present disclosure provide a solution for network function capacity and scaling management.
  • a method for network function capacity and scaling management includes: retrieving characteristics and configuration data from a scaling trigger handler; obtaining capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning; and sending the capacity prediction information of the next time period to a scaling decision maker.
  • the method is implemented in a virtualized network function capacity prediction engine which enables a predictive based scaling instead of a reactive based scaling.
  • the scaling trigger handler is configured or implemented in an element management system or a network management system
  • the scaling decision maker is configured or implemented in a virtualized network function manager
  • the virtualized network function capacity prediction engine is configured or implemented in the virtualized network function manager or the element management system or the network management system.
  • the characteristics and configuration data comprise one or more of the following information: alarms, key performance indicators, system configurations, system running data, system resources; the time period is configurable and comprises one or more of the following time intervals: minutes, hours, days, weeks, months and years.
  • the virtualized network function capacity prediction engine regularly retrieves the characteristics and configuration data from the scaling trigger handler; and the scaling trigger handler regularly retrieves the characteristics and configuration data from the virtualized network functions.
  • the virtualized network function capacity prediction engine receives the characteristics and configuration data reported by the scaling trigger handler; and the scaling trigger handler receives the characteristics and configuration data reported by virtualized network functions.
  • the method comprises: training a machine learning model by using the characteristics and configuration data of one or more past time periods to discover inherent patterns from the characteristics and configuration data and to build an analytics model; predicting one or more capacities of the next time period using the built analytics model; and concluding resource needs of the next time period by comparing the predicted capacities and allocated resources of the next time period.
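The three phases above (training, predicting, concluding) can be sketched as follows. The simple least-squares trend model, the sample history, and the 90%/50% thresholds are illustrative assumptions standing in for the machine learning models and scaling policy described in this disclosure.

```python
# Sketch of the train / predict / conclude pipeline (illustrative only).
# A trivial least-squares trend model stands in for the machine learning
# model; real deployments would use the models named in the disclosure.

def train(history):
    """Fit a straight line load = a*t + b to past (t, load) samples."""
    n = len(history)
    mean_t = sum(t for t, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in history)
    den = sum((t - mean_t) ** 2 for t, _ in history)
    a = num / den
    b = mean_y - a * mean_t
    return a, b

def predict(model, t_next):
    """Predict the capacity need of the next time period."""
    a, b = model
    return a * t_next + b

def conclude(predicted_load, allocated):
    """Compare the predicted capacity need with allocated resources."""
    utilisation = predicted_load / allocated
    if utilisation > 0.9:
        return "scale-out"
    if utilisation < 0.5:
        return "scale-in"
    return "no-action"

history = [(0, 40.0), (1, 50.0), (2, 60.0), (3, 70.0)]  # past periods
model = train(history)
need = predict(model, 4)                 # prediction for the next period
plan = conclude(need, allocated=85.0)
```

The same three calls can be re-run each period as new actual data arrives, which is the loop the embodiments below describe.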
  • techniques applying the machine learning comprise applying supervised learning models or methods; one or more of the following models or methods are used for the machine learning: regression analysis, support vector machines, decision trees, random forest, artificial neural networks.
  • a fused model is used by combining backpropagation neural networks, radial basis functions and generalized regression neural networks.
  • the method further comprises: synthesizing multiple predicted capacities with different weights, and/or, combining multiple input data and/or output data with the different weights.
  • the different weights are calculated based on the following rules: the more aged the data is, the smaller the weight value; and/or, the more similar the time period is, the bigger the weight value.
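One possible implementation of these two weighting rules is sketched below; the exponential age decay and the similarity bonus factor are assumed values, not prescribed by the disclosure.

```python
# Assumed weighting scheme implementing the two rules above:
# older data -> smaller weight; similar time period -> bigger weight.
# The 0.5 decay factor and 2.0 similarity bonus are illustrative choices.

def weight(age, similar, decay=0.5, similarity_bonus=2.0):
    """age: 0 = most recent period; similar: same traffic model as target."""
    w = decay ** age            # the more aged the data, the smaller the weight
    if similar:
        w *= similarity_bonus   # the more similar the period, the bigger the weight
    return w

def normalise(weights):
    """Scale weights so they sum to 1 before synthesis."""
    total = sum(weights)
    return [w / total for w in weights]

raw = [weight(age, similar) for age, similar in [(0, True), (1, False), (2, True)]]
w = normalise(raw)   # the newest, similar period dominates
```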
  • the method further comprises: using actual capacity data of a current time period as the latest training data of the machine learning when the current time period is over.
  • the method further comprises: discarding the data of the most aged period; and revising a prediction model of the machine learning by using the latest training data.
  • a system for network function capacity and scaling management comprises a scaling trigger handler and a scaling decision maker; the system further comprises a capacity prediction engine configured to perform a method according to the first aspect.
  • the system further comprises an element management system or a network management system, a virtualized network function manager, a virtualized infrastructure manager and a network function virtualization infrastructure.
  • the scaling trigger handler is configured or implemented in an element management system or a network management system
  • the scaling decision maker is configured or implemented in a virtualized network function manager
  • the capacity prediction engine is used for virtualized network function capacity and scaling management and is configured or implemented in the virtualized network function manager or the element management system or the network management system.
  • a device comprising a processor and a memory, wherein the memory contains instructions executable by the processor, whereby the device is operative to perform a method according to the first aspect.
  • a predictive based approach is provided by using data insights through machine learning techniques, instead of the current reactive based approach. Therefore, scaling needs may be predicted by machine learning in advance, and thus scaling decisions and actions can be made at just the right time with just the right resources.
  • Fig. 1 is a schematic diagram which shows an example for VNF scaling
  • Fig. 2 is a schematic diagram which shows an example of reactive based VNF scaling
  • Fig. 3 is a flowchart which shows a method 300 for network function capacity and scaling management in accordance with an embodiment of the present disclosure
  • Fig. 4 is a schematic diagram which shows an example of predictive based VNF scaling in accordance with an embodiment of the present disclosure
  • Fig. 5 is a schematic diagram which shows a method of machine learning in accordance with an embodiment of the present disclosure
  • Fig. 6 is a schematic diagram which shows an example of VNF resource needs prediction results showing deviations of predicted needs and real needs in accordance with an embodiment of the present disclosure
  • Fig. 7 is a schematic diagram which shows an example of prediction accuracy and data size in accordance with an embodiment of the present disclosure
  • Fig. 8 shows a block diagram of an apparatus 800 for network function capacity and scaling management in accordance with an embodiment of the present disclosure
  • Fig. 9 is a simplified block diagram of a device that is suitable for implementing embodiments of the present disclosure.
  • references in the specification to “one embodiment, ” “an embodiment, ” “an example embodiment, ” and the like indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • first and second etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of example embodiments.
  • the term “and/or” includes any and all combinations of one or more of the associated listed terms.
  • the network device may include processing circuitry, device readable medium, interface, user interface equipment, auxiliary equipment, power source, power delivery circuitry, and antenna. These components are depicted as single boxes located within a single larger box, and in some cases, contain additional boxes therein.
  • the network device may include multiple different physical components that make up a single illustrated component (e.g., interface includes ports or terminals for coupling wires for a wired connection and radio front end circuitry for a wireless connection) .
  • network device may be a virtual network node.
  • network node may be composed of multiple physically separate components (e.g., a NodeB component and a RNC component, a BTS component and a BSC component, etc. ) , which may each have their own respective components.
  • one or more of the separate components may be shared among several network nodes.
  • a single RNC may control multiple NodeBs.
  • each unique NodeB and RNC pair may in some instances be considered a single separate network node.
  • network node may be configured to support multiple radio access technologies (RATs) .
  • some components may be duplicated (e.g., separate device readable medium for the different RATs) and some components may be reused (e.g., the same antenna may be shared by the RATs) .
  • IMS: IP multimedia subsystem
  • SBC: session border controller
  • MGw: media gateway
  • Fig. 1 is a schematic diagram which shows an example for VNF scaling.
  • the H.248 protocol may be used, and there may be several running instances, such as SBC 1, SBC 2, ..., and MGw 1, MGw 2, ..., at virtual machine (VM) level.
  • Fig. 2 is a schematic diagram which shows an example of reactive based VNF scaling.
  • modules or components such as an element management system (EMS) or a network management system (NMS), a virtualized network function manager (VNF manager), a virtualized infrastructure manager (VIM) and a network function virtualization infrastructure (NFVI).
  • some VNFs may be provided with a plurality of resources, such as memory, central processing unit (CPU), bandwidth, I/O, disk, and so on.
  • a scaling trigger handler may be configured or implemented in the EMS/NMS, and a scaling decision maker (SDM) may be configured or implemented in the VNF manager.
  • some blocks (or steps) may be performed for the VNF scaling.
  • VNF may report alarms and KPIs to EMS/NMS during a running period, at block 201.
  • a report is generated periodically, e.g., every 15 minutes in many systems; this interval is hard to shorten considering the impact on the characteristics and performance of the whole network.
  • the STH included in EMS/NMS may further analyze and synthesize the scaling related alarms and/or KPIs, and decide if a request for scaling should be triggered according to VNF business/application level knowledge. If a request for scaling should be triggered, the STH then sends the request to the VNF manager for further handling, at block 202.
  • the SDM included in the VNF manager may analyze the received request, together with other relevant VNF and system data, e.g. VNF life cycle management. The SDM may finally decide if a scaling action (for example, scaling in or scaling out) shall be performed on the VNF.
  • the VNF manager may notify the VNF to prepare the scaling, at block 203. For example, the VNF may be notified that a new instance will be added later if scaling out is decided, or that ongoing traffic or sessions will be released later if scaling in is decided.
  • the VNF manager may notify VIM to start scaling, at block 204.
  • the VIM may be notified to add VM resources or reduce VM resources.
  • the VIM may notify the NFVI at block 205.
  • the NFVI may perform VM instance(s) creation or removal and the VIM may receive an acknowledgement from the NFVI.
  • the NFVI may notify the VNF of the new status of VM resources, at block 206.
  • the VNF may conclude the scaling and send the result of scaling to the VNF manager, at block 207.
  • the VNF manager may update system data internally.
  • the reactive based approach has a time lag and is not a real-time solution; thus it can heavily impact system characteristics and performance (when overloaded) and/or waste system resources (when underloaded) in a dynamic cloud environment.
  • some preventive measures adopted may cause overprotection. Therefore, the reactive based approach may lead to degraded system characteristics and/or resource/energy inefficiency.
  • scaling triggering and decisions are not easy to make, as there are too many relevant alarms and KPIs at different system levels that need to be thoroughly analyzed and synthesized.
  • many cloud parameters and uncertainties are introduced and need to be measured, which requires much effort and makes accurate internal measurements almost impossible. All of this may lead to wrong scaling decisions under the reactive based approach.
  • machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from datasets, machine learning allows computers to find hidden patterns without being explicitly programmed. It is a subfield of artificial intelligence (AI) and has been widely applied in many areas.
  • a predictive based approach is provided for scaling and capacity management (such as for VNF) by using machine learning techniques.
  • the solution can be applied in any similar cloud scaling scenarios on both VM level and VNF level in networks.
  • the solution in this disclosure can also be applied or extended to scaling at the whole-network level, including radio/access network, core network, service network, etc. For example, based on historical time-period data of radio network characteristics, the machine learning based approach could provide predictive core network scaling for a future time period through learning/training.
  • a method for network function capacity and scaling management is provided.
  • Fig. 3 is a flowchart which shows a method 300 for network function capacity and scaling management in accordance with an embodiment of the present disclosure, and illustrates the method for network function capacity and scaling management by taking a capacity prediction engine as an example.
  • the method 300 includes retrieving, by a capacity prediction engine, characteristics and configuration data from a scaling trigger handler (STH) , at block 301; obtaining, by the capacity prediction engine, capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning, at block 302; and sending, by the capacity prediction engine, the capacity prediction information of the next time period to a scaling decision maker (SDM) .
  • the method may be implemented in a VNF capacity prediction engine (VCPE) which enables a predictive based scaling instead of a reactive based scaling.
  • the STH may be configured or implemented in an element management system (EMS) or a network management system (NMS)
  • the SDM may be configured or implemented in a virtualized network function manager (VNF manager)
  • the VCPE may be configured or implemented in the VNF manager or the EMS or the NMS.
  • the characteristics and configuration data may include one or more of the following information: alarms, key performance indicators, system configurations, system running data, system resources; however, it is not limited in this disclosure.
  • the time period is configurable and may include one or more of the following time intervals: minutes, hours, days, weeks, months and years. However, it is not limited in this disclosure. For example, other time interval (or time unit) may be adopted according to actual scenarios.
  • Fig. 4 is a schematic diagram which shows an example of predictive based VNF scaling in accordance with an embodiment of the present disclosure.
  • modules or components such as an EMS/NMS, a VNF manager, a VIM and an NFVI.
  • some VNFs may be provided with a plurality of resources, such as memory, central processing unit (CPU), bandwidth, I/O, disk, and so on.
  • a STH may be configured or implemented in the EMS/NMS
  • a SDM may be configured or implemented in the VNF manager.
  • a VCPE may be configured or implemented in the VNF manager. However, it is not limited thereto.
  • the VCPE may also be configured or implemented in the EMS/NMS.
  • some blocks (or steps) may be performed for the VNF scaling.
  • VNF may report alarms and/or KPIs to EMS/NMS during a running period, at block 401.
  • a report is generated periodically, e.g., every 15 minutes in many systems; this interval is hard to shorten considering the impact on the characteristics and performance of the whole network.
  • the STH included in the EMS/NMS may regularly retrieve the characteristics and configuration data from the VNF, at block 4021. For example, the STH retrieves additional data required by the VCPE from the VNF, beyond what is already reported and received from the VNF in block 401.
  • the VCPE may receive the characteristics and configuration data reported by the STH, at block 4022; and may also regularly retrieve the characteristics and configuration data from the STH, at block 4023.
  • the VCPE retrieves the data from the STH regularly, e.g., per one-day time period, including alarms, KPIs, system configurations, system running data (resources), etc. This data will be used by the VCPE as labelled training data for supervised machine learning.
  • the VCPE may obtain capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning.
  • when the VCPE gets the data (characteristics and/or configuration data), it will first train a machine learning model, and after training it will perform future VNF capacity prediction (such as resource consumption prediction) for the next time period. Then, the VCPE may compare the future system resource needs with the resources allocated at that time, to decide when scaling is needed during the next time period according to a pre-configured scaling policy, e.g., scaling out if usage exceeds 90% or scaling in if it drops under 50%. The details of the capacity prediction are illustrated later.
  • the VCPE may send the capacity prediction information of the next time period to SDM, at block 4024.
  • the VCPE may send the scaling prediction data of the next time period to the SDM with time indications, thus SDM can plan the VM level or VNF level scaling in advance.
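A minimal sketch of how per-slot predictions could be turned into a scaling plan with time indications for the SDM; the `scaling_plan` helper, slot labels and sample data are illustrative assumptions, while the 90%/50% thresholds follow the example policy mentioned above.

```python
# Sketch: turn per-slot capacity predictions into a scaling plan with
# time indications for the SDM (assumed helper; thresholds follow the
# example 90% / 50% policy in the text; data is illustrative).

def scaling_plan(predicted, allocated, high=0.9, low=0.5):
    """predicted: {time_slot: predicted load}; allocated: resources per slot."""
    plan = {}
    for slot, load in predicted.items():
        ratio = load / allocated
        if ratio > high:
            plan[slot] = "scale-out"
        elif ratio < low:
            plan[slot] = "scale-in"
    return plan   # only slots needing action, each with a time indication

predicted = {"09:00": 95.0, "13:00": 70.0, "03:00": 20.0}
plan = scaling_plan(predicted, allocated=100.0)
# -> {"09:00": "scale-out", "03:00": "scale-in"}
```

Because each entry carries a time indication, the SDM can schedule VM-level or VNF-level scaling in advance rather than reacting after the fact.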
  • the SDM included in the VNF manager may analyze the information and may finally decide if a scaling action (for example, scaling in or scaling out) shall be performed on the VNF.
  • the VNF manager may notify the VNF to prepare the scaling, at block 403. For example, the VNF may be notified that a new instance will be added later if scaling out is decided, or that ongoing traffic or sessions will be released later if scaling in is decided.
  • the VNF manager may notify VIM to start scaling, at block 404.
  • the VIM may be notified to add VM resources or reduce VM resources.
  • the VIM may notify the NFVI at block 405.
  • the NFVI may perform VM instance(s) creation or removal and the VIM may receive an acknowledgement from the NFVI.
  • the NFVI may notify the VNF of the new status of VM resources, at block 406.
  • the VNF may conclude the scaling and send the result of scaling to the VNF manager, at block 407.
  • the VNF manager may update system data internally.
  • VNF scaling needs may be predicted by machine learning in advance, and thus VNF scaling decisions and actions can be made at just the right time with just the right resources.
  • Fig. 4 is only an example of the disclosure, but it is not limited thereto.
  • the order of operations at blocks may be adjusted and/or some blocks may be omitted.
  • some blocks not shown in Fig. 4 may be added.
  • Fig. 5 is a schematic diagram which shows a method of machine learning in accordance with an embodiment of the present disclosure.
  • the method 500 may include: training a machine learning model by using the characteristics and configuration data of one or more past time periods, at block 501. This block may be referred to as a training phase.
  • techniques applying the machine learning may include applying supervised learning models or methods; and one or more of the following models or methods may be used for the machine learning: regression analysis (linear and/or non-linear), support vector machines, decision trees, random forest, Bayesian statistics, artificial neural networks (including deep neural networks).
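As a toy instance of the artificial-neural-network option listed above, the sketch below trains a single neuron by gradient descent (the core mechanism behind backpropagation); the learning rate, epoch count and sample data are assumptions, and real deployments would use multi-layer networks on far more data.

```python
# Minimal single-neuron "network" trained by gradient descent, standing
# in for the artificial-neural-network option (illustrative only).

def train_neuron(samples, lr=0.01, epochs=2000):
    """samples: list of (x, y); learns y ~ w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x    # gradient of squared error w.r.t. w
            b -= lr * err        # gradient of squared error w.r.t. b
    return w, b

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # underlying relation y = 2x
w, b = train_neuron(samples)
```

After training, `w * x + b` generalises the learnt pattern to unseen inputs, which is exactly the role the prediction model plays in the VCPE.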
  • the method 500 may include: predicting one or more capacities of the next time period using the built analytics model, at block 502. This block may be referred to as a predicting phase.
  • the future VNF capacity prediction for the next time period may be performed automatically by using the learnt patterns stored in the model.
  • the method 500 may include: concluding resource needs of the next time period by comparing the predicted capacities and allocated resources of the next time period, at block 503. This block may be referred to as a synthesizing phase.
  • the VCPE can conclude the VNF resource needs and/or scaling plan at a certain time, and then provide this data to the SDM, which can make a scaling plan in advance.
  • the time period (T1) over which the subsequent future VNF capacity prediction is made may be configurable at the VCPE. It could be hour(s), day(s), week(s), month(s), and so on; it is not limited and may depend on different VNFs and deployments.
  • the VCPE retrieves data from the EMS/NMS, where the data is related to runtime VNF capacity, including KPIs (there should be no scaling-related alarms if the VCPE works well), system configurations and runtime resource data. All data at a certain time actually represents one particular traffic model of the system/network and the corresponding needed system resource consumption. The VCPE will then filter and shape the data as training data for machine learning, for example by removal of unused data, data consolidation per time granularity, data representation (e.g., numeralization), and data standardization and normalization.
  • the time granularity (T0) could be seconds, minutes, hours, days and so on, and is configurable at the VCPE. It also corresponds to the time granularity of the future VNF capacity prediction during the next time period. For practical use at scaling, T0 usually need not be very small, which also avoids impacting system characteristics and performance; however, T0 should be smaller than T1.
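The filter-and-shape step (consolidation per time granularity T0 followed by normalization) might look like the sketch below; the bucket size, sample layout and the min-max scaling choice are assumptions, not the disclosure's exact procedure.

```python
# Sketch of the filter-and-shape step: consolidate raw samples into T0
# buckets and min-max scale them (illustrative data and choices).

def consolidate(samples, t0):
    """Average raw (timestamp, value) samples per T0-second bucket."""
    buckets = {}
    for ts, v in samples:
        buckets.setdefault(ts // t0, []).append(v)
    return {k: sum(vs) / len(vs) for k, vs in sorted(buckets.items())}

def minmax_scale(values):
    """Min-max scale values to [0, 1] for model training."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [(0, 10.0), (30, 20.0), (70, 40.0), (110, 60.0)]
per_bucket = consolidate(raw, t0=60)           # {0: 15.0, 1: 50.0}
scaled = minmax_scale(list(per_bucket.values()))  # [0.0, 1.0]
```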
  • the method may further include: using actual capacity data of a current time period as the latest training data of the machine learning when the current time period is over; discarding the data of the most aged period; and revising a prediction model of the machine learning by using the latest training data.
  • VCPE may use the data of previous N time periods, where N is configurable at VCPE.
  • the actual VNF capacity data of the current time period will be used as the latest new training data, and also used to revise/update the prediction model, e.g., adjusting the model accordingly to fit better.
  • the data of the most aged time period will be phased out and replaced by the latest data.
  • the VCPE may predict the VNF capacity of the next time period based on each training dataset (N in total) from the previous N time periods.
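The rolling window of the previous N time periods can be sketched as follows; the `TrainingWindow` helper and the week-labelled datasets are illustrative assumptions.

```python
# Sketch of the rolling training window: keep the N most recent period
# datasets, phasing out the most aged one as each period completes
# (N and the dataset labels are illustrative).

from collections import deque

class TrainingWindow:
    def __init__(self, n):
        self.periods = deque(maxlen=n)   # oldest entry phased out automatically

    def period_over(self, actual_capacity_data):
        """When the current period ends, its actual data becomes training data."""
        self.periods.append(actual_capacity_data)

    def datasets(self):
        return list(self.periods)        # one dataset per retained period

window = TrainingWindow(n=3)
for period in ["week1", "week2", "week3", "week4"]:
    window.period_over(period)
# "week1" has been phased out; "week2".."week4" remain for retraining
```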
  • the method may further include: synthesizing multiple predicted capacities with different weights, and/or, combining multiple input data and/or output data with the different weights.
  • the different weights may be calculated based on the following rules: the more aged the data is, the smaller value the weight has; and/or; the more similar the time period is, the bigger value the weight has.
  • the final prediction will be the synthesis of all N predictions with different weights.
  • the weights are calculated based on the following rules: the more aged the data, the smaller the weight value; and/or, the more similar the time period (traffic model) is, the bigger the weight value, e.g., the traffic models of Friday evenings can be considered similar, and the traffic model of a workday can be seen as different from that of a weekend.
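The synthesis of the N per-dataset predictions might be implemented as a normalised weighted average, as sketched below with illustrative values.

```python
# Sketch of the final synthesis: combine the N per-dataset predictions
# into one value using weights (values here are illustrative; the
# weights would follow the age/similarity rules described above).

def synthesise(predictions, weights):
    """Weighted average of N predictions; weights need not be pre-normalised."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

# Three predictions from the three most recent periods; the newest,
# most similar period carries the largest weight.
final = synthesise([80.0, 70.0, 60.0], [2.0, 1.0, 1.0])   # -> 72.5
```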
  • each training sample can contain more historic knowledge, e.g. one training sample data as follows,
  • the historical data will have different weights (per the principles described above) to contribute to combining the input sample.
  • the model could predict several outputs over several future time periods, so the current system behavior and model prediction will have an impact on a longer time period instead of only one future time period.
  • the weight setting principles described above shall be utilized to make use of all outputs to synthesize the final output for one specific future time period.
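Synthesizing several weighted predictions that all target one specific future time period can be sketched as below; the numeric predictions and weights are illustrative only.

```python
def synthesize(predictions, weights):
    """Weighted synthesis of several predictions that all target the
    same future time period (e.g. produced from different historical
    training windows)."""
    assert len(predictions) == len(weights)
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

# Three predictions for next hour's CPU usage (%); the newest
# prediction is weighted highest, per the ageing rule above.
final = synthesize([78.0, 74.0, 70.0], [0.5, 0.3, 0.2])
```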
  • VNF scaling needs may be predicted by machine learning in advance, and thus VNF scaling decision and action can be made at just right time with just right resources.
  • SBC: Session Border Controller
  • the H.248 protocol is used for the SBC and the MGW to communicate with each other.
  • SBC may be P-CSCF, IMS-ALG on UNI/PNI interface, or IBCF on NNI interface.
  • MGW may be C-BGF (IMS Access Gateway) on UNI/PNI interface and I-BGF (TrGW) on the NNI interface.
  • the input data (traffic model data) and output data (prediction data) are related to VNF capacity.
  • the input data could vary in different systems, but the output data is similar, being related to VNF capacity (resource) data. Take SBC and MGw as examples.
  • the input data may include: no. of registered users, session initiation protocol (SIP) call rate, transport protocols (UDP, TCP, TLS), IP version (IPv4 or IPv6), signaling interface (Gm, Rq, Rf, e2), call hold time, call answer time, hypervisor type, CPU frequency, hyper-threading switch, etc.
  • the output data may include: memory usage (MB), CPU usage (%), bandwidth usage.
  • the input data may include: subscribers, traffic per subscriber, MHT (mean hold time), total sessions, traffic type (Access/Core/MSS/MRF/TDM), audio codec (AMR-WB/AMR-NB/G.711/G.722/G.729), audio transcoding, hypervisor type, vSwitch variant, etc.
  • the output data may include: no. of vCPUs, no. of VMs, memory of VM, memory of VNF, disk size, bandwidth/packet rates, etc.
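As an illustration of the MGw input/output data listed above, one training sample might be organized as follows. All field names and values here are hypothetical, introduced only to show the shape of such a sample; they are not taken from a real deployment.

```python
# One hypothetical vMGw training sample. The feature names follow the
# input/output lists above, but the exact schema and values are
# illustrative assumptions.
sample = {
    "input": {
        "subscribers": 200_000,
        "traffic_per_subscriber_erlang": 0.07,
        "mht_seconds": 90,                 # mean hold time
        "traffic_type": "Access",
        "audio_codec": "AMR-WB",
        "audio_transcoding": True,
        "hypervisor_type": "KVM",
    },
    "output": {
        "num_vcpus": 24,
        "num_vms": 3,
        "vm_memory_gb": 16,
        "bandwidth_mbps": 950,
    },
}
```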
  • ANN: artificial neural networks
  • the example model (it may be referred to as AlphaCANDI in this disclosure) is a fused model built by using information fusion technology to combine three feedforward neural networks: BP (Backpropagation) neural networks, RBF (Radial Basis Function) neural networks and GRNN (Generalized Regression Neural Networks).
  • MAE: Mean Absolute Error
  • RMSE: Root of Mean Square Error
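The two accuracy metrics abbreviated above can be computed on predicted versus actual capacity values as follows; the sample numbers are illustrative.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted capacity."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root of Mean Square Error; penalizes large deviations more."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

actual    = [70.0, 80.0, 90.0]   # e.g. actual CPU usage (%) per period
predicted = [72.0, 78.0, 85.0]   # model predictions for the same periods
# mae(...) == 3.0; rmse(...) == sqrt((4 + 4 + 25) / 3)
```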
  • the BP network is a typical feedforward neural network model, which uses the BP (backpropagation) learning algorithm during training.
  • Backpropagation is a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent.
  • RBF network is an artificial neural network that uses Radial Basis Functions (RBF) as activation functions.
  • the output of the network is a linear combination of radial basis functions of the inputs and neuron parameters.
  • the RBF network does local approximation of a non-linear mapping instead of global approximation, thus it only needs a few training samples. It features fast learning (convergence) and can achieve better accuracy.
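The RBF network output described above, a linear combination of radial basis functions of the inputs and neuron parameters, can be sketched for a single scalar input. The choice of a Gaussian basis function and all parameter values are illustrative assumptions.

```python
import math

def rbf_output(x, centers, weights, sigma=1.0):
    """Output of a minimal RBF network: a linear combination of
    Gaussian radial basis functions of the distance between the
    input and each neuron's center (scalar input for brevity)."""
    phis = [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]
    return sum(w * phi for w, phi in zip(weights, phis))

# The basis function centered exactly at the input contributes fully;
# the neighbouring centers contribute less.
y = rbf_output(x=1.0, centers=[0.0, 1.0, 2.0], weights=[0.2, 0.5, 0.3])
```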
  • GRNN is a variant of the RBF network. It has a radial basis layer and a special linear layer. Unlike the standard RBF network, GRNN is distinct in the calculation of the final outputs, i.e. a weighted sum of the radial basis functions and the training values using normalization and dot product operations, and it can be thought of as a normalized RBF network.
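The GRNN output calculation described above, a normalized kernel-weighted sum of training target values, can be sketched as below; the scalar-input form and the smoothing parameter value are simplifying assumptions.

```python
import math

def grnn_predict(x, train_x, train_y, sigma=0.5):
    """GRNN as a normalized RBF network: the prediction is the
    kernel-weighted average of the training target values (a dot
    product of kernel weights and targets, normalized by the sum
    of the kernel weights)."""
    kernels = [math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) for xi in train_x]
    return sum(k, ) if False else sum(
        k * y for k, y in zip(kernels, train_y)
    ) / sum(kernels)

# Interpolating a capacity value halfway between two observed points:
# equal kernel weights give the exact midpoint of the targets.
y = grnn_predict(1.5, train_x=[1.0, 2.0], train_y=[60.0, 80.0])
```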
  • the fused model is only an example of the disclosure. However, it is not limited thereto; for example, other models may also be adopted according to actual scenarios.
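One simple way to fuse the three networks' predictions, in the spirit of the example model above, is a weighted average. The fusion weights and the averaging rule shown here are illustrative assumptions, as the disclosure does not fix a specific fusion formula.

```python
def fuse_predictions(bp_pred, rbf_pred, grnn_pred, fusion_weights=(0.4, 0.3, 0.3)):
    """Information-fusion step combining the capacity predictions of
    the BP, RBF and GRNN networks into one fused prediction.
    A weighted average is shown; the weights are illustrative."""
    preds = (bp_pred, rbf_pred, grnn_pred)
    return sum(w * p for w, p in zip(fusion_weights, preds)) / sum(fusion_weights)

# Fusing three hypothetical CPU-usage predictions (%):
fused = fuse_predictions(82.0, 78.0, 80.0)
```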
  • nSBC: native product
  • vSBC and vMGw: virtualized products
  • Fig. 6 is a schematic diagram which shows an example of VNF resource needs prediction results showing deviations of predicted needs and real needs in accordance with an embodiment of the present disclosure.
  • Fig. 7 is a schematic diagram which shows an example of prediction accuracy and data size in accordance with an embodiment of the present disclosure.
  • cloud/virtualized environments are more complex, but this complexity does not bring any additional difficulty to the machine learning based model, while it does to today’s approaches.
  • the training time may be at sec/min level, while the prediction time may be at ms level. This can fit in VNF scaling prediction.
  • VNF resources are used at just the right time, with no waste of system resources/energy.
  • a machine learning based model may simplify the complexity of cloud environments and traffic scenarios (complexity and scenario agnostic), and provide good prediction accuracy and robustness for VNF resource needs in the future time period. This avoids incorrect VNF scaling decisions and operations, i.e. not scaling when actually needed, or scaling when not actually needed. Therefore, it may reduce user complaints, increase resource efficiency, and improve overall system characteristics and network performance.
  • the apparatus may be configured in a network device, and the same contents as those in the first aspect of embodiments are omitted.
  • Fig. 8 shows a block diagram of an apparatus 800 for network function capacity and scaling management in accordance with an embodiment of the present disclosure.
  • the apparatus 800 includes: a retrieving unit 801 configured to retrieve characteristics and configuration data from a scaling trigger handler; an obtaining unit 802 configured to obtain capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning; and a sending unit 803 configured to send the capacity prediction information of the next time period to a scaling decision maker.
  • the apparatus 800 may be implemented in a virtualized network function capacity prediction engine which enables a predictive based scaling instead of a reactive based scaling.
  • the scaling trigger handler may be configured or implemented in an element management system or a network management system; the scaling decision maker may be configured or implemented in a virtualized network function manager; the virtualized network function capacity prediction engine may be configured or implemented in the virtualized network function manager or the element management system or the network management system.
  • the characteristics and configuration data may include one or more of the following information: alarms, key performance indicators, system configurations, system running data, system resources; the time period is configurable and may include one or more of the following time intervals: minutes, hours, days, weeks, months and years. However, it is not limited thereto.
  • the virtualized network function capacity prediction engine may regularly retrieve the characteristics and configuration data from the scaling trigger handler; and the scaling trigger handler may regularly retrieve the characteristics and configuration data from the virtualized network functions.
  • the virtualized network function capacity prediction engine may receive the characteristics and configuration data reported by the scaling trigger handler; and the scaling trigger handler may receive the characteristics and configuration data reported by virtualized network functions.
  • the obtaining unit 802 may be configured to train a machine learning model by using the characteristics and configuration data of one or more past time periods to discover inherent patterns from the characteristics and configuration data and to build an analytics model; predict one or more capacities of the next time period using the built analytics model; and conclude resource needs of the next time period by comparing the predicted capacities and allocated resources of the next time period.
  • techniques applying the machine learning may include applying supervised learning models or methods; one or more of the following models or methods are used for the machine learning: regression analysis, support vector machines, decision trees, random forest, artificial neural networks. However, it is not limited thereto.
  • a fused model may be used by combining backpropagation neural networks, radial basis functions and generalized regression neural networks.
  • multiple predicted capacities may be synthesized with different weights, and/or, multiple input data and/or output data may be combined with the different weights.
  • the different weights may be calculated based on the following rules: the more aged the data is, the smaller value the weight has; and/or, the more similar the time period is, the bigger value the weight has. However, it is not limited thereto.
  • actual capacity data of a current time period may be used as the latest training data of the machine learning when the current time period is over.
  • the data of the most aged period may be discarded; and a prediction model of the machine learning may be revised by using the latest training data.
  • components included in the apparatus 800 correspond to the operations of the method 300 or 400. Therefore, all operations and features described above with reference to Fig. 3 or 4 are likewise applicable to the components included in the apparatus 800 and have similar effects. For the purpose of simplification, the details will be omitted.
  • the components included in the apparatus 800 may be implemented in various manners, including software, hardware, firmware, or any combination thereof.
  • one or more units may be implemented using software and/or firmware, for example, machine-executable instructions stored on the storage medium.
  • parts or all of the components included in the apparatus 800 may be implemented, at least in part, by one or more hardware logic components.
  • for example, Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), and Complex Programmable Logic Devices (CPLDs).
  • the apparatus 800 may be a part of a device. But it is not limited thereto; for example, the apparatus 800 may be the network device itself. Other parts of the network device, such as the transmitter and receiver, are omitted in Fig. 8.
  • a predictive based approach is provided by using data insights through machine learning techniques, instead of the current reactive based approach.
  • Scaling needs may be predicted by machine learning in advance, and thus scaling decision and action can be made at just right time with just right resources.
  • a system includes a network device configured to perform a method for network function capacity and scaling management according to the first aspect of embodiments.
  • the system may include a scaling trigger handler, a scaling decision maker, and a capacity prediction engine configured to perform a method according to the first aspect of embodiments.
  • the system may further include an element management system or a network management system, a virtualized network function manager, a virtualized infrastructure manager and a network function virtualization infrastructure.
  • the scaling trigger handler is configured or implemented in an element management system or a network management system, the scaling decision maker is configured or implemented in a virtualized network function manager; the capacity prediction engine is used for virtualized network function capacity and scaling management and is configured or implemented in the virtualized network function manager or the element management system or the network management system.
  • a device is provided in an embodiment, and the same contents as those in the first aspect and the second aspect of embodiments are omitted.
  • Fig. 9 shows a simplified block diagram of a device 900 that is suitable for implementing embodiments of the present disclosure. It would be appreciated that the device 900 may be implemented as at least a part of, for example, the network device.
  • the device 900 includes a communicating means 930 and a processing means 950.
  • the processing means 950 includes a data processor (DP) 910, a memory (MEM) 920 coupled to the DP 910.
  • the communicating means 930 is coupled to the DP 910 in the processing means 950.
  • the MEM 920 stores a program (PROG) 940.
  • the communicating means 930 is for communications with other devices, which may be implemented as a transceiver for transmitting/receiving signals.
  • the memory 920 stores a plurality of instructions; and the processor 910 coupled to the memory 920 and configured to execute the instructions to: retrieve characteristics and configuration data from a scaling trigger handler; obtain capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning; and send the capacity prediction information of the next time period to a scaling decision maker.
  • the PROG 940 is assumed to include program instructions that, when executed by the associated DP 910, enable the device 900 to operate in accordance with the embodiments of the present disclosure, as discussed herein with the methods 300 or 400.
  • the embodiments herein may be implemented by computer software executable by the DP 910 of the device 900, or by hardware, or by a combination of software and hardware.
  • a combination of the data processor 910 and MEM 920 may form processing means 950 adapted to implement various embodiments of the present disclosure.
  • the MEM 920 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory, as non-limiting examples. While only one MEM is shown in the device 900, there may be several physically distinct memory modules in the device 900.
  • the DP 910 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multicore processor architecture, as non-limiting examples.
  • the device 900 may have multiple processors, such as an application specific integrated circuit chip that is slaved in time to a clock which synchronizes the main processor.
  • various embodiments of the present disclosure may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing devices. While various aspects of embodiments of the present disclosure are illustrated and described as block diagrams, flowcharts, or using some other pictorial representation, it will be appreciated that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • embodiments of the present disclosure can be described in the general context of machine-executable instructions, such as those included in program modules, being executed in a device on a target real or virtual processor.
  • program modules include routines, programs, libraries, objects, classes, components, data structures, or the like that perform particular tasks or implement particular abstract data types.
  • the functionality of the program modules may be combined or split between program modules as desired in various embodiments.
  • Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device, program modules may be located in both local and remote storage media.
  • Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • the above program code may be embodied on a machine-readable medium, which may be any tangible medium that may contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • the machine-readable medium may include but not limited to an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • more specific examples of the machine-readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the device may be implemented in the general context of computer system-executable instructions, such as program modules, being executed by a computer system.
  • program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
  • the device may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer system storage media including memory storage devices.

Abstract

A method and device for network function capacity and scaling management. A predictive based approach is provided by using data insights through machine learning techniques, instead of the current reactive based approach. Therefore, scaling needs may be predicted by machine learning in advance, and thus scaling decision and action can be made at just right time with just right resources.

Description

METHOD AND DEVICE FOR NETWORK FUNCTION CAPACITY AND SCALING MANAGEMENT TECHNICAL FIELD
Embodiments of the present disclosure generally relate to the field of communications, and more particularly, to a method and device for network function capacity and scaling management.
BACKGROUND
This section introduces aspects that may facilitate better understanding of the present disclosure. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is in the prior art or what is not in the prior art.
In some communication networks, a number of nodes or network functions (NFs) are connected to provide services. These NFs may communicate with surrounding peers to process and transport control signaling and/or user plane data through standardized communication and network protocols.
Nowadays, a major change brought by network function virtualization (NFV) is that virtualization enables additional dynamic methods, rather than just static methods, to construct and manage network function graphs or sets combining these network functions. More and more NFs may be virtualized and may be called virtualized network functions (VNFs), which may be deployed in cloud environments. One of the advantages of using VNFs is the “elastic nature” of resources, such as the ability to scale virtual functions as required and to cater to demands.
In order to fulfil different business needs and capacity requirements, scalability of a native NF or VNF is a very important and key feature for service providers. In the cloud environment, scaling in/out is considered a basic feature and is deemed an enabler for quickly modifying the capacity of products/services on demand in order to meet increasing business needs.
Some methods and devices have been proposed for network function capacity and scaling management. For example, WO2017/020298A1 describes a method to automatically manage peer connections in the cloud, which could reduce manual work for operators and avoid human faults.
Besides the human operation cost issue, there is still a fundamental technical problem in the scaling area, that is, how to decide whether scaling (such as at both the VM level and the VNF level) is needed, and what the right time is to perform the scaling if needed. There is no perfect solution currently.
Current methods of VNF capacity scaling are only reactive based approaches, e.g., depending on threshold alarms and key performance indicators (KPIs). The reactive based approach has some disadvantages: it has a time lag and is not a real-time solution, because when a scaling indicator is triggered and sent to a VNF manager, and/or when a scaling operation is started by the VNF manager, the indicated system situation (such as overloaded or underloaded) may have sustained for a long time or may already be over.
This time lag could heavily impact system characteristics (such as when overloaded) or system resources (such as when underloaded), and this scenario could happen frequently in a dynamic cloud environment. The time delay is produced by the analysis and synthesis of different-level system alarms from the network, and by the processing time of getting system KPIs (such as counters and statistics) from the network. For example, the delay is 5-15 minutes in most of the systems.
Sometimes, in order to mitigate the actual impact of the time lag, some preventive measures are adopted when handling VNF scaling, e.g. VNF scaling out is performed when the system load reaches 70%, instead of at a real overload indication like 90%. This might help on the one hand, but on the other hand, it also leads to VNF overprovisioning and resource/energy inefficiency.
SUMMARY
In order to solve at least part of the above problems, methods, apparatus, devices  and computer programs are provided in the present disclosure. It may be appreciated that embodiments of the present disclosure are not limited to VNF, but could be more widely applied to any application scenario where similar problems exist.
Various embodiments of the present disclosure mainly aim at providing methods, devices and computer programs for network function capacity and scaling management. Other features and advantages of embodiments of the present disclosure will also be understood from the following description of specific embodiments when reading in conjunction with the accompanying drawings, which illustrate, by way of example, the principles of embodiments of the present disclosure.
In general, embodiments of the present disclosure provide a solution for network function capacity and scaling management.
In a first aspect, there is provided a method for network function capacity and scaling management, the method includes: retrieving characteristics and configuration data from a scaling trigger handler; obtaining capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning; and sending the capacity prediction information of the next time period to a scaling decision maker.
In one embodiment, the method is implemented in a virtualized network function capacity prediction engine which enables a predictive based scaling instead of a reactive based scaling.
In one embodiment, the scaling trigger handler is configured or implemented in an element management system or a network management system, the scaling decision maker is configured or implemented in a virtualized network function manager; the virtualized network function capacity prediction engine is configured or implemented in the virtualized network function manager or the element management system or the network management system.
In one embodiment, the characteristics and configuration data comprises one or more of the following information: alarms, key performance indicators, system  configurations, system running data, system resources; the time period is configurable and comprises one or more of the following time intervals: minutes, hours, days, weeks, months and years.
In one embodiment, the virtualized network function capacity prediction engine regularly retrieves the characteristics and configuration data from the scaling trigger handler; and the scaling trigger handler regularly retrieves the characteristics and configuration data from the virtualized network functions.
In one embodiment, the virtualized network function capacity prediction engine receives the characteristics and configuration data reported by the scaling trigger handler; and the scaling trigger handler receives the characteristics and configuration data reported by virtualized network functions.
In one embodiment, the method comprises: training a machine learning model by using the characteristics and configuration data of one or more past time periods to discover inherent patterns from the characteristics and configuration data and to build an analytics model; predicting one or more capacities of the next time period using the built analytics model; and concluding resource needs of the next time period by comparing the predicted capacities and allocated resources of the next time period.
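The final step of this embodiment, concluding resource needs by comparing the predicted capacities with the allocated resources of the next time period, might be sketched as follows. The 10% headroom factor and the 50% underload threshold are illustrative assumptions, not values specified by the disclosure.

```python
def conclude_resource_needs(predicted_capacity, allocated_capacity, headroom=1.1):
    """Compare the predicted capacity demand for the next time period
    with the currently allocated capacity and conclude the scaling
    need. Headroom and underload thresholds are illustrative."""
    required = predicted_capacity * headroom
    if required > allocated_capacity:
        return "scale-out"
    if required < allocated_capacity * 0.5:   # heavily underloaded
        return "scale-in"
    return "no-scaling"

# A predicted demand of 90 units against 80 allocated units would
# lead the scaling decision maker toward scaling out in advance.
decision = conclude_resource_needs(90.0, 80.0)
```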
In one embodiment, techniques applying the machine learning comprise applying supervised learning models or methods; one or more of the following models or methods are used for the machine learning: regression analysis, support vector machines, decision trees, random forest, artificial neural networks.
In one embodiment, a fused model is used by combining backpropagation neural networks, radial basis functions and generalized regression neural networks.
In one embodiment, the method further comprises: synthesizing multiple predicted capacities with different weights, and/or, combining multiple input data and/or output data with the different weights.
In one embodiment, the different weights are calculated based on the following rules: the more aged the data is, the smaller value the weight has; and/or, the more similar the time period is, the bigger value the weight has.
In one embodiment, the method further comprises: using actual capacity data of a current time period as the latest training data of the machine learning when the current time period is over.
In one embodiment, the method further comprises: discarding the data of the most aged period; and revising a prediction model of the machine learning by using the latest training data.
In a second aspect, there is provided a system for network function capacity and scaling management, comprising a scaling trigger handler and a scaling decision maker; the system further comprises: a capacity prediction engine configured to perform a method according to the first aspect.
In one embodiment, the system further comprises an element management system or a network management system, a virtualized network function manager, a virtualized infrastructure manager and a network function virtualization infrastructure.
In one embodiment, the scaling trigger handler is configured or implemented in an element management system or a network management system, the scaling decision maker is configured or implemented in a virtualized network function manager; the capacity prediction engine is used for virtualized network function capacity and scaling management and is configured or implemented in the virtualized network function manager or the element management system or the network management system.
In a third aspect, there is provided a device, comprising a processor and a memory, wherein the memory containing instructions executable by the processor whereby the device is operative to perform a method according to the first aspect.
According to various embodiments of the present disclosure, a predictive based approach is provided by using data insights through machine learning techniques, instead of the current reactive based approach. Therefore, scaling needs may be predicted by machine learning in advance, and thus scaling decision and action can be made at just right time with just right resources.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other aspects, features, and benefits of various embodiments of the disclosure will become more fully apparent, by way of example, from the following detailed description with reference to the accompanying drawings, in which like reference numerals or letters are used to designate like or equivalent elements. The drawings are illustrated for facilitating better understanding of the embodiments of the disclosure and not necessarily drawn to scale, in which:
Fig. 1 is a schematic diagram which shows an example for VNF scaling;
Fig. 2 is a schematic diagram which shows an example of reactive based VNF scaling;
Fig. 3 is a flowchart which shows a method 300 for network function capacity and scaling management in accordance with an embodiment of the present disclosure;
Fig. 4 is a schematic diagram which shows an example of predictive based VNF scaling in accordance with an embodiment of the present disclosure;
Fig. 5 is a schematic diagram which shows a method of machine learning in accordance with an embodiment of the present disclosure;
Fig. 6 is a schematic diagram which shows an example of VNF resource needs prediction results showing deviations of predicted needs and real needs in accordance with an embodiment of the present disclosure;
Fig. 7 is a schematic diagram which shows an example of prediction accuracy and data size in accordance with an embodiment of the present disclosure;
Fig. 8 shows a block diagram of an apparatus 800 for network function capacity and scaling management in accordance with an embodiment of the present disclosure;
Fig. 9 is a simplified block diagram of a device that is suitable for implementing embodiments of the present disclosure.
DETAILED DESCRIPTION
The present disclosure will now be discussed with reference to several example embodiments. It should be understood that these embodiments are discussed only for the purpose of enabling those skilled persons in the art to better understand and thus implement the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
References in the specification to “one embodiment, ” “an embodiment, ” “an example embodiment, ” and the like indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
It shall be understood that although the terms “first” and “second” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed terms.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “has”, “having”, “includes” and/or “including”, when used herein, specify the presence of stated features, elements, and/or components etc., but do not preclude the presence or addition of one or more other features, elements, components and/or combinations thereof.
In the following description and claims, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
The network device may include processing circuitry, device readable medium, interface, user interface equipment, auxiliary equipment, power source, power delivery circuitry, and antenna. These components are depicted as single boxes located within a single larger box, and in some cases, contain additional boxes therein.
In practice however, the network device may include multiple different physical components that make up a single illustrated component (e.g., interface includes ports or terminals for coupling wires for a wired connection and radio front end circuitry for a wireless connection) . As another example, network device may be a virtual network node. Similarly, network node may be composed of multiple physically separate components (e.g., a NodeB component and a RNC component, a BTS component and a BSC component, etc. ) , which may each have their own respective components.
In certain scenarios in which the network device includes multiple separate components, one or more of the separate components may be shared among several network nodes. For example, a single RNC may control multiple NodeBs. In such a scenario, each unique NodeB and RNC pair may in some instances be considered a single separate network node. In some embodiments, the network node may be configured to support multiple radio access technologies (RATs). In such embodiments, some components may be duplicated (e.g., separate device readable medium for the different RATs) and some components may be reused (e.g., the same antenna may be shared by the RATs).
For example, in an IP multimedia subsystem (IMS) network, a session border controller (SBC) and a media gateway (MGw), as gateway nodes, are located together at network borders to provide the ability to correlate signaling and media streams (such as audio and video) that pass the network borders; hence a comprehensive suite of functions required to access and interconnect IMS core domains and external IP networks should be provided, with preserved security, privacy, and quality of service.
Fig. 1 is a schematic diagram which shows an example of VNF scaling. As shown in Fig. 1, the H.248 protocol may be used and there may be several running instances, such as SBC 1, SBC 2, …, and MGw 1, MGw 2, …, at the virtual machine (VM) level.
In a case of scaling out, information on the scaled SBC instance (for example, including IP addresses) needs to be dispatched manually or automatically to all peer MGw(s), so that the MGw(s) can initiate new connections to the newly scaled SBC instance. A case of scaling in follows similar procedures.
Fig. 2 is a schematic diagram which shows an example of reactive based VNF scaling. As shown in Fig. 2, there are some modules or components, such as an element management system (EMS) or a network management system (NMS), a virtualized network function manager (VNF manager), a virtualized infrastructure manager (VIM) and a network function virtualization infrastructure (NFVI). Some VNFs may be provided with a plurality of resources, such as memory, central processing unit (CPU), bandwidth, I/O, disk, and so on.
As shown in Fig. 2, a scaling trigger handler (STH) may be configured or implemented in the EMS/NMS, and a scaling decision maker (SDM) may be configured or implemented in the VNF manager.
For example, some blocks (or steps) may be performed for the VNF scaling.
As shown in Fig. 2, the VNF may report alarms and KPIs to the EMS/NMS during a running period, at block 201. For example, a report may be generated periodically, such as every 15 minutes in many systems; this interval is hard to shorten considering the impact on the characteristics and performance of the whole network.
The STH included in EMS/NMS may further analyze and synthesize the scaling related alarms and/or KPIs, and decide if a request for scaling should be triggered according to VNF business/application level knowledge. If a request for scaling should be triggered, the STH then sends the request to the VNF manager for further handling, at block 202.
After receiving the request, the SDM included in the VNF manager may analyze the received request, together with other relevant VNF and system data, e.g. VNF life cycle management. The SDM may finally decide if a scaling action (for example, scaling in or  scaling out) shall be performed on the VNF.
If a scaling decision is made, the VNF manager may notify the VNF to prepare the scaling, at block 203. For example, the VNF may be notified that a new instance will be added later if scaling out is decided, or that ongoing traffic or sessions will be released later if scaling in is decided.
After receiving an acknowledgement from the VNF, the VNF manager may notify the VIM to start scaling, at block 204. For example, the VIM may be notified to add VM resources or reduce VM resources. The VIM may notify the NFVI at block 205.
The NFVI may perform VM instance(s) creation or removal, and the VIM may receive an acknowledgement from the NFVI. In addition, the NFVI may notify the VNF of a new status of VM resources, at block 206. The VNF may conclude the scaling and send the result of scaling to the VNF manager, at block 207. The VNF manager may update system data internally.
However, the reactive based approach has a time lag and is not a real-time solution; thus it could heavily impact system characteristics and performance (such as when overloaded) and/or waste system resources (such as when underloaded) in a dynamic cloud environment. In addition, some of the preventive measures adopted may cause overprotection. Therefore, the reactive based approach may lead to degraded system characteristics and/or resource/energy inefficiency.
Furthermore, scaling triggering and decisions are not easy to make, as there are too many relevant alarms and KPIs at different system levels which need to be thoroughly analyzed and synthesized. In particular, many cloud parameters and uncertainties are introduced and need to be measured, which requires much effort and makes accurate internal measurements almost impossible. All of this may lead to wrong scaling decisions by the reactive based approach.
On the other hand, machine learning is a method of data analysis that automates analytical model building. Using algorithms that learn iteratively from datasets, machine learning allows computers to find hidden patterns without being explicitly programmed. It is a subfield of artificial intelligence (AI). Machine learning has been widely applied in many areas.
In this disclosure, a predictive based approach is provided for scaling and capacity management (such as for VNF) by using machine learning techniques. The solution can be applied in any similar cloud scaling scenarios on both VM level and VNF level in networks.
The solution in this disclosure can also be applied/extended to the whole network level scaling, including radio/access network, core network, service network, etc., e.g. based on historical time period data of radio network characteristics, the machine learning based approach could provide predictive based core network scaling in the future time period through learning/training.
First aspect of embodiments
A method for network function capacity and scaling management is provided.
Fig. 3 is a flowchart which shows a method 300 for network function capacity and scaling management in accordance with an embodiment of the present disclosure, and illustrates the method for network function capacity and scaling management by taking a capacity prediction engine as an example.
As shown in Fig. 3, the method 300 includes retrieving, by a capacity prediction engine, characteristics and configuration data from a scaling trigger handler (STH), at block 301; obtaining, by the capacity prediction engine, capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning, at block 302; and sending, by the capacity prediction engine, the capacity prediction information of the next time period to a scaling decision maker (SDM), at block 303.
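The three blocks of method 300 might be sketched as follows. This is a minimal illustration under several assumptions: the `FakeSTH`/`FakeSDM` stand-ins, their method names, and the trivial predictor are not specified in the disclosure.

```python
# Hedged sketch of method 300: retrieve data from the STH, predict next-period
# capacity, and send the prediction to the SDM. The STH/SDM interfaces and the
# predictor are hypothetical stand-ins, not the disclosure's actual interfaces.

class CapacityPredictionEngine:
    def __init__(self, sth, sdm, predictor):
        self.sth, self.sdm, self.predictor = sth, sdm, predictor

    def run_period(self, past_periods=3):
        data = self.sth.get_data(past_periods)       # block 301: retrieve
        prediction = self.predictor(data)            # block 302: machine learning
        self.sdm.receive_prediction(prediction)      # block 303: send to SDM
        return prediction

class FakeSTH:
    def get_data(self, n):
        return [[50.0, 60.0, 70.0]] * n              # n past periods of KPI data

class FakeSDM:
    def receive_prediction(self, p):
        self.last = p

sdm = FakeSDM()
engine = CapacityPredictionEngine(FakeSTH(), sdm, predictor=lambda d: max(d[-1]))
print(engine.run_period())  # → 70.0
```

A real VCPE would replace the lambda with a trained model, as discussed in the embodiments below.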
In an embodiment, the method may be implemented in a VNF capacity prediction engine (VCPE) which enables a predictive based scaling instead of a reactive based scaling. For example, the STH may be configured or implemented in an element management system (EMS) or a network management system (NMS) , the SDM may be configured or  implemented in a virtualized network function manager (VNF manager) ; the VCPE may be configured or implemented in the VNF manager or the EMS or the NMS.
In an embodiment, the characteristics and configuration data may include one or more of the following information: alarms, key performance indicators, system configurations, system running data, system resources; however, it is not limited in this disclosure.
In an embodiment, the time period is configurable and may include one or more of the following time intervals: minutes, hours, days, weeks, months and years. However, it is not limited in this disclosure. For example, other time intervals (or time units) may be adopted according to actual scenarios.
Next, a virtualized network function capacity prediction engine (VCPE) will be illustrated as an example.
Fig. 4 is a schematic diagram which shows an example of predictive based VNF scaling in accordance with an embodiment of the present disclosure. As shown in Fig. 4, there are some modules or components, such as an EMS/NMS, a VNF manager, a VIM and an NFVI. Some VNFs may be provided with a plurality of resources, such as memory, central processing unit (CPU), bandwidth, I/O, disk, and so on.
As shown in Fig. 4, a STH may be configured or implemented in the EMS/NMS, and a SDM may be configured or implemented in the VNF manager. Furthermore, a VCPE may be configured or implemented in the VNF manager. However, it is not limited thereto. For example, the VCPE may also be configured or implemented in the EMS/NMS.
For example, some blocks (or steps) may be performed for the VNF scaling.
As shown in Fig. 4, the VNF may report alarms and/or KPIs to the EMS/NMS during a running period, at block 401. For example, a report may be generated periodically, such as every 15 minutes in many systems; this interval is hard to shorten considering the impact on the characteristics and performance of the whole network.
As shown in Fig. 4, the STH included in the EMS/NMS may regularly retrieve the characteristics and configuration data from the VNF, at block 4021. For example, the STH retrieves from the VNF additional data required by the VCPE, beyond what was already reported and received from the VNF at block 401.
As shown in Fig. 4, the VCPE may receive the characteristics and configuration data reported by the STH, at block 4022; and may also regularly retrieve the characteristics and configuration data from the STH, at block 4023.
For example, VCPE retrieves the data from STH regularly, e.g. per day time period, including alarms, KPIs, system configurations, system running data (resources) , etc. This data will be used as training (labelled) data for supervised machine learning by VCPE.
In this disclosure, the VCPE may obtain capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning.
For example, when the VCPE gets the data (characteristics and/or configuration data), it will firstly train a machine learning model, and after training it will perform future VNF capacity prediction (such as resource consumption prediction) for the next time period. Then, the VCPE may compare the future system resource needs and the allocated resources at that time, to decide when scaling is needed according to a pre-configured scaling policy, e.g. scaling out is performed if usage exceeds 90% or scaling in is performed if it drops under 50%, during the next time period. The details of the capacity prediction will be illustrated later.
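The pre-configured scaling policy just described might be sketched as follows. The 90%/50% thresholds come from the example in the text; the function name and the per-slot representation of predicted needs are illustrative assumptions.

```python
# Sketch of the pre-configured scaling policy: compare predicted resource
# needs against currently allocated resources and plan a scaling action per
# time slot of the next period. Only the 90%/50% thresholds are from the
# text; everything else is an assumption.

SCALE_OUT_THRESHOLD = 0.90  # scale out if predicted usage exceeds 90% of allocation
SCALE_IN_THRESHOLD = 0.50   # scale in if predicted usage drops under 50%

def plan_scaling(predicted_needs, allocated):
    """Return (slot, action) pairs for each time slot of the next period."""
    plan = []
    for slot, need in enumerate(predicted_needs):
        utilization = need / allocated
        if utilization > SCALE_OUT_THRESHOLD:
            plan.append((slot, "scale_out"))
        elif utilization < SCALE_IN_THRESHOLD:
            plan.append((slot, "scale_in"))
        else:
            plan.append((slot, "no_action"))
    return plan

# Example: predicted CPU needs (arbitrary units) against 100 allocated units.
print(plan_scaling([95, 70, 40], 100))
# → [(0, 'scale_out'), (1, 'no_action'), (2, 'scale_in')]
```

The resulting plan, with its time indications, is what would be forwarded to the SDM so that scaling can be planned in advance.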
As shown in Fig. 4, the VCPE may send the capacity prediction information of the next time period to SDM, at block 4024.
For example, the VCPE may send the scaling prediction data of the next time period to the SDM with time indications, thus SDM can plan the VM level or VNF level scaling in advance.
After receiving capacity prediction information of the next time period, the SDM included in the VNF manager may analyze the information and may finally decide if a scaling action (for example, scaling in or scaling out) shall be performed on the VNF.
If a scaling decision is made, the VNF manager may notify the VNF to prepare the scaling, at block 403. For example, the VNF may be notified that a new instance will be added later if scaling out is decided, or that ongoing traffic or sessions will be released later if scaling in is decided.
After receiving an acknowledgement from the VNF, the VNF manager may notify the VIM to start scaling, at block 404. For example, the VIM may be notified to add VM resources or reduce VM resources. The VIM may notify the NFVI at block 405.
The NFVI may perform VM instance(s) creation or removal, and the VIM may receive an acknowledgement from the NFVI. In addition, the NFVI may notify the VNF of a new status of VM resources, at block 406. The VNF may conclude the scaling and send the result of scaling to the VNF manager, at block 407. The VNF manager may update system data internally.
Therefore, a predictive based approach is provided by using data insights through machine learning techniques, instead of the current reactive based approach. VNF scaling needs may be predicted by machine learning in advance, and thus VNF scaling decisions and actions can be made at just the right time with just the right resources.
It should be appreciated that Fig. 4 is only an example of the disclosure, but it is not limited thereto. For example, the order of operations at blocks may be adjusted and/or some blocks may be omitted. Moreover, some blocks not shown in Fig. 4 may be added.
The detail of the capacity prediction will be illustrated in the following examples.
Fig. 5 is a schematic diagram which shows a method of machine learning in accordance with an embodiment of the present disclosure. As shown in Fig. 5, the method 500 may include: training a machine learning model by using the characteristics and configuration data of one or more past time periods, at block 501. This block may be referred to as a training phase.
In the training phase, inherent patterns may be discovered from the characteristics and configuration data and an analytics model may be built. Furthermore, techniques applying the machine learning may include applying supervised learning models or methods; and one or more of the following models or methods may be used for the machine learning: regression analysis (linear and/or non-linear), support vector machines, decision trees, random forest, Bayesian statistics, artificial neural networks (including deep neural networks). However, it is not limited in this disclosure.
As shown in Fig. 5, the method 500 may include: predicting one or more capacities of the next time period using the built analytics model, at block 502. This block may be referred to as a predicting phase. In the predicting phase, the future VNF capacity prediction of next time period may be performed by using the learnt patterns stored in the model automatically.
As shown in Fig. 5, the method 500 may include: concluding resource needs of the next time period by comparing the predicted capacities and allocated resources of the next time period, at block 503. This block may be referred to as a synthesizing phase.
For example, by comparing the predicted future capacity needs and the allocated resource at a certain time, VCPE can conclude the VNF resource needs and/or scaling plan at a certain time, and then provides this data to SDM, which can make scaling plan in advance.
In an embodiment, the time period (T1) over which the subsequent VNF future capacity prediction is made may be configurable at the VCPE. It could be hour(s), day(s), week(s), month(s), and so on. However, it is not limited thereto and may depend on different VNFs and deployments.
In an embodiment, VCPE retrieves data from EMS/NMS, where the data is related to runtime VNF capacity, including KPIs (there should be no scaling related alarms if VCPE works well) , system configurations and runtime resource data. All data at a certain time actually represents one certain traffic model of system/network and corresponding needed system resource consumption. VCPE will then filter and shape the data as training data for machine learning, for example, removal of unused data, data consolidation per time granularity, data representation (e.g. numeralization) , data standardization and normalization, etc.
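The filtering and shaping step above might look roughly as follows. The bucket-averaging consolidation and min-max normalization are illustrative assumptions about the preprocessing, not specifics mandated by the disclosure.

```python
# Sketch of the data shaping step: consolidate raw (timestamp, value) samples
# per time granularity T0 and normalize the values for training. The averaging
# and min-max scaling choices are assumptions for illustration.

def consolidate(samples, t0):
    """Average (timestamp, value) samples into buckets of width t0 seconds."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts // t0, []).append(value)
    return [sum(vals) / len(vals) for _, vals in sorted(buckets.items())]

def normalize(values):
    """Min-max normalize to [0, 1]; a constant series maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0)]  # KPI samples every 30 s
series = normalize(consolidate(raw, t0=60))            # T0 = 60 s granularity
print(series)  # → [0.0, 1.0]
```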
In an embodiment, the time granularity (T0) could be seconds, minutes, hours, days and so on, and is configurable at the VCPE. This also corresponds to the time granularity of the future VNF capacity prediction during the next time period. Usually T0 need not be too small, both for practical use at scaling and in order not to impact system characteristics and performance, but T0 should be smaller than T1.
In an embodiment, the method may further include: using actual capacity data of a current time period as the latest training data of the machine learning when the current time period is over; discarding the data of the most aged period; and revising a prediction model of the machine learning by using the latest training data.
For example, for the training phase, VCPE may use the data of previous N time periods, where N is configurable at VCPE. When the current time period is over, the actual VNF capacity data of current time period will be used as latest new training data, and also used to revise/update the prediction model, e.g. adjust model accordingly to fit better. By learning from the real consequences of model predictions, the system will be able to make better prediction in the future. The data of most aged time period will be phased out and replaced by the latest data. VCPE may predict VNF capacity of next time period based on each training dataset (N in total) from the previous N time periods.
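The sliding training window described above can be sketched as follows; the `retrain` hook and the class shape are illustrative assumptions, while the keep-N/phase-out-oldest behavior follows the text.

```python
from collections import deque

# Sketch of the sliding training window: keep the N most recent periods; when
# a period ends, its actual capacity data becomes the newest training data and
# the most aged period is phased out, then the model is revised. N is
# configurable at the VCPE.

class TrainingWindow:
    def __init__(self, n_periods):
        self.periods = deque(maxlen=n_periods)  # oldest period drops automatically

    def period_over(self, actual_capacity_data, retrain):
        """Add the finished period's actual data and revise the prediction model."""
        self.periods.append(actual_capacity_data)
        retrain(list(self.periods))  # revise/update the model with latest data

window = TrainingWindow(n_periods=3)
for day in range(5):
    window.period_over(f"day-{day}", retrain=lambda data: None)
print(list(window.periods))  # → ['day-2', 'day-3', 'day-4']
```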
In an embodiment, the method may further include: synthesizing multiple predicted capacities with different weights, and/or combining multiple input data and/or output data with the different weights. The different weights may be calculated based on the following rules: the more aged the data is, the smaller value the weight has; and/or the more similar the time period is, the bigger value the weight has.
For example, for the predicting phase, the final prediction will be the synthesis of all N predictions with different weights. The weights are calculated based on the following rules: the more aged the data, the smaller the weight value, and/or the more similar the time period (traffic model) is, the bigger the weight value it has; e.g. the traffic models of Friday evenings can be considered similar, and the traffic model of a workday can be seen as different from that of a weekend.
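One way such a weighted synthesis of the N per-period predictions could look is sketched below. The disclosure does not give a weighting formula; the geometric age decay and the multiplicative similarity factor are assumptions for illustration only.

```python
# Sketch of weighted synthesis of N per-period predictions: older periods get
# geometrically smaller weights (age rule), and each period's weight is scaled
# by how similar its traffic model is to the predicted period (similarity
# rule). Both formulas are illustrative assumptions.

def synthesize(predictions, similarities, decay=0.8):
    """Weighted average of predictions, ordered oldest → newest.

    similarities[i] in (0, 1]: how similar period i's traffic model is to the
    period being predicted (e.g. Friday evening vs. an ordinary workday).
    """
    weights = [similarities[i] * decay ** (len(predictions) - 1 - i)
               for i in range(len(predictions))]
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

# Three past periods, equally similar: the newest prediction dominates.
print(synthesize([100.0, 120.0, 200.0], similarities=[1.0, 1.0, 1.0]))
```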
For another example, for the model training on time-series data, in order to better utilize the historical data, several historical input data, e.g. n input data instead of 1 input  data, can be combined to generate the model input. So, each training sample can contain more historic knowledge, e.g. one training sample data as follows,
i(n), i(n-1), …, i(1) → o(1)
The historical data will have different weights (principles as described above) to contribute combining the input sample.
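Building one training sample from n historical inputs, as in i(n), …, i(1) → o(1), might be sketched like this. Concatenating age-weighted copies of the input vectors, and the 0.9 decay, are illustrative assumptions about how the weights "contribute" to the combined sample.

```python
# Sketch of combining n historical input vectors into one model input: the
# last n inputs are concatenated into a single feature vector, each scaled by
# an age-based weight (more aged → smaller). The decay value is an assumption.

def make_sample(history, target, n, decay=0.9):
    """history: chronological list of input vectors; use the last n of them."""
    window = history[-n:]
    features = []
    for age, vec in enumerate(reversed(window)):   # age 0 = most recent input
        weight = decay ** age                      # more aged → smaller weight
        features.extend(weight * x for x in vec)
    return features, target

x, y = make_sample([[1.0], [2.0], [3.0]], target=4.0, n=2)
print(x, y)  # → [3.0, 1.8] 4.0
```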
Similarly, another enhancement is that the model could predict several outputs over several future time periods, so the current system behavior and model prediction will have an impact on a longer time period instead of only one future time period. The weight setting principles described above shall be utilized to make use of all outputs to synthesize the final output for one specific future time period.
It should be appreciated that the above embodiments are only examples of the disclosure, but it is not limited thereto. Therefore, a predictive based approach is provided by using data insights through machine learning techniques, instead of the current reactive based approach. VNF scaling needs may be predicted by machine learning in advance, and thus VNF scaling decisions and actions can be made at just the right time with just the right resources.
Next, SBC (Session Border Controller) and MGw (Media Gateway) nodes in IMS networks are used as an example to depict the new method and the results, where the H.248 protocol is used for communication between them. Here the SBC may be a P-CSCF or IMS-ALG on the UNI/PNI interface, or an IBCF on the NNI interface. The MGw may be a C-BGF (IMS Access Gateway) on the UNI/PNI interface and an I-BGF (TrGW) on the NNI interface.
The input data (traffic model data) and output data (prediction data) are related to VNF capacity. The input data could vary in different systems, but output data is similar, related to VNF capacity (resource) data. Take SBC and MGw as examples.
For SBC, the input data may include: no. of registered users, session initiation protocol (SIP) call rate, transport protocols (UDP, TCP, TLS) , IP version (IPv4 or IPv6) , signaling interface (Gm, Rq, Rf, e2) , call hold time, call answer time, hypervisor type, CPU frequency, hyper-threading switch, etc. The output data may include: memory usage (MB) , CPU usage (%) , Bandwidth usage.
For MGw, the input data may include: subscribers, traffic per subscriber, MHT (mean hold time), total sessions, traffic type (Access/Core/MSS/MRF/TDM), audio codec (AMR-WB/AMR-NB/G.711/G.722/G.729), audio transcoding, hypervisor type, vSwitch variant, etc. The output data may include: no. of vCPUs, no. of VMs, memory of VM, memory of VNF, disk size, bandwidth/packet rates, etc.
In this example, in order to verify feasibility and validity, one typical machine learning model: artificial neural networks (ANN) , is used to evaluate the approach. ANN is a family of models inspired by biological neural networks, and has been widely used in many areas, including data regression and prediction.
The example model (it may be referred to as AlphaCANDI in this disclosure) is a fused model by using information fusion technology to combine three feedforward neural networks: BP (Backpropagation) neural networks, RBF (Radial Basis Functions) neural networks and GRNN (Generalized Regression Neural Networks) .
Furthermore, standard statistical error measures, MAE (Mean Absolute Error) and RMSE (Root Mean Square Error), are used to evaluate the model performance. Cross-validation is performed to evaluate the overall prediction accuracy and robustness of the model on different training datasets and testing datasets.
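The two error measures can be stated concretely as follows, using their common definitions; the example values are arbitrary.

```python
import math

# MAE and RMSE as commonly defined. RMSE penalizes large deviations more
# strongly than MAE, which matters when occasional large prediction errors
# would cause under- or over-provisioning.

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual, predicted = [100.0, 200.0, 300.0], [110.0, 220.0, 310.0]
print(mae(actual, predicted))   # errors 10, 20, 10 → mean 13.33…
print(rmse(actual, predicted))  # √((100+400+100)/3) ≈ 14.14, larger than MAE
```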
In an embodiment, BP network is a typical feedforward neural networks model, which uses BP (backpropagation) learning algorithm during training. Backpropagation is a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent.
In an embodiment, an RBF network is an artificial neural network that uses radial basis functions (RBF) as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Unlike BP, an RBF network performs local approximation of a non-linear mapping instead of global approximation; thus it only needs a few training samples. It features fast learning (convergence) and can achieve better accuracy.
In an embodiment, Generalized Regression Neural Network (GRNN) is a variant  of RBF network. It has a radial basis layer and a special linear layer. Unlike the standard RBF, GRNN is distinct in the calculation of the final outputs, i.e. a weighted sum of the radial basis functions and the training values by using normalization and dot product operation, and can be thought of as a normalized RBF network.
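The GRNN description above can be sketched minimally: the prediction is a normalized, kernel-weighted sum of the training targets, i.e. one Gaussian radial basis unit per training sample followed by the normalized dot product. The 1-D inputs and the bandwidth `sigma` are illustrative assumptions, not from the disclosure.

```python
import math

# Minimal GRNN sketch matching the description: a radial basis layer (one
# Gaussian activation per training sample) and a special linear layer (dot
# product with the training targets, normalized by the kernel sum) — i.e. a
# normalized RBF network. sigma is the smoothing bandwidth (assumed here).

def grnn_predict(train_x, train_y, query, sigma=1.0):
    # Radial basis layer: Gaussian activation of each stored training sample.
    kernels = [math.exp(-((query - x) ** 2) / (2 * sigma ** 2)) for x in train_x]
    # Normalized linear layer: weighted sum of targets divided by kernel sum.
    return sum(k * y for k, y in zip(kernels, train_y)) / sum(kernels)

# A query on a training point with small sigma essentially recovers its target.
print(round(grnn_predict([0.0, 10.0], [1.0, 5.0], query=0.0, sigma=0.5), 6))  # → 1.0
```

Since the output is a convex combination of training targets, a GRNN interpolates smoothly between observed capacity values rather than extrapolating wildly.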
It should be appreciated that the fused model is only an example of the disclosure. However, it is not limited thereto, for example, other models may also be adopted according to actual scenarios.
For example, the relevant work has been done on both a native product (nSBC) and virtualized products (vSBC and vMGw) to verify the approach with datasets of different sizes. The datasets are divided into training data for learning and testing data for prediction, split randomly by 80%:20% or 85%:15% during each run. Many runs in total are performed to check the overall performance of AlphaCANDI.
Fig. 6 is a schematic diagram which shows an example of VNF resource needs prediction results showing deviations of predicted needs and real needs in accordance with an embodiment of the present disclosure. Fig. 7 is a schematic diagram which shows an example of prediction accuracy and data size in accordance with an embodiment of the present disclosure.
As shown in Fig. 6 and Fig. 7, the prediction accuracy of the model AlphaCANDI is around 90% with a small-sized dataset and above 95% with a medium-sized dataset, respectively. This is good in the VNF scaling scenario considering there is still a resource capacity buffer reserved in practice.
Furthermore, as compared to native environments, cloud/virtualized environments are more complex, but this does not bring any additional complexity or difficulty to the machine learning based model, while it does to today’s approaches.
Furthermore, the more data, the better the performance. In practice, the accuracy (performance) and the data size (cost) need to be balanced. In addition, the training time may be at the second/minute level, while the prediction time may be at the millisecond level. This can fit in VNF scaling prediction.
As can be seen from the above embodiments, a predictive based approach is provided and the time lag issue may be decreased or avoided. Customers/operators would be able to better manage VNF capacity planning and deployments; thus VNF resource usage happens at just the right time with no waste of system resources/energy.
Furthermore, the machine learning based model may simplify the complexity of cloud environments and traffic scenarios (complexity and scenario agnostic), and provide good prediction accuracy and robustness of VNF resource needs in the future time period. This avoids incorrect VNF scaling decisions and operations, i.e. no scaling when actually needed or doing scaling when actually not needed. Therefore, it may reduce user complaints, increase resource efficiency, and improve overall system characteristics and network performance.
Furthermore, it’s a generic mechanism which can be applied in any similar VNF capacity and scaling management scenarios. And it can also be applied/extended to the whole network level scaling, including radio/access network, core network, service network, etc., e.g. based on historical time period data of radio network characteristics, the machine learning based approach could provide predictive based core network scaling in the future time period through learning/training. Finally, an automatic close-loop solution of VNF capacity and scaling management without human interventions could be provided to customers.
Second aspect of embodiments
An apparatus for network function capacity and scaling management is provided in an embodiment. The apparatus may be configured in a network device, and the same contents as those in the first aspect of embodiments are omitted.
Fig. 8 shows a block diagram of an apparatus 800 for network function capacity and scaling management in accordance with an embodiment of the present disclosure. As shown in Fig. 8, the apparatus 800 includes: a retrieving unit 801 configured to retrieve characteristics and configuration data from a scaling trigger handler; an obtaining unit 802  configured to obtain capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning; and a sending unit 803 configured to send the capacity prediction information of the next time period to a scaling decision maker.
In an embodiment, the apparatus 800 may be implemented in a virtualized network function capacity prediction engine which enables a predictive based scaling instead of a reactive based scaling.
In an embodiment, the scaling trigger handler may be configured or implemented in an element management system or a network management system; the scaling decision maker may be configured or implemented in a virtualized network function manager; the virtualized network function capacity prediction engine may be configured or implemented in the virtualized network function manager or the element management system or the network management system.
In an embodiment, the characteristics and configuration data may include one or more of the following information: alarms, key performance indicators, system configurations, system running data, system resources; the time period is configurable and may include one or more of the following time intervals: minutes, hours, days, weeks, months and years. However, it is not limited thereto.
In an embodiment, the virtualized network function capacity prediction engine may regularly retrieve the characteristics and configuration data from the scaling trigger handler; and the scaling trigger handler may regularly retrieve the characteristics and configuration data from the virtualized network functions.
The virtualized network function capacity prediction engine may receive the characteristics and configuration data reported by the scaling trigger handler; and the scaling trigger handler may receive the characteristics and configuration data reported by virtualized network functions.
In an embodiment, the obtaining unit 802 may be configured to train a machine learning model by using the characteristics and configuration data of one or more past time  periods to discover inherent patterns from the characteristics and configuration data and to build an analytics model; predict one or more capacities of the next time period using the built analytics model; and conclude resource needs of the next time period by comparing the predicted capacities and allocated resources of the next time period.
In an embodiment, techniques applying the machine learning may include applying supervised learning models or methods; one or more of the following models or methods are used for the machine learning: regression analysis, support vector machines, decision trees, random forest, artificial neural networks. However, it is not limited thereto.
In an embodiment, a fused model may be used by combining backpropagation neural networks, radial basis functions and generalized regression neural networks. However, it is not limited thereto.
In an embodiment, multiple predicted capacities may be synthesized with different weights, and/or multiple input data and/or output data may be combined with the different weights. The different weights may be calculated based on the following rules: the more aged the data is, the smaller value the weight has; and/or, the more similar the time period is, the bigger value the weight has. However, it is not limited thereto.
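The weighting rules above may be sketched as follows. The exponential-decay forms chosen here are assumptions for illustration; the embodiment only requires that older data receive smaller weights and more similar periods receive bigger weights.

```python
import math

def age_weight(age, decay=0.5):
    """The more aged the data is, the smaller the weight (assumed decay form)."""
    return math.exp(-decay * age)

def similarity_weight(period, target, sigma=1.0):
    """The more similar the time period is to the target period
    (e.g. same hour of day), the bigger the weight (assumed form)."""
    return math.exp(-((period - target) ** 2) / (2 * sigma ** 2))

def synthesize(predictions, weights):
    """Synthesize multiple predicted capacities as a weighted average."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total
```

For instance, two model outputs of 10 and 20 units with equal weights synthesize to 15 units, while weighting the fresher prediction more heavily would pull the result toward it.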
In an embodiment, actual capacity data of a current time period may be used as the latest training data of the machine learning when the current time period is over. Further, the data of the most aged period may be discarded, and a prediction model of the machine learning may be revised by using the latest training data.
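This sliding-window update of the training data may be sketched as follows; the class name and interface are illustrative assumptions, and a bounded deque does the appending of the latest actuals and discarding of the most aged period in one step.

```python
from collections import deque

class TrainingWindow:
    """Fixed-size training window: newest actual capacity data in,
    most aged period out, yielding the revised training set."""

    def __init__(self, max_periods):
        # deque with maxlen discards the oldest entry automatically
        self.data = deque(maxlen=max_periods)

    def period_over(self, actual_capacity):
        """Called when the current time period is over; returns the
        training data with which the prediction model may be revised."""
        self.data.append(actual_capacity)
        return list(self.data)
```

With a three-period window, feeding the actuals 1, 2, 3, 4 in sequence leaves the training set as [2, 3, 4]: the latest period is retained and the most aged period is discarded.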
It should be appreciated that components included in the apparatus 800 correspond to the operations of the method 300 or 400. Therefore, all operations and features described above with reference to Fig. 3 or 4 are likewise applicable to the components included in the apparatus 800 and have similar effects. For the sake of brevity, the details are omitted.
It should be appreciated that the components included in the apparatus 800 may be implemented in various manners, including software, hardware, firmware, or any combination thereof.
In an embodiment, one or more units may be implemented using software and/or firmware, for example, machine-executable instructions stored on the storage medium. In addition to or instead of machine-executable instructions, parts or all of the components included in the apparatus 800 may be implemented, at least in part, by one or more hardware logic components.
For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The apparatus 800 may be a part of a device, but it is not limited thereto; for example, the apparatus 800 may be the network device itself. Other parts of the network device, such as the transmitter and receiver, are omitted in Fig. 8.
As can be seen from the above embodiments, a predictive based approach is provided by using data insights through machine learning techniques, instead of the current reactive based approach. Scaling needs may be predicted in advance by machine learning, and thus scaling decisions and actions can be made at just the right time with just the right resources.
Third aspect of embodiments
A system is provided, the system includes a network device configured to perform a method for network function capacity and scaling management according to the first aspect of embodiments.
For example, as shown in Fig. 4, the system may include a scaling trigger handler, a scaling decision maker, and a capacity prediction engine configured to perform a method according to the first aspect of embodiments. The system may further include an element management system or a network management system, a virtualized network function manager, a virtualized infrastructure manager and a network function virtualization  infrastructure.
The scaling trigger handler is configured or implemented in an element management system or a network management system; the scaling decision maker is configured or implemented in a virtualized network function manager; and the capacity prediction engine is used for virtualized network function capacity and scaling management and is configured or implemented in the virtualized network function manager or the element management system or the network management system.
A device is provided in an embodiment, and the same contents as those in the first aspect and the second aspect of embodiments are omitted.
Fig. 9 shows a simplified block diagram of a device 900 that is suitable for implementing embodiments of the present disclosure. It would be appreciated that the device 900 may be implemented as at least a part of, for example, the network device.
As shown, the device 900 includes a communicating means 930 and a processing means 950. The processing means 950 includes a data processor (DP) 910 and a memory (MEM) 920 coupled to the DP 910. The communicating means 930 is coupled to the DP 910 in the processing means 950. The MEM 920 stores a program (PROG) 940. The communicating means 930 is for communications with other devices, and may be implemented as a transceiver for transmitting/receiving signals.
In some other embodiments, the device 900 acts as a network device. For example, the memory 920 stores a plurality of instructions, and the processor 910 is coupled to the memory 920 and configured to execute the instructions to: retrieve characteristics and configuration data from a scaling trigger handler; obtain capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning; and send the capacity prediction information of the next time period to a scaling decision maker.
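The retrieve/obtain/send sequence executed by the processor may be sketched as the following control flow. The handler and decision-maker interfaces shown here are assumptions made for this sketch only; the disclosure does not prescribe these method names.

```python
class CapacityPredictionEngine:
    """Illustrative flow: retrieve data from the scaling trigger handler,
    obtain a capacity prediction via a model, and send the prediction
    to the scaling decision maker."""

    def __init__(self, handler, decision_maker, model):
        self.handler = handler              # source of characteristics/config data
        self.decision_maker = decision_maker
        self.model = model                  # callable: data -> predicted capacity

    def run_period(self):
        # Step 1: retrieve characteristics and configuration data
        data = self.handler.get_characteristics_and_config()
        # Step 2: obtain capacity prediction information for the next period
        prediction = self.model(data)
        # Step 3: send the prediction to the scaling decision maker
        self.decision_maker.receive_prediction(prediction)
        return prediction
```

In use, any trained model exposed as a callable can be plugged in as `model`, keeping the engine agnostic to the particular supervised learning technique chosen.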
The PROG 940 is assumed to include program instructions that, when executed by the associated DP 910, enable the device 900 to operate in accordance with the embodiments of the present disclosure, as discussed herein in connection with the methods 300 or 400. The embodiments herein may be implemented by computer software executable by the DP 910 of the device 900, or by hardware, or by a combination of software and hardware. A combination of the data processor 910 and MEM 920 may form processing means 950 adapted to implement various embodiments of the present disclosure.
The MEM 920 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory, as non-limiting examples. While only one MEM is shown in the device 900, there may be several physically distinct memory modules in the device 900. The DP 910 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multicore processor architecture, as non-limiting examples. The device 900 may have multiple processors, such as an application specific integrated circuit chip that is slaved in time to a clock which synchronizes the main processor.
Generally, various embodiments of the present disclosure may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing devices. While various aspects of embodiments of the present disclosure are illustrated and described as block diagrams, flowcharts, or using some other pictorial representation, it will be appreciated that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
By way of example, embodiments of the present disclosure can be described in the general context of machine-executable instructions, such as those included in program modules, being executed in a device on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures,  or the like that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device, program modules may be located in both local and remote storage media.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
The above program code may be embodied on a machine-readable medium, which may be any tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
More specific examples of the machine-readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the context of this disclosure, the device may be implemented in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines,  programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. The device may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the present disclosure, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the present disclosure has been described in language specific to structural features and/or methodological acts, it is to be understood that the present disclosure defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (18)

  1. A method for network function capacity and scaling management, comprising:
    retrieving characteristics and configuration data from a scaling trigger handler;
    obtaining capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning;
    sending the capacity prediction information of the next time period to a scaling decision maker.
  2. The method according to claim 1, wherein the method is implemented in a virtualized network function capacity prediction engine which enables a predictive based scaling instead of a reactive based scaling.
  3. The method according to claim 2, wherein the scaling trigger handler is configured or implemented in an element management system or a network management system, the scaling decision maker is configured or implemented in a virtualized network function manager;
    the virtualized network function capacity prediction engine is configured or implemented in the virtualized network function manager or the element management system or the network management system.
  4. The method according to claim 1, wherein the characteristics and configuration data comprises one or more of the following information: alarms, key performance indicators, system configurations, system running data, system resources;
    the time period is configurable and comprises one or more of the following time intervals: minutes, hours, days, weeks, months and years.
  5. The method according to claim 2, wherein the virtualized network function capacity prediction engine regularly retrieves the characteristics and configuration data from the scaling trigger handler; and
    the scaling trigger handler regularly retrieves the characteristics and configuration data from the virtualized network functions.
  6. The method according to claim 5, wherein the virtualized network function capacity prediction engine receives the characteristics and configuration data reported by the scaling trigger handler; and
    the scaling trigger handler receives the characteristics and configuration data reported by virtualized network functions.
  7. The method according to claim 1, wherein the obtaining capacity prediction information of a next time period by using the characteristics and configuration data of one or more past time periods based on machine learning, comprises:
    training a machine learning model by using the characteristics and configuration data of one or more past time periods to discover inherent patterns from the characteristics and configuration data and to build an analytics model;
    predicting one or more capacities of the next time period using the built analytics model; and
    concluding resource needs of the next time period by comparing the predicted capacities and allocated resources of the next time period.
  8. The method according to claim 1, wherein techniques applying the machine learning comprise applying supervised learning models or methods;
    one or more of the following models or methods are used for the machine learning: regression analysis, support vector machine, decision trees, random forest, artificial neural networks.
  9. The method according to claim 8, wherein a fused model is used by combining backpropagation neural networks, radial basis functions and generalized regression neural networks.
  10. The method according to claim 1, wherein the method further comprises:
    synthesizing multiple predicted capacities with different weights; and/or
    combining multiple input data and/or output data with the different weights.
  11. The method according to claim 10, wherein the different weights are calculated based on the following rules: the more aged the data is, the smaller value the weight has; and/or; the more similar the time period is, the bigger value the weight has.
  12. The method according to claim 1, wherein the method further comprises:
    using actual capacity data of a current time period as the latest training data of the machine learning when the current time period is over.
  13. The method according to claim 12, wherein the method further comprises:
    discarding the data of the most aged period; and
    revising a prediction model of the machine learning by using the latest training data.
  14. A system for network function capacity and scaling management, comprising a scaling trigger handler and a scaling decision maker, wherein the system further comprises:
    a capacity prediction engine configured to perform a method according to any of claims 1-13.
  15. The system according to claim 14, wherein the system further comprises an element management system or a network management system, a virtualized network function manager, a virtualized infrastructure manager and a network function virtualization infrastructure.
  16. The system according to claim 14, wherein the scaling trigger handler is configured or implemented in an element management system or a network management system, the scaling decision maker is configured or implemented in a virtualized network function manager;
    the capacity prediction engine is used for virtualized network function capacity and scaling management and is configured or implemented in the virtualized network function manager or the element management system or the network management system.
  17. A device for network function capacity and scaling management, comprising a processor and a memory, wherein the memory contains instructions executable by the processor whereby the device is operative to perform a method according to any of claims 1-13.
  18. A computer program product being tangibly stored on a computer readable storage medium and including instructions which, when executed on a processor of a device for network function capacity and scaling management, cause the device to perform a method according to any of claims 1-13.
PCT/CN2017/091603 2017-07-04 2017-07-04 Method and device for network function capacity and scaling management WO2019006649A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/091603 WO2019006649A1 (en) 2017-07-04 2017-07-04 Method and device for network function capacity and scaling management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/091603 WO2019006649A1 (en) 2017-07-04 2017-07-04 Method and device for network function capacity and scaling management

Publications (1)

Publication Number Publication Date
WO2019006649A1 true WO2019006649A1 (en) 2019-01-10

Family

ID=64949538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091603 WO2019006649A1 (en) 2017-07-04 2017-07-04 Method and device for network function capacity and scaling management

Country Status (1)

Country Link
WO (1) WO2019006649A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2577055A (en) * 2018-09-11 2020-03-18 Samsung Electronics Co Ltd Improvements in and relating to telecommunication networks
EP4035451A4 (en) * 2019-09-25 2022-09-21 Telefonaktiebolaget LM Ericsson (publ) First node, second node, and methods performed thereby, for handling scaling of a network slice in a communications network
CN115473821A (en) * 2021-06-11 2022-12-13 ***通信集团广东有限公司 Network capacity prediction method, device, electronic equipment and storage medium
US11558263B2 (en) 2020-07-10 2023-01-17 Hewlett Packard Enterprise Development Lp Network device association with network management system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105262664A (en) * 2014-06-09 2016-01-20 诺基亚通信公司 Controlling of virtualized network functions for usage in communication network
EP3021521A1 (en) * 2014-11-11 2016-05-18 Alcatel Lucent A method and system for scaling, telecommunications network and computer program product
US20170019302A1 (en) * 2015-07-13 2017-01-19 Telefonaktiebolaget L M Ericsson (Publ) Analytics-driven dynamic network design and configuration
US20170048308A1 (en) * 2015-08-13 2017-02-16 Saad Bin Qaisar System and Apparatus for Network Conscious Edge to Cloud Sensing, Analytics, Actuation and Virtualization
US20170126792A1 (en) * 2015-11-02 2017-05-04 Telefonaktiebolaget L M Ericsson (Publ) System and methods for intelligent service function placement and autoscale based on machine learning


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2577055A (en) * 2018-09-11 2020-03-18 Samsung Electronics Co Ltd Improvements in and relating to telecommunication networks
GB2577055B (en) * 2018-09-11 2021-09-01 Samsung Electronics Co Ltd Improvements in and relating to telecommunication networks
EP4035451A4 (en) * 2019-09-25 2022-09-21 Telefonaktiebolaget LM Ericsson (publ) First node, second node, and methods performed thereby, for handling scaling of a network slice in a communications network
US11558263B2 (en) 2020-07-10 2023-01-17 Hewlett Packard Enterprise Development Lp Network device association with network management system
CN115473821A (en) * 2021-06-11 2022-12-13 ***通信集团广东有限公司 Network capacity prediction method, device, electronic equipment and storage medium
CN115473821B (en) * 2021-06-11 2023-09-08 ***通信集团广东有限公司 Network capacity prediction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20210326128A1 (en) Edge Computing Platform
US10956849B2 (en) Microservice auto-scaling for achieving service level agreements
CA3003617C (en) Model building architecture and smart routing of work items
US10671368B2 (en) Automatic creation of delivery pipelines
WO2019006649A1 (en) Method and device for network function capacity and scaling management
US10891560B2 (en) Supervised learning system training using chatbot interaction
US20200019841A1 (en) Neural network model for predicting usage in a hyper-converged infrastructure
US10756976B2 (en) Data network and execution environment replication for network automation and network applications
US11455573B2 (en) Data protection distributed learning
WO2021098281A1 (en) Project baseline data generation method and device, computer device, and computer readable storage medium
US11381463B2 (en) System and method for a generic key performance indicator platform
US11722371B2 (en) Utilizing unstructured data in self-organized networks
US10587490B2 (en) Evaluating resource performance from misaligned cloud data
US10592385B1 (en) Performance engineering
US11297564B2 (en) System and method for assigning dynamic operation of devices in a communication network
US11310125B2 (en) AI-enabled adaptive TCA thresholding for SLA assurance
Volkov et al. SDN load prediction algorithm based on artificial intelligence
JP2017530482A (en) Configuration method, apparatus, system and computer readable medium for determining a new configuration of computing resources
US11237881B2 (en) Message connector as a service to migrate streaming applications into cloud nativity
US11829799B2 (en) Distributed resource-aware training of machine learning pipelines
US20210064981A1 (en) Controlling performance of deployed deep learning models on resource constrained edge device via predictive models
US10325217B2 (en) Generating state predictive metrics based on Markov chain model from application operational state sequences
Lanciano et al. Predictive auto-scaling with OpenStack Monasca
US10057327B2 (en) Controlled transfer of data over an elastic network
US20170339031A1 (en) Dispatcher for adaptive data collection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17917104

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17917104

Country of ref document: EP

Kind code of ref document: A1