WO2014110293A1

WO2014110293A1 - An improved streaming method and system for processing network metadata

Info

Publication number: WO2014110293A1
Application number: PCT/US2014/010932
Authority: WO
Inventors: Igor Balabine; Alexander VELEDNITSKY
Original assignee: Netflow Logic Corporation
Priority date: 2013-01-10
Filing date: 2014-01-09
Publication date: 2014-07-17
Also published as: RU2015132628A; JP2016508353A; CN105051696A; CA2897664A1; KR20150105436A

Abstract

An improved method and system for processing network metadata is described. Network metadata may be processed by dynamically instantiated executable software modules which make policy-based decisions about the character of the network metadata and about presentation of the network metadata to consumers of the information carried by the network metadata. The network metadata may be type classified and each subclass within a type may be mapped to a definition by a unique fingerprint value. The fingerprint value may be used for matching the network metadata subclasses against relevant policies and transformation rules. For template-based network metadata such as NetFlow v9, an embodiment of the invention can constantly monitor network traffic for unknown templates, capture template definitions, and inform administrators about templates for which custom policies and conversion rules do not exist. Conversion modules can efficiently convert selected types and/or subclasses of network metadata into alternative metadata formats.

Description

An Improved Streaming Method and System for Processing Network Metadata

FIELD OF THE INVENTION

[0001] In general the present invention relates to network monitoring and event management. More specifically it relates to processing of network metadata obtained through network monitoring activities and a subsequent processing of the metadata, which may efficiently result in useful information being reported in a timely manner to a consumer of the metadata.

BACKGROUND

[0002] Network monitoring is a critical information technology (IT) function often used by Enterprises and Service Providers, which involves watching the activities occurring on an internal network for problems related to performance, misbehaving hosts, suspicious user activity, etc. Network monitoring is made possible due to the information provided by various network devices. The information has been generally referred to as network metadata, i.e., a class of information describing activity on the network which is supplemental and complementary to the rest of information transmitted over the network.

[0003] Syslog is one type of network metadata commonly used for network monitoring. Syslog is a standard for logging program messages and provides devices which would otherwise be unable to communicate a means to notify administrators of problems or performance. Syslog is often used for computer system management and security auditing as well as generalized informational, analysis, and debugging messages. It is supported by a wide variety of devices (like printers and routers) and receivers across multiple platforms. Because of this, syslog can be used to integrate log data from many different types of systems into a central repository.

[0004] More recently, another type of network metadata, referred to by various vendors as NetFlow, jFlow, sFlow, etc., has also been introduced as a part of standard network traffic (hereafter generally referred to as "NetFlow".) NetFlow is a network protocol for collecting IP traffic information that has become an industry standard for traffic monitoring. NetFlow can be generated by a variety of network devices such as routers, switches, firewalls, intrusion detection systems (IDS), intrusion protection systems (IPS), network address translation (NAT) entities and many others. However, until recently, NetFlow network metadata was used exclusively for post factum network supervision purposes such as network topology discovery, locating network throughput bottlenecks, Service Level Agreement (SLA) validation, etc. Such limited use of NetFlow metadata can generally be attributed to the high volume and high delivery rate of information produced by the network devices, the diversity of the information sources and an overall complexity of integrating additional information streams into existing event analyzers. More particularly, NetFlow metadata producers have typically generated more information than consumers could analyze and use in a real time setting. For example, a single medium to large switch or router on a network might generate 400,000 NetFlow records per second.

[0005] Today's syslog collectors, syslog analyzers, security information management (SIM) systems, security event management (SEM) systems, security information and event management (SIEM) systems, etc. (collectively hereafter referred to as an "SIEM system") are either incapable of receiving and analyzing NetFlow, are limited to processing rudimentary information contained in NetFlow packets, or process NetFlow packets at rates much lower than such packets are typically generated.

[0006] The advent of robust network monitoring protocols such as NetFlow v9 (RFC 3954) and IPFIX (RFC 5101 and related IETF RFC) drastically expands the opportunity to use network metadata in the realm of network security and intelligent network management. At the same time, due to the constraints identified above, today's SIEM systems are not generally capable of utilizing network monitoring information beyond simply reporting observed byte and packet counts.

SUMMARY OF THE INVENTION

[0007] Network managers and network security professionals continuously confront and struggle with a problem often referred to in the industry as "Big Data". Some of the issues created by the Big Data problem include an inability to analyze and store massive amounts of machine-generated data that often exists in different formats and structures. The problems commonly experienced can be summarized as follows:

[0008] 1. Too much data to analyze in real time to acquire timely insight into network conditions.

[0009] 2. Data arrives in different formats from different device types on a network, making correlation of data from different device types difficult and slow; and

[0010] 3. Too much data to store (e.g., for later analysis and/or for compliance with data retention requirements).

[0011] The present invention provides a system and method capable of addressing all of the above-identified problems associated with Big Data by providing the ability to analyze large volumes of metadata in real time, convert large volumes of metadata into a common format that allows ready correlation with other data within a single monitoring system, and dramatically reduce the volume of the incoming data through real time data reduction techniques such as packet validation, filtering, aggregation and de-duplication.

[0012] Embodiments of the present invention are able to check the validity of incoming packets of network metadata and discard malformed or improper messages. Embodiments are also able to examine and filter incoming packets of network metadata in real time to identify relevant aspects of their information content and segment or route different streams of incoming network metadata for differing processing within the processing engine of the present invention. Included in such differing processing is the opportunity to reduce output metadata traffic by dropping particular messages or selected streams of messages based upon criteria that can be configured by a network manager and determined during the early examination of incoming messages. This enables a network manager to focus the network analysis, either on an ongoing basis or temporarily in response to a particular network condition. As an example, a network manager can elect to focus attention upon network metadata within the system that is generated only by the edge devices on the network to investigate possible intrusion events.
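By way of illustration only, the validity check described above might be sketched as follows for NetFlow v9 datagrams. The 20-byte packet header and the FlowSet framing follow RFC 3954; the function name `is_valid_v9_packet` and the particular set of checks are assumptions made for this sketch and are not taken from the disclosure.

```python
import struct

# NetFlow v9 packet header per RFC 3954: version, count, sysUpTime,
# UNIX seconds, sequence number, source ID (20 bytes, network byte order).
V9_HEADER = struct.Struct("!HHIIII")


def is_valid_v9_packet(datagram: bytes) -> bool:
    """Return True if the datagram is framed like a NetFlow v9 packet.

    Hypothetical helper: it checks only the version number and that the
    FlowSet lengths exactly tile the datagram.
    """
    if len(datagram) < V9_HEADER.size:
        return False
    version = V9_HEADER.unpack_from(datagram)[0]
    if version != 9:
        return False
    # Each FlowSet starts with a 2-byte ID and a 2-byte length; the length
    # includes the 4-byte FlowSet header itself.
    offset = V9_HEADER.size
    while offset < len(datagram):
        if len(datagram) - offset < 4:
            return False
        _flowset_id, length = struct.unpack_from("!HH", datagram, offset)
        if length < 4 or offset + length > len(datagram):
            return False
        offset += length
    return True
```

A datagram that fails this check would simply be discarded before any policy is applied, so malformed input never reaches the later processing stages.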

[0013] Embodiments of the present invention are further able to aggregate the information content contained in incoming packets of network metadata and replace a large quantity of related packets with one or a much smaller number of other packets that capture the same information but generate a much smaller downstream display, analysis and storage requirement than the original metadata flow.
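As a non-limiting sketch of the aggregation described above, many related flow records can be collapsed into one summary record per source and destination pair. The record layout (dictionaries with `src`, `dst`, `bytes` and `packets` keys) and the choice of aggregation key are assumptions of this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_flows(records: Iterable[dict]) -> List[dict]:
    """Replace related flow records with one summary per (src, dst) pair."""
    # totals[key] holds [byte total, packet total, number of merged records].
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0, 0])
    for rec in records:
        entry = totals[(rec["src"], rec["dst"])]
        entry[0] += rec["bytes"]
        entry[1] += rec["packets"]
        entry[2] += 1
    return [
        {"src": src, "dst": dst, "bytes": b, "packets": p, "flows": n}
        for (src, dst), (b, p, n) in totals.items()
    ]
```

The summary keeps the byte and packet totals while the number of records forwarded downstream drops from one per flow to one per pair.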

[0014] Embodiments of the present invention are further able to de-duplicate the content of the normal metadata flow generated by the network devices. Because incoming traffic is typically routed within the network through a sequence of network devices to its destination device, and because each network device typically generates network metadata for each flow that traverses it, a significant amount of redundant metadata is generated that contributes to the Big Data problem in the industry.

[0015] The present invention relates to a system and method capable of receiving arbitrary structured data, e.g., network or machine-generated metadata, in a variety of data formats (hereafter network metadata), efficiently processing the network metadata and forwarding the received network metadata and/or network metadata derived from the original network metadata in a variety of data formats. Network metadata could be generated by a variety of network devices such as routers, switches, firewalls, intrusion detection systems (IDS), intrusion protection systems (IPS), network address translation (NAT) entities and many others. The network metadata information is generated in a number of formats including but not limited to NetFlow and its variants (e.g., jFlow, cflowd, sFlow, IPFIX), SNMP, SMTP, syslog, etc. The method and system described herein is able to output network metadata information in a number of formats including but not limited to NetFlow and its versions (jFlow, cflowd, sFlow, IPFIX), SNMP, SMTP, syslog, OpenFlow, etc. In addition, embodiments of the invention are able to output selected types of network metadata information at a rate sufficient to allow real-time or near-real-time network services to be provided. As a result, the system is capable of providing meaningful services in deployments with N (N > 1) producers of the network metadata and M (M > 1) consumers of the original or derived network metadata. It may be appreciated that a particular embodiment of this invention aligns with a definition of IPFIX Mediator as reflected in RFC 5982.

[0016] An embodiment of the present invention provides a method and system for identifying the nature, character and/or type ("class") of received network metadata and organizing received information into categories or classes. This may be of particular usefulness when used in association with NetFlow v9 and similar messages that are template-based and can be of widely varied content and purpose. Once categorized or classified, each individual class member instance can be further processed according to zero, one or a plurality of class specific processing rules or according to a default processing rule ("policies"). This aspect of the invention enables fine grain processing of an unlimited variety of network metadata types.
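One possible way to classify template-based metadata, offered purely as a sketch, is to derive a fingerprint from the ordered (field type, field length) pairs of a NetFlow v9 template and use it as a lookup key for policies. The disclosure does not state how a fingerprint is computed; the use of SHA-256, the function names and the policy names below are all assumptions of this example. The field type numbers (8 for IPV4_SRC_ADDR, 12 for IPV4_DST_ADDR, 1 for IN_BYTES) come from RFC 3954.

```python
import hashlib
import struct
from typing import Dict, Iterable, Tuple


def template_fingerprint(fields: Iterable[Tuple[int, int]]) -> str:
    """Hash the ordered (field type, field length) pairs of a template.

    SHA-256 is an assumption of this sketch, not the disclosed method.
    """
    digest = hashlib.sha256()
    for field_type, field_length in fields:
        digest.update(struct.pack("!HH", field_type, field_length))
    return digest.hexdigest()


def policy_for(fields: Iterable[Tuple[int, int]],
               policies: Dict[str, str],
               default: str = "default-policy") -> str:
    """Return the class-specific policy name, or the default policy."""
    return policies.get(template_fingerprint(fields), default)


# Hypothetical template: IPV4_SRC_ADDR(8), IPV4_DST_ADDR(12), IN_BYTES(1).
KNOWN_TEMPLATE = [(8, 4), (12, 4), (1, 4)]
POLICIES = {template_fingerprint(KNOWN_TEMPLATE): "traffic-volume-policy"}
```

Because the fingerprint depends only on the template definition, two exporters that use the same template under different template IDs would map to the same class, and a template with no entry in `POLICIES` falls through to the default rule.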

[0017] By identifying the class of incoming network metadata at an early stage of the operation, the embodiment is able to efficiently organize the processing of network metadata, and in appropriate circumstances, reduce the amount of processing required by filtering, consolidating and/or eliminating portions of the network metadata that are of limited interest to the system administrator, thereby contributing to the real-time or near-real-time operation of the system and potentially reducing storage requirements at a network metadata collector. For example, as a particular body of network traffic traverses multiple devices in a network, network metadata may be generated from each traversed device that contains redundant information. Depending upon the focus or areas of monitoring defined within the SIEM system, it may be desirable to filter, aggregate, consolidate or eliminate metadata records containing redundant information from the metadata flow forwarded to the SIEM system. Policies can be introduced that remove redundancies from certain classes of network metadata that are directed to the SIEM system, while at the same time preserving all such metadata for the flow that is directed to a collector.
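The redundancy removal described above could, under simple assumptions, be sketched as a filter that forwards only the first report of each flow to the SIEM system while the unfiltered stream still goes to the collector. The five-tuple key and the record field names are assumptions of this example.

```python
from typing import Iterable, Iterator, Tuple

FlowKey = Tuple[str, str, int, int, int]


def flow_key(rec: dict) -> FlowKey:
    """Identify a flow by its five-tuple, ignoring which device reported it."""
    return (rec["src"], rec["dst"], rec["sport"], rec["dport"], rec["proto"])


def deduplicate(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the first record seen for each flow.

    A streaming deployment would likely bound `seen` by a time window;
    that is omitted here to keep the sketch short.
    """
    seen = set()
    for rec in records:
        key = flow_key(rec)
        if key in seen:
            continue
        seen.add(key)
        yield rec
```

Feeding `deduplicate(stream)` to the SIEM output and the original `stream` to the collector output mirrors the split described in the paragraph above.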

[0018] It will thus be appreciated that the policies implemented by embodiments of the invention can be defined in a manner that supports and/or is coordinated with policies or areas of focus of a SIEM system and/or metadata collector that is operating within the network.

[0019] Policies can be introduced for the purpose of detecting important or unusual network events that might be indicative of security attacks, reporting traffic spikes on the network, detecting attacks on the network, fostering better usage of network resources, and/or identifying applications running on the network, for network management and security purposes. Policies can be general purpose or time-based, and can be applied to a specific class or a subset of the network metadata passing through the network. An embodiment of the invention contemplates the provision of multiple working threads that operate in cooperation with multiple policy modules to increase system throughput and performance.

[0020] Working threads can be introduced that are specialized or tuned for use with a particular class or subclass of network metadata to further enhance system performance and throughput. Such specialized working threads and policy modules can perform processing operations on different portions of the stream of network metadata in parallel to enhance system performance and throughput. Further, in response to a heavy volume of a particular class or subclass of network metadata, multiple instances of the specialized working thread and/or policy module can be instantiated to operate in parallel to further enhance system performance and throughput.
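For illustration only, the arrangement of specialized working threads might be sketched with one queue per metadata class and several worker threads per queue. The function name, the `class` record key and the handler callables are assumptions of this example, not part of the disclosure.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List


def process_by_class(records: Iterable[dict],
                     handlers: Dict[str, Callable[[dict], object]],
                     workers_per_class: int = 2) -> Dict[str, List[object]]:
    """Route each record to the workers dedicated to its class."""
    queues = {cls: queue.Queue() for cls in handlers}
    results: Dict[str, List[object]] = {cls: [] for cls in handlers}
    lock = threading.Lock()

    def worker(cls: str) -> None:
        while True:
            rec = queues[cls].get()
            if rec is None:  # sentinel: no more work for this worker
                return
            out = handlers[cls](rec)
            with lock:
                results[cls].append(out)

    threads = [threading.Thread(target=worker, args=(cls,))
               for cls in handlers for _ in range(workers_per_class)]
    for t in threads:
        t.start()
    for rec in records:
        if rec["class"] in queues:  # records of unknown class are dropped here
            queues[rec["class"]].put(rec)
    for cls in handlers:
        for _ in range(workers_per_class):
            queues[cls].put(None)
    for t in threads:
        t.join()
    return results
```

Raising `workers_per_class` for a busy class corresponds to instantiating additional instances of a specialized working thread; result order within a class is not guaranteed when more than one worker serves it.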

[0021] For example, an embodiment of this invention provides a unique capability of detecting externally controlled network hosts ("botnet member") residing on an internal network. Consider an infected network host operated by a central controller ("botnet master"). Typically, detection of malicious content on a network host requires installing a dedicated plug-in module on that host. This method does not work against sophisticated malicious agents ("rootkit") which are undetectable by any host-based means. An embodiment of the present invention introduces a policy which is able to identify and notify a security system about an act of communication between a botnet master and a botnet member on the internal network.

[0022] Due to the use of the network metadata information, intelligence provided by the present invention achieves a higher degree of trustworthiness than intelligence provided by similar-in-purpose devices exposed to the network traffic. For example, an in-line Intrusion Detection System (IDS) or Intrusion Protection System (IPS) exposed to malicious traffic could be compromised or subject to a Denial of Service ("DoS") attack while the present invention can be deployed on an internal network inaccessible to such attackers.

[0023] Furthermore, the present invention enables transforming network metadata which makes it suitable for deployments which require network metadata obfuscation.

[0024] According to another embodiment of the present invention, the method and system may be implemented in a streaming fashion, i.e., processing the input network metadata as it arrives ("in real-time or near-real-time") without the need to resort to persistent storage of the network metadata. This embodiment of the invention allows deployment of the system and method on a computer with limited memory and storage capacity, which makes the embodiment especially well suited for deployments in a computing cloud.

[0025] After processing a class member instance according to a policy or a plurality of policies, an embodiment of the present invention may provide an efficient method for converting the results of the policies' application into zero, one or more representations ("converter") suitable for further processing by recipients of the converted network metadata or the original network metadata. As a result, the system and method disclosed herein is exceptionally well suited for deployments in existing environments where its output may be directed towards existing diverse components such as SIEM systems adapted for use with syslog metadata.

[0026] An embodiment of the invention provides a plurality of converters that may be customized for a particular class or classes of network metadata and/or output format, thereby increasing throughput of the system to better enable real-time or near-real-time services on the network. Further, in response to a heavy volume of a particular class or subclass of network metadata, multiple instances of the customized working thread and/or conversion modules can be instantiated to operate in parallel to further enhance system performance and throughput.

[0027] Furthermore, an embodiment of the present invention is able to ensure integrity of the converted network metadata by appending message authentication codes. This embodiment of the invention enables sophisticated network metadata recipients to verify authenticity of the received information.
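A minimal sketch of appending a message authentication code to a converted message is shown below. The choice of HMAC-SHA-256, the ` mac=` suffix and the function names are assumptions of this example; the disclosure does not specify an algorithm or a wire format.

```python
import hashlib
import hmac


def append_mac(message: str, key: bytes) -> str:
    """Append a hex HMAC-SHA-256 tag to the message (assumed format)."""
    tag = hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{message} mac={tag}"


def verify_mac(line: str, key: bytes) -> bool:
    """Recompute the tag over the message part and compare in constant time."""
    message, sep, tag = line.rpartition(" mac=")
    if not sep:
        return False
    expected = hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8"))
```

A recipient holding the shared key can call `verify_mac` on each received line and reject any message whose content was altered in transit.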

[0028] Yet another embodiment of this invention is the ability to deploy the system and method in a fashion transparent to the existing network ecosystem. This embodiment does not require any change in the existing network components' configuration.

[0029] Another embodiment of the present invention provides a method and apparatus for describing network metadata processing and conversion rules either in visual or in textual terms or a combination thereof. Once the policies' description is complete and verified to be non-contradicting, the policies and converters applicable to a class member subject to the rules may be instantiated as one or a plurality of executable modules simultaneously derived from one or a plurality of the network metadata processing and conversion rules definitions. As a result, systemic policy consistency is achieved across a plurality of modules.

Furthermore, the binary nature of the modules implementing the policies and conversion rules makes the system capable of handling the input network metadata at rates significantly exceeding processing rates in environments which interpret comparable processing rules.

BRIEF DESCRIPTION OF THE DRAWINGS

[0030] In order that the present invention may be more clearly ascertained, some embodiments will now be described, by way of example, with reference to the accompanying drawings, in which:

[0031] FIG. 1 provides a simplified schematic diagram of a software-defined network system including a variety of network devices that generate metadata that can be analyzed in accordance with an embodiment of the present invention;

[0032] FIG. 2 provides a simplified schematic diagram of a software-defined network system including a variety of network devices that generate metadata and a system in accordance with an embodiment of the present invention for managing the network while analyzing such metadata;

[0033] FIG. 3 provides a simplified schematic diagram of a cloud-based network system including a variety of network devices that generate metadata that can be analyzed in accordance with an embodiment of the present invention;

[0034] FIG. 4 provides a simplified schematic diagram of a cloud-based network system including a variety of processing modules that cooperate to automate the network while analyzing metadata in accordance with an embodiment of the present invention;

[0035] FIG. 5 provides a somewhat simplified schematic diagram of a software-defined network and cloud-based computing environment, including modules that cooperate to analyze metadata in accordance with an embodiment of the present invention;

[0036] FIG. 6 is a simplified schematic diagram that illustrates an embodiment of the present invention in which short term storage is incorporated in order to provide on-demand NetFlow information;

[0037] FIG. 7 provides another simplified schematic diagram that illustrates an alternative embodiment of the present invention in which short term storage is incorporated in order to provide on-demand NetFlow information; and

[0038] FIG. 8 provides a simplified schematic diagram illustrating an embodiment of the present invention in which botnets may be detected using geo-spatial analysis.

DETAILED DESCRIPTION OF THE INVENTION

[0039] In general the present invention relates to network monitoring and event management. More specifically it relates to processing network metadata obtained as a result of network monitoring activities and subsequent processing of the metadata, which may result in useful information being reported to an event management entity in a timely manner.

[0040] In the following description, the invention is disclosed in the context of network metadata processing for the purposes of illustration only. However, it will be appreciated that the invention is suitable for a broader variety of applications and uses and certain embodiments of the invention are applicable in contexts other than network metadata processing. For example, in an OpenFlow compliant environment, the system may receive NetFlow information from the network and output instructions to an OpenFlow Controller.

[0041] In one embodiment of this invention, the method and system may be implemented using a NetFlow to Syslog Converter ("NF2SL") - a software program which enables integrating NetFlow versions 1 through 8, NetFlow v9, jFlow, sflowd, sFlow, NetStream, IPFIX and similar ("NetFlow") producers with any SIEM system capable of processing syslog. The integration is achieved by converting network metadata generated by the NetFlow producers on the network into a lingua franca of network monitoring systems - syslog. Mapping of the NetFlow information to corresponding syslog information may be performed according to policies, rules and priorities established by the NF2SL Administrator.
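As a hedged illustration of the conversion step, a decoded flow record could be rendered as a syslog line in the traditional BSD layout (priority, timestamp, host, tag, message). The priority value is computed as facility * 8 + severity, as defined by the syslog standard; the `nf2sl` tag, the key=value message layout and the function name are assumptions of this sketch.

```python
def flow_to_syslog(rec: dict, timestamp: str, hostname: str,
                   facility: int = 16, severity: int = 6) -> str:
    """Render a flow record as a BSD-style syslog line.

    Defaults correspond to facility local0 (16) and severity
    informational (6), giving a priority value of 134.
    """
    pri = facility * 8 + severity
    # Sorted keys give a stable field order regardless of input ordering.
    fields = " ".join(f"{key}={rec[key]}" for key in sorted(rec))
    return f"<{pri}>{timestamp} {hostname} nf2sl: {fields}"
```

In a deployment, the facility and severity passed to such a function would be chosen by the policies and priorities that the administrator has established for the class of the record.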

[0042] NFI for Software Defined Networking

[0043] Software Defined Networking (SDN) is a networking architectural concept which separates network control (decision making about packet forwarding) and network topology (physical connectivity of the network devices). A typical implementation of the SDN architecture puts the decision making process on a separate computing device such as a server and leaves packet forwarding to traditional network devices such as switches and routers.

[0044] Referring to Fig. 1, in an exemplary embodiment communications between the control plane and the data forwarding plane are carried out by means of the OpenFlow protocol 100. This protocol enables a central device, called OpenFlow Controller 101, to direct traffic through one or a plurality of OpenFlow compliant network devices 102 in its domain. In general, the OpenFlow Controller 101 may set up communications paths based on specific characteristics such as fewest number of hops, link bandwidth or latency.

[0045] The OpenFlow Controller 101 sets up communications paths using a flow table abstraction in which a flow is represented by a collection of packet fields against which each packet traversing a network device is matched. When a controlled network device 102 encounters a packet for which it does not have forwarding instructions, the network device 102 forwards the packet to the OpenFlow Controller 101 for examination and providing instructions on how to handle similar packets in the future.
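The flow table abstraction can be pictured with the following simplified sketch, in which each entry pairs a set of match fields with an action and a table miss is signalled by returning `None`. This is not the OpenFlow wire protocol or any controller API; the field names and the action strings are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A flow entry: (fields that must match, action to apply).
FlowEntry = Tuple[Dict[str, object], str]


def lookup(flow_table: List[FlowEntry], packet: Dict[str, object]) -> Optional[str]:
    """Return the action of the first matching entry, or None on a miss.

    Fields absent from an entry act as wildcards. On a miss, the device
    would forward the packet to the controller for instructions.
    """
    for match, action in flow_table:
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return None
```

When `lookup` returns `None`, the controller's reply would typically be installed as a new entry so that similar packets are handled locally afterwards.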

[0046] The OpenFlow Controller 101 makes its decisions based on the OSI Layer 2 (local network connectivity) and OSI Layer 3 (routing) network level information. This limited scope of information prevents the OpenFlow Controller 101 from using network infrastructure capacity more efficiently by taking into account information about the applications and the identity of the network users.

[0047] This deficiency of the OpenFlow Controller 101 could be alleviated by introducing an additional component which digests a higher level of information, such as OSI Layer 7 information (applications) and the users' identity, according to a policy or a set of policies set forth by a System Administrator, and directs the OpenFlow Controller 101 how to make lower level packet forwarding decisions taking into account such higher level information.

[0048] Referring to Fig. 2, in an exemplary embodiment, acting through an agent, NFI Server 110 provides higher level information, including but not limited to the OSI Level 7 application-level data that enables the OpenFlow Controller 101 to make more intelligent decisions concerning how to utilize the network.

[0049] Further referring to Fig. 2, NFI Server 110 processes NetFlow information 111 generated by OpenFlow 100 compliant networking devices 102 and communicates consolidated flow data to the NFI OpenFlow Agent 113 implemented as an application capable of communicating with the OpenFlow Controller 101. In an exemplary embodiment, the communication between the NFI OpenFlow Agent 113 and the OpenFlow Controller 101 may be implemented by means of the OpenFlow "Northbound" API 114 which supports bidirectional communications between the NFI OpenFlow Agent 113 and the OpenFlow Controller 101.

[0050] It is appreciated that the NFI OpenFlow Agent 113 may communicate with a plurality of OpenFlow Controllers 101 and may receive flow related information from a plurality of NFI Servers 110. It is also appreciated that NFI Server 110 may send flow related information to a plurality of NFI OpenFlow Agents 113.

[0051] The NFI OpenFlow Agent 113 receives information about the flows, including but not limited to the OSI Level 7 application information and user identity information from the NFI Server 110 via a protected communications channel 112.

[0052] The NFI Server 110 receives OSI Level 7 application information in NetFlow messages generated by the network devices 102 and derives user information from the user-identity-aware NetFlow messages such as NetFlow Security Event Log (NSEL) or in the OSI Layer 2 extensions such as Cisco Secure Group Tags (SGT).

[0053] The OSI Level 7 application information may be supplied by means of a classification such as PANA-L7 accompanied by an application identifier or other similar application classification. The communications channel 112 may be protected by standard cryptographic means such as the SSL/TLS or the DTLS protocol.

[0054] In an exemplary embodiment, the NFI OpenFlow Agent 113 is able to retrieve information about the OSI Layer 2 (local network connectivity) and OSI Layer 3 (routing) from the OpenFlow Controller 101 by the means of the OpenFlow "Northbound" API 114. It is appreciated that the NFI OpenFlow Agent 113 may deduce the OSI Layer 2 (local network connectivity) and OSI Layer 3 (routing) information from the flow data received from the NFI Server 110 or by other means.

[0055] Further, the NFI OpenFlow Agent 113 is able to map the OSI Level 7 application information and the user identity information received from the NFI Server 110 to the policy provided by the system administrator, determine if the state of the network comprised by the network devices 102 satisfies the policy, and instruct the OpenFlow Controller 101 to apply a corrective action if such is required.

[0056] Exemplary NFI OpenFlow Agent 113 policies could include enforcement of a certain network bandwidth allocated to an application for a certain user or a group as determined by a Cisco SGT associated with the network traffic; enforcement of an SLA for a subnet classified by an IP address prefix or a VLAN tag, and so on. An exemplary policy could be expressed as a numeric threshold, in relative terms (e.g., "group A network bandwidth consumption should not exceed network bandwidth consumption of group B"), or in fuzzy terms (e.g., "if network traffic is low, network bandwidth allocated to group A may be increased"). The policies could be expressed in many forms, for example and without any limitation, as an XML document, in a proprietary format, etc. The policies could be based on the application type derived from the OSI Level 7 application information, user or group identity, user or group role, time of day, etc.
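To make the numeric and relative policy forms concrete, the sketch below evaluates two policies written as an XML document against measured per-group bandwidth. The XML element and attribute names (`policy`, `group`, `max_bps`, `not_above_group`) form a hypothetical schema invented for this example, as do the policy names and the function name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical policy document: one numeric threshold, one relative rule.
POLICY_XML = """
<policies>
  <policy name="cap-group-a" group="A" max_bps="1000000"/>
  <policy name="a-not-above-b" group="A" not_above_group="B"/>
</policies>
"""


def violated_policies(policy_xml: str, usage_bps: Dict[str, int]) -> List[str]:
    """Return the names of policies that the measured usage violates."""
    violations = []
    for pol in ET.fromstring(policy_xml).iter("policy"):
        used = usage_bps.get(pol.get("group"), 0)
        limit = pol.get("max_bps")
        other = pol.get("not_above_group")
        # Numeric threshold: the group's usage must not exceed a fixed rate.
        if limit is not None and used > int(limit):
            violations.append(pol.get("name"))
        # Relative rule: the group's usage must not exceed another group's.
        if other is not None and used > usage_bps.get(other, 0):
            violations.append(pol.get("name"))
    return violations
```

An agent could run such a check each time consolidated flow data arrives and request a corrective action from the controller only when the returned list is non-empty.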

[0057] It is appreciated that this invention could be used to increase utilization and quality of servers in the Enterprise networks, data centers, Service Provider networks, and public and private cloud environments.

[0058] It is also appreciated that the NFI OpenFlow Agent 113 is capable of utilizing NetFlow information received from the NFI Server 110 to monitor the health of the network and report potential faults prior to their happening. In an exemplary embodiment, a conclusion about an impending network fault could be made by utilizing the NetFlow protocol for measuring average size of a packet traversing a network device interface. A noticeable drop in the average packet size could indicate a higher level of network packet fragmentation, which typically indicates faulty hardware. When the average packet size drops below a certain threshold, the NFI Server 110 may notify the NFI OpenFlow Agent 113 about this event. In turn, the NFI OpenFlow Agent 113 may instruct the OpenFlow Controller 101 to take a corrective action by rerouting the traffic around a problematic network device and/or notify the System Administrator about the problem.
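The average packet size test can be sketched as follows, assuming per-interface byte and packet counts have already been extracted from the flow records for a reporting interval. The tuple layout, the function name and the threshold value used by a caller are assumptions of this example.

```python
from typing import Iterable, List, Tuple


def low_packet_size_interfaces(counters: Iterable[Tuple[str, int, int]],
                               threshold_bytes: float) -> List[str]:
    """Return interfaces whose average packet size is below the threshold.

    Each item is (interface name, byte count, packet count) for one
    reporting interval. Interfaces with no packets are skipped.
    """
    flagged = []
    for interface, byte_count, packet_count in counters:
        if packet_count == 0:
            continue
        if byte_count / packet_count < threshold_bytes:
            flagged.append(interface)
    return flagged
```

Each flagged interface would correspond to a notification of the kind described above, after which traffic may be rerouted or the administrator informed.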

[0059] In another exemplary embodiment, the NFI Server 110 may forecast a network fault by comparing dispersion of the traffic rate by volume and processed packets against preset or dynamically computed thresholds. Comparison of dispersion of the flow reports arrival time to a computed or a preset threshold could be another NFI Server 110 network fault reporting criterion.

[0060] It is appreciated that such network fault threshold values could be computed by means of fuzzy-logic-based algorithms, statistical measurements and other methods and network faults may be predicted using linear prediction algorithms such as autoregressive model, moving average model, or other predictive analytics methods. It is also appreciated that the NFI OpenFlow Agent 113 may make its decisions based on information received from a plurality of NFI Servers 110.
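Two of the measures mentioned above, dispersion against a threshold and a moving average model, can be illustrated with a short sketch. Population variance is used here as the dispersion measure and a simple unweighted moving average as the forecast; both choices, and the function names, are assumptions of this example rather than the disclosed algorithms.

```python
import statistics
from typing import Sequence


def dispersion_exceeds(rates: Sequence[float], threshold: float) -> bool:
    """Report whether the variance of the observed rates exceeds a threshold."""
    return statistics.pvariance(rates) > threshold


def moving_average_forecast(values: Sequence[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return sum(values[-window:]) / window
```

The threshold passed to `dispersion_exceeds` could itself be preset or derived from earlier observations, matching the preset and dynamically computed alternatives described above.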

[0061] Furthermore, it is appreciated that a protocol used to control the data plane of the network devices 102 could be other than OpenFlow, the control plane implementation other than the OpenFlow Controller 101, the API used to communicate with the control plane could be other than the OpenFlow "Northbound" API 114, and the NFI OpenFlow Agent 113 could be co-located with the control plane or be remote. In case of co-location, the NFI OpenFlow Agent 113 could utilize a local programmatic API or interact with the control plane using a network protocol.

[0062] An obvious benefit of integrating the application level information into the packet forwarding function is the simplicity in which the network administrator could express the network bandwidth utilization policies. This leads to a more optimal use of the existing network resources and increased customer satisfaction due to a better fulfillment of the existing SLA.

[0063] NFI for Infrastructure as a Service

[0064] Infrastructure as a Service (IaaS) is a cloud computing provisioning model in which organizations outsource computing operations including servers, network and storage to a service provider. The provider owns, operates and maintains the hardware. In addition, individual organizations could also pool their existing local computing resources and provide a private IaaS offering limited to use exclusively by the organization.

[0065] Referring to Fig. 3, OpenStack is a vendor independent cloud operating system designed to control large groups of computing resources, including servers, storage and networking devices, and manage those resources through a console called an OpenStack Dashboard 120.

[0066] In an exemplary embodiment, the OpenStack system could be used by a service provider to manage its IaaS offering or by an organization to manage its own pool of computing resources.

[0067] Further referring to Fig. 3, the OpenStack system provides a collection of web-based APIs, called OpenStack API 124 (OpenStack Compute, OpenStack Object Storage, OpenStack Identity Service, and OpenStack Image Store), which allow provisioning and manipulating virtual devices deployed in a cloud. The OpenStack API 124 enables cloud operators to provision cloud infrastructure, including virtual machine (VM) instances, storage and identity services, and manipulate Virtualized Devices 125 deployed in a Cloud 123. The OpenStack system provides a number of tools, such as cURL, rest-client, nova, etc., for utilizing the OpenStack system services such as launching a Virtual Device 125, checking Virtual Device 125 status, shutting down a Virtual Device 125, and so on.

[0068] Referring to Fig. 4, a robust OpenStack API 124 provides an opportunity to automate the OpenStack-based system provisioning and maintenance by utilizing the NetFlow information 111 reported by the Hardware or Virtual Network Devices 102. Furthermore, NetFlow 111 information reported by VM hypervisors provides a complete insight into the state of Virtualized Devices 125 by the means of the NFI Server 110.

[0069] Further referring to Fig. 4, the NFI Server 110 processes NetFlow information 111 generated by Hardware or Virtual Network Devices 102 and Virtualized Devices 125 and communicates consolidated flow data to the NFI OpenStack Agent 122 implemented as an application capable of communicating with the OpenStack controlled Virtualized Devices 125 deployed in the Cloud 123. In an exemplary embodiment, the communication between the NFI OpenStack Agent 122 and the OpenStack controlled Cloud 123 may be implemented by means of the OpenStack API 124 which supports bi-directional communications between the NFI OpenStack Agent 122 and the OpenStack controlled Cloud 123.

[0070] Further referring to Fig. 4, in an exemplary embodiment, NFI Server 110 provides network flow information, including but not limited to the OSI Level 7 application-level data, that enables the NFI OpenStack Agent 122 to make intelligent decisions about how to utilize the Cloud 123 computing resources.

[0071] The NFI OpenStack Agent 122 receives information about the flows, including but not limited to the OSI Level 7 application information and user identity information from the NFI Server 110 via a protected communications channel 121.

[0072] The OSI Level 7 application information may be supplied by means of a classification such as PANA-L7 accompanied by an application identifier or other similar application classification. The communications channel 121 may be protected by standard cryptographic means such as the SSL/TLS or the DTLS protocol.

[0073] The NFI Server 110 receives OSI Level 7 application information in NetFlow messages generated by the network devices 102 and derives user information from the user identity aware NetFlow messages such as NetFlow Security Event Log (NSEL) or in the OSI Layer 2 extensions such as Cisco Secure Group Tags (SGT).

[0074] In an exemplary embodiment, a System Administrator configures policies for Virtualized Devices 125 provisioning and maintenance on the NFI OpenStack Agent 122. The policies could be expressed, without any limitation, as an XML document, in a proprietary format, etc. The policies could be based on the application type derived from the OSI Level 7 application information, user or group identity, user or group role, time of day, etc.

[0075] An exemplary policy configured by the System Administrator on the NFI OpenStack Agent 122 could be creating additional Virtualized Devices 125 when a demand for a particular application increases, provisioning additional resources to the existing Virtualized Devices 125, migration of existing Virtualized Devices 125 to more powerful hardware within the Cloud 123, shutting down idle Virtualized Devices 125, etc.
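The kind of policy described in [0075] can be sketched as a small decision function. The Python fragment below is only an illustration under stated assumptions: the ScalingPolicy fields, the per-device flow-rate thresholds, and the action names are hypothetical, and no OpenStack API call is shown because the disclosure does not specify one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy record for one application type."""
    app: str
    scale_up_flows_per_vm: float   # demand above this creates a device
    idle_flows_per_vm: float       # demand below this marks devices idle
    min_vms: int = 1


def decide(policy, flows_per_sec, vm_count):
    """Return an action name for the observed demand (vm_count must be >= 1)."""
    per_vm = flows_per_sec / vm_count
    if per_vm > policy.scale_up_flows_per_vm:
        return "create_vm"
    if per_vm < policy.idle_flows_per_vm and vm_count > policy.min_vms:
        return "shutdown_idle_vm"
    return "no_action"
```

In this sketch the agent would translate the returned action name into whatever provisioning request its cloud operating system supports.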

[0076] By utilizing NetFlow 111 information, the NFI OpenStack Agent 122 is able to automate Cloud 123 management, thus reducing the cloud provider's or cloud owner's operational costs and improving utilization of the physical hardware resources.

[0077] It is appreciated that OpenStack is an example of a cloud operating system and the method disclosed herein is applicable to any vendor specific or generic cloud operating system.

[0078] NFI for Virtualized Environment

[0079] It is appreciated that the NFI Server, combined with NFI OpenFlow Agent and NFI OpenStack Agent becomes a linchpin of an integrated virtualized environment which includes an OpenFlow-based software defined network and an OpenStack driven cloud infrastructure.

[0080] Fig. 5 illustrates the NFI Server 110 application to an integrated setting which includes software defined networking and a cloud computing environment.

[0081] Further referring to Fig. 5, the NFI Server 110 processes NetFlow information 111 generated by Hardware or Virtual Network Devices 102 and Virtualized Devices 125 and communicates consolidated flow data to the NFI OpenStack Agent 122 implemented as an application capable of communicating with the OpenStack controlled Virtualized Devices 125 deployed in the Cloud 123. In an exemplary embodiment, the communication between the NFI OpenStack Agent 122 and the OpenStack controlled Cloud 123 may be implemented by means of the OpenStack API 124 which supports bi-directional communications between the NFI OpenStack Agent 122 and the OpenStack controlled Cloud 123.

[0082] Further referring to Fig.5, NFI Server 110 processes NetFlow information 111 generated by OpenFlow compliant networking devices 102 and Virtualized Devices 125 and communicates consolidated flow data to the NFI OpenFlow Agent 113 implemented as an application capable of communicating with the OpenFlow Controller 101. In an exemplary embodiment, the communication between the NFI OpenFlow Agent 113 and the OpenFlow Controller 101 may be implemented by means of the OpenFlow "Northbound" API 114 which supports bi-directional communications between the NFI OpenFlow Agent 113 and the OpenFlow Controller 101.

[0083] Due to a unique position of the NFI Server 110 in the virtualized computing environment, its interaction with the OpenStack controlled Cloud 123 and the OpenFlow Controller 101 results in a robust control mechanism which unifies Cloud 123 computational resources driven by the OpenStack protocol and the networking resources overseen by the OpenFlow Controller 101 thus creating a novel computing paradigm of a Flow Controlled Computing Platform.

[0084] It is appreciated that the NFI Server 110 may interact with a plurality of Clouds 123 and a plurality of OpenFlow Controllers 101.

[0085] It is also appreciated that for interacting with a software defined network, a protocol other than OpenFlow may be utilized and an API other than OpenStack may be employed for controlling virtualized computing resources.

[0086] On-Demand NetFlow Information

[0087] Flow information data is notoriously voluminous: a single mid-range router like Cisco ASR1000 is capable of producing 400,000 NetFlow records per second which results in around 1.6TB of data per day. Due to a high rate and volume of data, many of the NFI policies are designed to consolidate and/or filter the data and report only a greatly reduced volume of essential information to a backend system, such as without limitation, a SIEM system.
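The figures in [0087] can be checked with simple arithmetic. The short Python calculation below derives the number of records per day and the average record size implied by the quoted daily volume. Reading "1.6TB" as decimal terabytes is an assumption, and the per-record size is an implied value, not one stated in the disclosure.

```python
RECORDS_PER_SECOND = 400_000          # rate quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_DAY = 1.6e12                # assumes 1.6 TB means 1.6 * 10**12 bytes

# Total records exported in one day at the quoted rate.
records_per_day = RECORDS_PER_SECOND * SECONDS_PER_DAY

# Average size per record implied by the quoted daily volume.
implied_bytes_per_record = BYTES_PER_DAY / records_per_day
```

At the quoted rate a device emits about 34.56 billion records per day, so the quoted volume corresponds to an average of roughly 46 bytes per record.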

[0088] Typically, consolidated information provided by NFI is sufficient for the backend system but under certain circumstances, especially in security related situations, the backend system may need more information about the conditions which preceded the event in question and conditions immediately after the event. By taking into consideration the event context, the backend system may be in a much better position to determine the scope and the consequences of the observed event.

[0089] For example, consider a case in which a SIEM system received a notification about a configuration change on a sensitive device D by user A. At first glance this event does not deserve scrutiny because user A may be authorized to configure device D and has sufficient credentials to access device D and make a configuration change. But if the SIEM system is also receiving data from NFI, it may now be capable of correlating the configuration change action with the location on the network from where the configuration change request was issued. A case when a request for configuration change was issued from a network location other than a network location with which user A was associated at the time of the configuration change event can signify an impersonation attack.

[0090] It is appreciated that the above impersonation attack cannot be detected by the means of the authentication and authorization systems only. From the point of view of the authentication and authorization systems, the configuration change is totally legitimate since the actor possesses valid access credentials.
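The correlation described in the preceding example can be sketched in a few lines of Python. The ConfigChange record, the user_locations mapping, and the decision to report nothing when no location is known for the user are assumptions made for illustration; how user locations are derived from flow data is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigChange:
    """Hypothetical configuration-change event as seen by the SIEM system."""
    user: str
    device: str
    source_ip: str
    timestamp: float


def is_possible_impersonation(event, user_locations):
    """user_locations maps user -> list of (start, end, ip) intervals
    derived from flow data (an assumed representation)."""
    known = [ip for start, end, ip in user_locations.get(event.user, [])
             if start <= event.timestamp <= end]
    # Flag only when a location is known for that time and it differs.
    return bool(known) and event.source_ip not in known
```

A request from an address other than the one associated with the user at that time is flagged, even though the credentials presented were valid.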

[0091] An embodiment of the NFI on-demand flow information mechanism disclosed in this invention enables the SIEM system to receive information required for correlating network information with other machine data a posteriori without the need of constantly processing all of what may be a huge flow of inbound network data.

[0092] Referring to Fig. 6, in another exemplary embodiment, NFI Server 110 receives NetFlow data 111 from one or a plurality of network devices. By means of a configured collection of NFI Policies 141, NFI Server 110 processes NetFlow data 111 and reports Consolidated NetFlow data 142 to a SIEM System 140 in a format understood by the SIEM System 140.

[0093] Simultaneously with such actions, NFI Server 110 propagates received NetFlow data 111 to the Short Term Storage 145 where the NetFlow data 111 is placed into the leftmost Time Window 144.

[0094] In an exemplary embodiment, Short Term Storage 145 is a repository with a small access time, possibly in RAM, on SSD or some other fast and/or local storage device. Logically, Short Term Storage 145 may be split into a configurable number of sections, e.g., Time Windows 144, each of which contains NetFlow data 111 information received over a configurable period Δt. Short Term Storage 145 generally implements a sliding window schema in which after each period Δt the right-most Time Window 144 in an augmented NetFlow format 143 is forwarded to the Long Term Storage 146, the Short Term Storage 145 logically shifts and a new left-most Time Window 144 is created for storing the incoming NetFlow data 111 information. The Long Term Storage 146 generally has an access time and storage capacity that are greater than or equal to the Short Term Storage 145 access time and storage capacity.
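The sliding window schema of [0094] may be sketched as follows. This Python class is a simplified illustration, not the disclosed implementation: the class and attribute names are hypothetical, a plain list stands in for the Long Term Storage 146, the shift is driven by record arrival rather than by a timer, records are assumed to arrive in non-decreasing time order, and the augmented NetFlow format 143 is not modeled.

```python
from collections import deque


class ShortTermStorage:
    """Fixed number of time windows; index 0 is the left-most (newest)."""

    def __init__(self, num_windows, window_seconds, long_term):
        self.num_windows = num_windows
        self.window_seconds = window_seconds
        self.long_term = long_term      # list standing in for long term storage
        self.windows = deque()          # items are (window_start, [records])

    def add(self, timestamp, record):
        # Align the timestamp to the start of its window.
        start = timestamp - (timestamp % self.window_seconds)
        if not self.windows or self.windows[0][0] != start:
            # Open a new left-most window and shift the others right.
            self.windows.appendleft((start, []))
            while len(self.windows) > self.num_windows:
                # The right-most (oldest) window moves to long term storage.
                self.long_term.append(self.windows.pop())
        self.windows[0][1].append((timestamp, record))
```

With two windows of ten seconds each, a record arriving in a third window pushes the oldest window and its records into the long term list.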

[0095] It is appreciated that augmented NetFlow format 143 may be the same as the original NetFlow data 111 or may contain additional markup information for use in the long term storage.

[0096] In an exemplary embodiment, SIEM system 140 may execute a Set of Policies 150 which consume Consolidated NetFlow data 142 supplied by NFI Server 110 and, optionally, Other Machine Data 153. If in the process of execution of a policy from the Set of Policies 150, SIEM system 140 detects an Event 151 which took place at time T, SIEM system 140 can issue a Request 152 to the NFI Server 110 to provide additional NetFlow 111 data received by the NFI Server 110 during a time interval [T - t, T + t], where t is the interval half-width selected by the SIEM system 140.

[0097] Upon receiving the SIEM system 140 Request 152, the NFI Server 110 determines the location of the requested information in the storage based on the beginning time and the ending time of the requested time interval [T - t, T + t]. Assuming that at the time of Request 152 the Short Term Storage 145 contains NetFlow 111 data corresponding to the time interval [T1, T2], T2 > T1, and the requested time interval [T - t, T + t] is within the Short Term Storage 145 time interval [T1, T2], then the NFI Server 110 retrieves the requested information from the Short Term Storage 145 and forwards the retrieved information in Response 156, optionally with additional processing, to the SIEM system 140.

[0098] If the requested time interval [T - t, T + t] is outside of the Short Term Storage 145 time interval [T1, T2], then the NFI Server 110 attempts to retrieve the requested information from the Long Term Storage 146 and, if successful, forwards the retrieved information, optionally with additional processing, in Response 156 to the SIEM system 140.

[0099] If the requested time interval [T - t, T + t] is split between the Short Term Storage 145 time interval [T1, T2] and the Long Term Storage 146, then the NFI Server 110 retrieves the first part of the requested information from the Short Term Storage 145 and the second part of the requested information from the Long Term Storage 146, concatenates the first retrieved part and the second retrieved part of information and forwards the concatenated information, optionally with additional processing, in Response 156 to the SIEM system 140.

[00100] In a case when the right boundary T + t of the requested time interval [T - t, T + t] is outside of the time range of the information in the Long Term Storage 146, or the left boundary T - t of the requested time interval [T - t, T + t] is outside of the time range of the information in the Short Term Storage 145, the NFI Server 110 retrieves information for a truncated time range and notifies the SIEM system 140 about the truncation in Response 156.

[00101] In the case in which the requested time interval [T - t, T + t] is outside of the time range covered by the Short Term Storage 145 and the Long Term Storage 146, the NFI Server 110 notifies the SIEM system 140 about the error condition in Response 156.
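The request handling cases of [0097] through [00101] can be summarized in one routine that decides which storage tier serves which part of the requested interval. The Python sketch below is illustrative only: the function names, the tuple representation of time ranges, and the returned status strings are assumptions, and the two tiers are assumed to cover adjacent time ranges with no gap between them.

```python
def overlap(a, b):
    """Intersection of two closed intervals (lo, hi), or None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def plan_retrieval(request, short_range, long_range=None):
    """Map a requested interval onto the storage tiers.

    Returns a dict with "status" in {"ok", "truncated", "error"} and
    "parts", a list of (tier, sub_interval) in chronological order.
    """
    parts = []
    if long_range is not None:
        part = overlap(request, long_range)
        if part:
            parts.append(("long", part))
    part = overlap(request, short_range)
    if part:
        parts.append(("short", part))
    if not parts:
        # Nothing stored for the requested interval: error condition.
        return {"status": "error", "parts": []}
    covered_lo, covered_hi = parts[0][1][0], parts[-1][1][1]
    truncated = covered_lo > request[0] or covered_hi < request[1]
    return {"status": "truncated" if truncated else "ok", "parts": parts}
```

A caller would fetch each listed part from its tier, concatenate the results in order, and report the status to the requesting system along with the data.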

[00102] The novel multi-tiered approach to storing NetFlow data disclosed herein provides a significant advantage when analyzing events which require immediate reporting or action as compared to the traditional single tiered NetFlow information storage used by prior NetFlow collectors. For the events which require immediate reporting or action, search for the requested information in the fast Short Term Storage 145 is significantly faster than in a slower Long Term Storage 146 which results in a better response time of the SIEM system 140.

[00103] It is appreciated that the SIEM system 140 Request 152 for additional information may include, besides the time interval, other parameters such as the origin of the NetFlow record or specific flow information such as, without limitation, source or destination IP addresses, or a combination thereof. It is also appreciated that NetFlow information in the Short Term Storage 145 and the Long Term Storage 146 may be indexed by time and by zero, one or a plurality of keys based on the information pertinent to the NetFlow such as, without limitation, source or destination IP addresses, source or destination OSI Layer 4 ports, and so on.

[00104] Further referring to Fig. 7, it is appreciated that the Short Term Storage 145 and the Long Term Storage 146 may be operated by the NFI Server 110, by an instance of the NFI Server 110 other than the instance of the NFI Server 110 which originally processed NetFlow data 111, and/or by a process other than an NFI Server 110. It is also appreciated that the Short Term Storage 145 and the Long Term Storage 146 may be operated by different instances of the NFI Server 110 or by a process other than the NFI Server 110. Furthermore, the access time to the Short Term Storage 145 and the Long Term Storage 146 may be the same and there may be more than two storage tiers. It is also appreciated that the Long Term Storage 146 is an optional component and the information in the Short Term Storage 145 may be discarded when it ages past a configured life span.

[00105] A novel approach to associating network and other machine data disclosed herein enables detection of attacks which would go undetected if only the network data or only the other machine data were taken into consideration. A novel approach to the network information storage disclosed here enables provision of the network information on an "only when needed" basis without any preliminary processing.

[00106] Geo-Spatial Analysis-Based Botnet Slaves Detection (see Fig. 8)

[00107] Sophisticated malware agents engage complex detection evasion techniques when communicating with their masters. For example, an agent can contact the master at random time intervals, communicate with multiple masters by selecting the next master based on information received during the last communication session, obfuscate Command & Control channel traffic patterns, etc.

[00108] Method

[00109] Use an inline cluster analysis algorithm (BIRCH - Balanced Iterative Reducing and Clustering using Hierarchies) to classify outbound traffic. BIRCH is known for efficiently determining "outliers", i.e., data points that are not a part of the general underlying pattern.

[00110] Feature Set

[00111] S = { S_i }, S_i ≡ { freq(dist, az), app, f1, f2, f3, f4, T }

[00112] freq - communications frequency

[00113] dist - physical distance to the destination host

[00114] az - azimuth

[00115] app - L7 application id or L4 destination port

[00116] f1 - flow rate, flows/h

[00117] f2 - number of packets per flow

[00118] f3 - packet size, B

[00119] f4 - traffic rate, bps

[00120] "dist" and "az" are computed based on the source and destination IP addresses found in the flow record. Similarity function, "freq", is frequency of communications to a particular geographic area. Applications are classified into groups each of which is associated with a category assigned to a monitored host ("standard applications").
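For illustration, the "dist" and "az" features could be computed as great-circle distance and initial bearing once the source and destination IP addresses have been resolved to coordinates. The Python sketch below uses the standard haversine and initial-bearing formulas on a spherical Earth; the function names and the mean Earth radius constant are assumptions, and the IP-to-coordinate lookup is deliberately left out because the disclosure does not name a geolocation source.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def azimuth_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0
```

A destination due east along the equator yields an azimuth of 90 degrees, and a quarter of the way around the globe corresponds to roughly 10,000 km.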

[00121] Reporting criteria

[00122] Alert to unique or infrequent communications with a peer by a non-standard application or a standard application with unusual traffic characteristics.
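One branch of the reporting criteria, infrequent communications by a non-standard application, can be illustrated with a plain frequency count. The Python sketch below is explicitly not BIRCH: it buckets destinations into coarse distance and azimuth cells and counts communications per host and cell. The cell sizes, the max_count cutoff, the flow tuple layout, and the function names are all assumptions, and the "standard application with unusual traffic characteristics" branch is not covered.

```python
from collections import Counter


def geo_cell(dist_km, az_deg, dist_step=500.0, az_step=30.0):
    """Coarse geographic bucket built from distance and azimuth."""
    return (int(dist_km // dist_step), int((az_deg % 360.0) // az_step))


def infrequent_flows(flows, standard_apps, max_count=1):
    """flows: iterable of (host, dist_km, az_deg, app) tuples.

    Returns flows to rarely contacted cells made by non-standard applications.
    """
    flows = list(flows)
    # Communications frequency per monitored host and geographic cell.
    freq = Counter((host, geo_cell(d, az)) for host, d, az, _ in flows)
    return [f for f in flows
            if freq[(f[0], geo_cell(f[1], f[2]))] <= max_count
            and f[3] not in standard_apps]
```

A clustering algorithm such as BIRCH would replace the fixed cells with clusters learned from the data; the counting step above only shows what the frequency feature measures.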

[00123] Penetration Testing, Configuration Verifiers

[00124] As a network grows in size, its topology becomes more complex. Topology complexity, in turn, increases configuration complexity and makes it more error prone. There are a number of tools which help a System Administrator to assess and validate configuration and security posture of the networks under his management. These tools use a variety of methods to determine vulnerabilities in the network. For example, penetration testing tools "attack" an organization's firewalls, configuration verification tools attempt to find loopholes in the authentication and authorization policies, IDS/IPS systems watch the traffic flowing in and out of an organization's network, and so on. These protective technologies were developed over a long period and are mature enough to stop known and sometimes even unpredicted threats.

[00125] A problem with today's network defensive posture is its static nature: once configured, and possibly verified, network defenses are considered impregnable, as the Maginot Line was before World War II. The protective measures are generally applied once or, at best, are assessed once in a while, leaving the organization without any quality assurance of the real state of its security posture in between the checks.

[00126] Yet another problem of today's network defenses is the diversity of methods by which these protective elements are provisioned and configured. It is rare that all nodes in the protective grid are sourced from a single vendor. A common IT practice is to use best of breed devices which come from diverse network technology providers. Diverse and complex configuration methods increase the probability of error in today's multi-tiered network security deployments.

[00127] NetFlow is a technology which enables creation of tools capable of providing dynamic quality control of the organization's networking infrastructure. NFI technology, disclosed in this invention, allows introducing arbitrary policies which could monitor network traffic throughout the organization and identify flow instances which were overlooked by statically configured defenses.

[00128] While this invention has been described in terms of several embodiments, there are alterations, modifications, permutations, and substitute equivalents, which fall within the scope of this invention. Although sub-section titles have been provided to aid in the description of the invention, these titles are merely illustrative and are not intended to limit the scope of the present invention.

[00129] It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, modifications, permutations, and substitute equivalents as fall within the true spirit and scope of the present invention.

Claims

What is claimed is:

1. A method of improved management of a software-defined network, said network including a network controller and transmitting network traffic using one or more network protocols, the network including devices at least some of which receive network traffic through an ingress interface and transmit network traffic through an egress interface, the method comprising the steps of:

receiving network metadata from a plurality of sources in a data processing system, in at least one data format;

processing said network metadata while said network metadata is in transition on said network between a network device that generated said network metadata and a device that is able to store said network metadata to extract useful information therefrom; and

determining as a result of said metadata processing step, information relating to applications operating on said network; and

using said applications information to enable said network controller to perform more efficient management of said software-defined network.

2. The method as set forth in Claim 1, further comprising the steps of:

determining as a result of said metadata processing step, information relating to the users present on said network; and

using said user information to enable said network controller to perform more efficient management of said software-defined network.

3. A method of improved management of a cloud-based virtual computing environment, said environment including a cloud operating system and a cloud environment controller and transmitting network traffic using one or more network protocols, the network including devices at least some of which receive network traffic through an ingress interface and transmit network traffic through an egress interface, the method comprising the steps of:

receiving network metadata from a plurality of sources in said cloud-based virtual computing environment, in at least one data format;

processing said network metadata while said network metadata is in transition in said environment between a network device that generated said network metadata and a device that is able to store said network metadata to extract useful information therefrom; and

determining as a result of said metadata processing step, information relating to applications operating in said environment; and

using said applications information to enable said cloud environment controller to perform more efficient management of said cloud-based virtual computing environment.

4. The method as set forth in Claim 3, further comprising the steps of:

determining as a result of said metadata processing step, information relating to the users present in said environment; and

using said user information to enable said cloud environment controller to perform more efficient management of said cloud-based virtual computing environment.

5. A method of providing on-demand access to network metadata relating to an identified potentially security-related network event, in a network in which devices transmit network traffic using one or more network protocols, the network including devices at least some of which receive network traffic through an ingress interface and transmit network traffic through an egress interface, the method comprising the steps of:

processing network metadata in a streaming fashion according to a configured collection of network metadata processing policies;

retaining a time-indexed set of network metadata in a fast-access storage mechanism for a defined period of time;

identifying a potentially security-related network event; and

providing from said time-indexed set a collection of network metadata related in time to said identified potentially security-related network event; and

performing analysis to correlate said collection of network metadata with said identified potentially security-related network event to further characterize said identified potentially security-related network event.

6. The method as set forth in Claim 5, further comprising the step of removing selected network metadata from said fast-access storage mechanism to facilitate the arrival of new network metadata thereto.

7. A method of detecting botnet slaves on a network-connected device comprising:

applying an inline cluster analysis algorithm to classify outbound traffic on a network; said cluster analysis algorithm taking into consideration frequency of communications to the network hosts at identifiable geographic locations and data communication patterns such as, without limitation, application type, flow rate, the number of packets per flow, average packets size in each flow, and traffic rate;

based upon said applying step, identifying outbound traffic on said network that is not part of the general pattern of traffic on said network; and

communicating an alert in the event of outbound traffic on said network that is not part of the general pattern of traffic on said network.

8. A system for improved management of a software-defined network, said network including a network controller and transmitting network traffic using one or more network protocols, the network including devices at least some of which receive network traffic through an ingress interface and transmit network traffic through an egress interface and generate network metadata relating to said network traffic, said management system comprising:

at least one ingress interface for receiving network metadata from a plurality of sources in a software-defined network, in at least one data format;

a processing engine for processing said network metadata while said network metadata is in transition on said network between a network device that generated said network metadata and a device that is able to store said network metadata to extract useful information therefrom;

said processing engine determining information relating to applications operating on said network and using said applications information to enable said network controller to perform more efficient management of said software-defined network.

9. The management system as set forth in Claim 8, wherein said processing engine determines as a result of said metadata processing step, information relating to the users present on said network; and uses said user information to enable said network controller to perform more efficient management of said software-defined network.

10. A system for improved management of a cloud-based virtual computing environment, said environment including a cloud operating system and a cloud environment controller that transmits network traffic using one or more network protocols, the network including devices at least some of which receive network traffic through an ingress interface and transmit network traffic through an egress interface, the management system further comprising:

an interface for receiving network metadata from a plurality of sources in said cloud-based virtual computing environment, in at least one data format;

a processing engine for processing said network metadata while said network metadata is in transition in said environment between a network device that generated said network metadata and a device that is able to store said network metadata to extract useful information therefrom;

said processing engine determining as a result of said metadata processing step, information relating to applications operating in said environment; and using said applications information to enable said cloud environment controller to perform more efficient management of said cloud-based virtual computing environment.

11. The management system as set forth in Claim 10, wherein said processing engine determines as a result of said metadata processing step, information relating to the users present in said environment; and uses said user information to enable said cloud environment controller to perform more efficient management of said cloud-based virtual computing environment.

12. A system for providing on-demand access to network metadata relating to an identified potentially security-related network event, in a network in which devices transmit network traffic using one or more network protocols, the network including devices at least some of which receive network traffic through an ingress interface and transmit network traffic through an egress interface, the system comprising:

a processing engine for processing network metadata in a streaming fashion according to a configured collection of network metadata processing policies;

a fast-access storage mechanism for retaining a time-indexed set of network metadata for a defined period of time;

said processing engine identifying a potentially security-related network event and providing from said time-indexed set a collection of network metadata related in time to said identified potentially security-related network event; and

an analysis engine for performing analysis to correlate said collection of network metadata with said identified potentially security-related network event to further characterize said identified potentially security-related network event.

13. The system as set forth in Claim 12, further comprising a memory management engine for removing selected network metadata from said fast-access storage mechanism to facilitate the arrival of new network metadata thereto.

14. A system for detecting botnet slaves on a network-connected device comprising:

a processing engine for applying an inline cluster analysis algorithm to classify outbound traffic on a network;

said cluster analysis algorithm taking into consideration frequency of communications to the network hosts at identifiable geographic locations and data communication patterns such as, without limitation, application type, flow rate, the number of packets per flow, average packets size in each flow, and traffic rate;

an analysis engine that, based upon the results of said cluster analysis algorithm, identifies outbound traffic on said network that is not part of the general pattern of traffic on said network; and

an alert generation engine for communicating an alert in the event of outbound traffic on said network that is not part of the general pattern of traffic on said network.