KR20110004101A

KR20110004101A - Method and apparatus for analyzing abnormal traffic using hierarchical clustering

Info

Publication number: KR20110004101A
Application number: KR1020090061742A
Authority: KR
Inventors: 손춘호; 유재형; 조석형; 손정표
Original assignee: 주식회사 케이티
Priority date: 2009-07-07
Filing date: 2009-07-07
Publication date: 2011-01-13

Abstract

PURPOSE: A method and an apparatus for analyzing abnormal traffic using hierarchical clustering are provided that classify the traffic patterns of a backbone network and analyze abnormal traffic by comparison with the traffic currently generated on other interfaces. CONSTITUTION: A hierarchical clustering unit (120) arranges a plurality of clusters into a hierarchy according to their similarity. An abnormal traffic determination unit (130) classifies the plurality of clusters according to a predetermined threshold similarity, thereby generating cluster classification groups that each include at least one cluster. The abnormal traffic determination unit determines abnormal traffic according to the number of traffic items included in each cluster classification group.

Description

Method and apparatus for analyzing abnormal traffic using hierarchical clustering {METHOD AND APPARATUS FOR ANALYZING ABNORMAL TRAFFIC USING HIERARCHICAL CLUSTERING}

The present invention relates to a method and apparatus for analyzing abnormal traffic and, more particularly, to an abnormal traffic analysis method and apparatus that analyze the traffic patterns of a plurality of boundary links that collect subscriber traffic in a backbone network, cluster those patterns hierarchically, and measure the similarity between clusters to analyze abnormal traffic.

As large-scale network infrastructure has been built, techniques for analyzing abnormal traffic have long been developed. Abnormal traffic is analyzed in order to respond efficiently when it occurs, by monitoring which links show signs that abnormal traffic may arise before traffic grows to an excessive level that the network cannot handle.

One conventional technique for analyzing abnormal traffic takes a specific flow pattern of traffic as a reference pattern and determines whether the flow of currently incoming traffic matches this reference pattern, thereby analyzing whether the traffic flow is abnormal. In other words, for each individual link in the backbone network, this technique compares the previous traffic pattern with the current traffic pattern and regards a difference between them as abnormal traffic.

However, this conventional abnormal traffic analysis method can judge whether traffic is abnormal only for a limited set of incoming flow patterns, so it cannot accurately classify unforeseen patterns. To overcome this problem, Korean Patent No. 656340, "Apparatus and method for analyzing abnormal traffic information" (hereinafter, the "prior art document"), proposes a technique that judges whether currently incoming traffic is abnormal on the basis of previous traffic information that is dynamically updated.

Specifically, the prior art document proposes an abnormal traffic information analysis apparatus comprising: a traffic information detection unit that detects information on traffic flowing into a system; a signature generation unit that compares the detected traffic information with previously received traffic information to generate a signature indicating their correlation, and that counts and stores the cases in which the detected traffic information is common to specific previously received traffic information; a DB that stores the detected traffic information and the generated signature and deletes and updates them at predetermined time intervals so as to cope with various forms of abnormal traffic; a bandwidth occupancy ratio calculation unit that calculates the ratio of the traffic flowing into the system to the maximum bandwidth of the system; and an intrusion detection/blocking unit that detects and blocks the traffic when the incoming traffic has a high count in the signature generation unit or a high maximum-bandwidth occupancy ratio in the bandwidth occupancy ratio calculation unit.

That is, because the prior art document judges whether currently incoming traffic is abnormal on the basis of dynamically updated previous traffic information, it has the advantage of responding better than earlier techniques to various forms of abnormal traffic.

However, the conventional techniques discussed above, including the prior art document, have the following problems.

(i) First, because the conventional techniques analyze abnormal traffic on the basis of the past traffic information of a specific link, they are difficult to apply in situations where past traffic information has not been accumulated.

(ii) In addition, because the conventional techniques judge whether traffic is normal on the basis of past traffic information, the analysis may not fit the traffic conditions actually occurring in the current backbone network. That is, a current traffic flow collected on a specific link may appear abnormal in light of past traffic analysis results and yet, given the state of the backbone network at the time of analysis, differ little from the traffic flow patterns collected on other links, so that it can hardly be called abnormal. Nevertheless, the conventional techniques, including the prior art document, analyze traffic solely on the basis of past traffic information. This limitation is a natural consequence of the fact that the conventional techniques do not contemplate classifying traffic patterns across the entire current backbone network.

For reference, the prior art document is configured to delete and update the previous traffic information stored in the database after a certain time has elapsed, in order to reflect recent traffic conditions. Even so, the reference for judging whether incoming traffic is abnormal is merely the most recently collected (that is, updated) past reference traffic information, so the prior art document still cannot reflect in its analysis the traffic conditions currently occurring on other links of the backbone network.

(iii) Accordingly, an abnormal traffic analysis apparatus according to the conventional techniques must store the past traffic data collected on a link for at least long enough to secure reference traffic information.

The present invention has been made in view of the above problems of the conventional techniques. Its object is to provide an abnormal traffic analysis method and apparatus that analyze abnormal traffic by comparing against the traffic patterns currently occurring on other interfaces rather than against past traffic data, so that past traffic data need not be stored and traffic patterns can be classified across the entire backbone network.

To achieve the above object, the present invention provides a method for analyzing abnormal traffic in a network composed of a plurality of links, the method comprising: a first step of collecting traffic from the plurality of links; a second step of determining the similarity between a plurality of clusters obtained by clustering the patterns of the traffic, and arranging the plurality of clusters into a hierarchy according to the determined similarity; and a third step of classifying the hierarchically arranged clusters according to a predetermined threshold similarity to generate a plurality of cluster classification groups each including at least one cluster, and determining abnormal traffic according to the number of traffic items included in each cluster classification group.

The present invention also provides an apparatus for analyzing abnormal traffic in a network composed of a plurality of links, the apparatus comprising: a traffic collection unit that collects traffic from the plurality of links; a hierarchical clustering unit that determines the similarity between a plurality of clusters obtained by clustering the patterns of the traffic collected by the traffic collection unit, and arranges the plurality of clusters into a hierarchy according to the determined similarity; and an abnormal traffic determination unit that classifies the clusters arranged by the hierarchical clustering unit according to a predetermined threshold similarity to generate a plurality of cluster classification groups each including at least one cluster, and determines abnormal traffic according to the number of traffic items included in each cluster classification group.

That is, the conventional techniques determine abnormal traffic in one of two ways: (i) traffic information currently detected on a specific link is compared with traffic information previously received on that link to generate a signature indicating their correlation, a count is incremented whenever the currently detected traffic information is common to specific previously received traffic information, and traffic with a high count is regarded as abnormal; or (ii) the traffic flowing into a system is compared with the maximum bandwidth of the system, and traffic occupying a high ratio of that bandwidth is determined to be abnormal.

In contrast, the present invention proposes an abnormal traffic determination technique that applies a threshold similarity to a plurality of clusters hierarchically clustered according to the similarity of their traffic patterns, re-clusters them into a plurality of cluster classification groups, and evaluates normal and abnormal traffic relative to each other in real time according to the number of traffic items belonging to each cluster classification group.

For reference, although the present invention uses the name "cluster classification group" for the result of reclassifying the hierarchically arranged clusters according to a predetermined threshold similarity, a cluster classification group may simply be understood as another cluster formed by merging a plurality of clusters.

Meanwhile, in the abnormal traffic analysis method according to the present invention, the network may support SNMP (Simple Network Management Protocol), and the first step may be performed by using SNMP to collect the traffic directly from a router belonging to the network. Alternatively, when the system providing the network includes an NMS (Network Management System), the first step may be performed by collecting the information on the traffic provided by the NMS.

In the present invention, the similarity may be a correlation coefficient computed between two or more clusters, and this correlation coefficient may in turn include the Pearson correlation coefficient.

Meanwhile, when a plurality of threshold similarities are available, the third step of the abnormal traffic analysis method according to the present invention is preferably performed repeatedly, adjusting the threshold similarity until the number of traffic items determined to be abnormal falls to or below a predetermined number. This makes it possible to vary the precision with which abnormal traffic is determined to suit the state of the network.
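The repeated third step can be pictured as a simple loop over candidate threshold similarities. The following is only a minimal sketch: the function name `adjust_threshold`, the callable `find_abnormal`, the order in which thresholds are tried, and the sample results are all assumptions made for illustration and are not specified in this document.

```python
def adjust_threshold(candidates, find_abnormal, max_abnormal):
    """Try threshold similarities in order until few enough items are abnormal.

    candidates    : threshold similarities, in the order they should be tried
    find_abnormal : hypothetical callable, threshold -> list of abnormal traffic
    max_abnormal  : the "predetermined number" of abnormal items tolerated
    """
    for threshold in candidates:
        abnormal = find_abnormal(threshold)
        if len(abnormal) <= max_abnormal:
            return threshold, abnormal
    return None, []  # no candidate threshold met the limit


# Hypothetical outcomes of the third step at two thresholds (illustrative only).
results = {0.7: ["C", "G", "H"], 0.4: ["C", "H"]}
threshold, abnormal = adjust_threshold([0.7, 0.4], results.get, max_abnormal=2)
print(threshold, abnormal)
```

With these made-up inputs, the first threshold flags three items, which exceeds the limit of two, so the loop moves on to the second threshold.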

Furthermore, the third step of the abnormal traffic analysis method according to the present invention may be configured so that traffic belonging to any cluster classification group other than the one containing the largest number of traffic items is determined to be abnormal traffic. It may of course also be configured so that traffic belonging to the cluster classification group containing the smallest number of traffic items is determined to be abnormal traffic.

According to the present invention, abnormal traffic is analyzed by comparing against the traffic patterns currently occurring on other interfaces (or, in a backbone network, on the boundary links that aggregate subscriber traffic) rather than against past traffic data, so that past traffic data need not be stored and traffic patterns can be classified across the entire backbone network.

That is, the present invention needs only a database that temporarily stores traffic data until the analysis of whether the traffic data collected from the plurality of links constituting the network is abnormal has been completed. Unlike the conventional techniques, it does not need to store past traffic data for a specific link in a database over a certain period in order to obtain reference traffic information.

Moreover, whereas the conventional techniques extract reference traffic information from the past traffic data of a specific link and compare it with the current traffic information of that same link, the present invention analyzes abnormal traffic on the basis of a hierarchical clustering of the traffic information currently occurring on all of the links belonging to the network under analysis, and can therefore classify traffic patterns across the entire backbone network.

Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

FIG. 1 shows an abnormal traffic analysis system 400 according to the present invention. Referring to FIG. 1, the abnormal traffic analysis system 400 includes an abnormal traffic analysis apparatus 100 that, for example, analyzes the traffic patterns of a plurality of boundary links collecting subscriber traffic in a backbone network, clusters them hierarchically, and measures the similarity between clusters to analyze abnormal traffic. The system also includes either an NMS 200, which diagnoses the state of a network composed of a plurality of links and allows the network to be managed efficiently, or an ISP 300 (Internet Service Provider), which supports SNMP and provides Internet service through a backbone network composed of large-capacity transmission lines capable of collecting and carrying data from smaller lines.

Here, the links of a backbone network to which the abnormal traffic analysis method according to the present invention can be applied may be divided into internal links, which form the connections between backbone routers (not shown), and boundary links, which collect subscriber traffic. A boundary link aggregates the traffic of many subscribers onto a single link, and its traffic usage is generally very stable. Noting that the boundary links of a backbone network therefore tend to have similar traffic patterns, the present invention presents a technique that classifies the traffic of the boundary links into groups with similar patterns, namely a hierarchical clustering technique.

That is, when applied to a backbone network, the present invention can be understood as a technique that uses hierarchical clustering to divide the traffic of the boundary links into normal clusters and abnormal clusters.

Referring again to FIG. 1, the abnormal traffic analysis apparatus 100, which is the core component of the present invention, consists of: a traffic collection unit 110 that collects traffic from the plurality of links constituting the network; a hierarchical clustering unit 120 that determines the similarity between a plurality of clusters obtained by clustering the traffic collected by the traffic collection unit 110, and arranges the clusters into a hierarchy according to the determined similarity; and an abnormal traffic determination unit 130 that classifies the clusters arranged by the hierarchical clustering unit 120 according to a predetermined threshold similarity to generate a plurality of cluster classification groups each including at least one cluster, and determines abnormal traffic according to the number of traffic items included in each cluster classification group.

Next, FIG. 2 is a flowchart showing the abnormal traffic analysis method according to the present invention, and FIGS. 3 to 5B are drawings for explaining the process of FIG. 2. The process by which the abnormal traffic analysis apparatus 100 shown in FIG. 1 performs the abnormal traffic analysis method according to the present invention is described in detail below with reference to FIGS. 2 to 5B.

I. Traffic collection

First, in step S1, the traffic collection unit 110 collects the router interface traffic of the backbone network, either by using SNMP to collect traffic directly from the backbone routers (not shown) or by building traffic information in cooperation with the NMS 200.

FIG. 3 shows a plurality of exemplary traffic patterns collected by the traffic collection unit 110 according to the present invention. It illustrates, for example, the patterns of traffic A, B, C, and H among the router interface traffic A to H collected by the traffic collection unit 110.

The traffic collection unit 110 collects traffic status information over a fixed period, for example as 5-minute averages or 1-hour averages, and stores the averaged traffic status information in the database 140 of the abnormal traffic analysis apparatus 100. Although FIG. 1 shows the database 140 as a component physically separate from the traffic collection unit 110, the database may instead be integrated into the traffic collection unit 110 as one of its components. Note also that in the present invention the traffic status information collected over, for example, 5 minutes to 1 hour is not used as a reference for determining whether abnormal traffic has occurred on a specific boundary link; it is the traffic information currently occurring across the entire backbone network.

II. Hierarchical clustering

Next, in step S2, the hierarchical clustering unit 120 of FIG. 1 clusters the router interface traffic collected by the traffic collection unit 110 to generate a plurality of clusters, determines the similarity between these clusters, and arranges the clusters into a hierarchy according to the determined similarity. A concrete example is shown in FIG. 4.

FIG. 4 is a drawing for explaining the process by which the hierarchical clustering unit 120 according to the present invention clusters traffic according to the similarity of a plurality of traffic patterns and arranges the clusters into a hierarchy. It takes the form of a dendrogram whose horizontal axis shows the traffic nodes and whose vertical axis shows the similarity, ranging from +1 to -1.

Once the traffic collection unit 110 has collected traffic such as that shown in FIG. 3, the hierarchical clustering unit 120 first treats the eight traffic items A to H, for example, as eight clusters. It then measures the similarity between, for example, pairs of traffic items among A to H, where "similarity" is a concept that includes the "correlation coefficient" of known correlation analysis.

For convenience of explanation, consider the case in which simple correlation analysis, which analyzes the linear correlation between two traffic items or clusters, is applied, and the Pearson correlation coefficient commonly used to measure the correlation between two variables X and Y is taken as the similarity. The similarity according to the present invention can then be expressed as the Pearson correlation coefficient r of Equation (1) below. N is the number of elements of each variable: for example, if X = {1, 2, 3, 4, 5} and Y = {2, 2, 3, 4, 5}, N is 5. Likewise, if 24 traffic measurements taken at one-hour intervals are regarded as the traffic of one interface, N is 24.

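$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} \qquad (1) $$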

As indicated on the vertical axis of the dendrogram in FIG. 4, the Pearson correlation coefficient r (that is, the similarity) takes values from +1 to -1. A value of r = +1 means that the two clusters have a perfect positive linear correlation, r = -1 means that they have a perfect inverse (that is, negative) linear correlation, and r = 0 means no correlation or, more precisely, that no linear correlation exists between the two clusters.
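As a worked illustration, the Pearson correlation coefficient can be computed for the example variables X = {1, 2, 3, 4, 5} and Y = {2, 2, 3, 4, 5} given above. This is a minimal sketch using the standard computational form of the coefficient; the function name `pearson` is chosen here for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    denom = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denom == 0:
        raise ValueError("r is undefined when one series is constant")
    return (n * sum_xy - sum_x * sum_y) / denom


X = [1, 2, 3, 4, 5]
Y = [2, 2, 3, 4, 5]
r = pearson(X, Y)  # N = 5; r is approximately 0.97
print(round(r, 4))
```

The two series rise together almost linearly, so r lies close to +1; reversing one of them would flip the sign.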

Having measured the similarity of the clustered traffic A to H by this correlation analysis, the hierarchical clustering unit 120 merges clusters with similar patterns into new clusters. In FIG. 4, for example, traffic A, D, and E, which form cluster 1, were found by the correlation analysis to have similar patterns. Likewise, traffic B and F and traffic C and H have mutually similar traffic patterns and are clustered as cluster 2 and cluster 4, respectively, while traffic G has no traffic with a similar pattern and forms cluster 3 on its own.

In the example of FIG. 4, cluster 1 has similarity L1 to cluster 2, which is higher than its similarity L2 to cluster 3 or its similarity L3 to cluster 4. Thus, in the example of FIG. 4, cluster 1 is most similar to cluster 2 and least similar to cluster 4 among the other clusters. In this way, the clusters can be represented as a dendrogram with a hierarchical tree structure according to the similarity of the traffic patterns currently collected from the backbone network.
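The merging process described above can be sketched as a small agglomerative clustering routine. This is an illustration under stated assumptions rather than the patented implementation: the document does not say how the similarity between multi-member clusters is computed, so average linkage (the mean pairwise Pearson coefficient) is assumed here, and the traffic series are made-up values.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient (mean-centered form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # assumes neither series is constant


def build_dendrogram(series):
    """Merge the two most similar clusters until one remains.

    Returns the merges in order as (members_a, members_b, similarity).
    """
    clusters = [frozenset([name]) for name in series]  # one cluster per traffic

    def linkage(a, b):
        # Assumption: average linkage over all cross-cluster pairs.
        scores = [pearson(series[p], series[q]) for p in a for q in b]
        return sum(scores) / len(scores)

    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical traffic series for four interfaces (illustrative values only).
traffic = {
    "A": [10, 20, 30, 40, 50, 60],
    "D": [12, 19, 33, 41, 52, 58],
    "G": [60, 50, 40, 30, 20, 10],
    "C": [30, 10, 30, 10, 30, 10],
}
merges = build_dendrogram(traffic)
for a, b, similarity in merges:
    print(sorted(a), sorted(b), round(similarity, 3))
```

The ordered list of merges carries the same information as a dendrogram: each entry records which clusters were joined and at what similarity level.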

III. Determination of abnormal traffic

In FIG. 2, after step S2 is completed, the abnormal traffic determination unit 130 of FIG. 1 performs step S3: it classifies the clusters arranged by the hierarchical clustering unit 120 according to a predetermined threshold similarity to generate a plurality of cluster classification groups each including at least one cluster, and determines abnormal traffic according to the number of traffic items included in each cluster classification group.

FIGS. 5A and 5B are drawings for explaining the process by which the abnormal traffic determination unit 130 according to the present invention generates a plurality of cluster classification groups under different threshold similarities and determines abnormal traffic.

Referring first to FIG. 5A, assume for example that the threshold similarity is 0.7 and that L1 > 0.7 > L2 > 0.4 > L3. Cluster 1 and cluster 2, which were the most similar clusters in FIG. 4 but had not been merged into a single cluster, are then re-clustered as a new cluster classification group 1, as shown in FIG. 5A, because under a threshold similarity of 0.7 cluster 1 and cluster 2 may be judged "similar".

However, cluster 3 and cluster 4 of FIG. 4 are not incorporated into cluster classification group 1 in FIG. 5A, because their similarities to cluster classification group 1, L2 and L3 respectively, are both lower than the threshold similarity of 0.7. Accordingly, cluster 3 and cluster 4 of FIG. 4 are clustered again as cluster classification group 2 and cluster classification group 3, respectively, separate from cluster classification group 1.
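The classification at a threshold similarity can be sketched as cutting the dendrogram: merges whose similarity reaches the threshold are kept, and the rest are left as separate groups. The following is a minimal illustration only, not the implementation described in the specification. The names `LEAVES`, `MERGES`, and `cut_dendrogram` are hypothetical, the values 0.9, 0.5, and 0.2 are assumed stand-ins for L1, L2, and L3, the split of traffic A, D, E, B, F between cluster 1 and cluster 2 is assumed (the text gives only their union), and treating a similarity equal to the threshold as "similar" is likewise an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical leaf clusters of FIG. 4. The split of A, D, E, B, F between
# cluster 1 and cluster 2 is assumed; the text only gives their union.
LEAVES: Dict[str, List[str]] = {
    "cluster1": ["A", "D", "E"],
    "cluster2": ["B", "F"],
    "cluster3": ["G"],
    "cluster4": ["C", "H"],
}

# Dendrogram merges as (similarity, cluster_a, cluster_b).
# 0.9, 0.5, 0.2 are assumed values satisfying L1 > 0.7 > L2 > 0.4 > L3.
MERGES: List[Tuple[float, str, str]] = [
    (0.9, "cluster1", "cluster2"),  # L1
    (0.5, "cluster1", "cluster3"),  # L2
    (0.2, "cluster1", "cluster4"),  # L3
]


def cut_dendrogram(
    leaves: Dict[str, List[str]],
    merges: List[Tuple[float, str, str]],
    threshold: float,
) -> List[List[str]]:
    """Return cluster classification groups as lists of traffic names."""
    parent = {name: name for name in leaves}

    def find(name: str) -> str:
        # Follow parent links up to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for similarity, a, b in merges:
        # Assumption: similarity equal to the threshold counts as "similar".
        if similarity >= threshold:
            parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for name, traffic in leaves.items():
        groups.setdefault(find(name), []).extend(traffic)
    return list(groups.values())


print(cut_dendrogram(LEAVES, MERGES, 0.7))
```

With the assumed values, a threshold of 0.7 yields three groups, matching the three cluster classification groups of FIG. 5A.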

The abnormal traffic determination unit 130 according to the present invention then determines abnormal traffic according to the number of traffic items included in each cluster classification group. For example, it may be configured to determine as abnormal traffic the traffic belonging to the cluster classification group that contains the fewest traffic items.

That is, applying the threshold similarity of 0.7 in the example of FIG. 5A gives cluster classification group 1 five traffic items (traffic A, D, E, B, and F), whereas cluster classification group 2 contains only one traffic item and cluster classification group 3 only two. Because cluster classification groups 2 and 3 are not cluster classification group 1, which has the most traffic, the abnormal traffic determination unit 130 determines them to be "abnormal clusters", and the three traffic items belonging to them, traffic G, C, and H, are finally determined to be abnormal traffic.
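The counting rule in the FIG. 5A example can be sketched as follows: every cluster classification group other than the largest one is treated as an abnormal cluster. This is only an illustrative sketch. The function name `judge_abnormal` is hypothetical, and the tie-breaking behavior (the first of several equally large groups is kept as normal) is an assumption that the text does not address.

```python
from typing import List


def judge_abnormal(groups: List[List[str]]) -> List[str]:
    """Return the traffic in every group except the largest one."""
    if not groups:
        return []
    # max() keeps the first group on a tie; this tie rule is an assumption.
    largest = max(groups, key=len)
    abnormal: List[str] = []
    for group in groups:
        if group is not largest:
            abnormal.extend(group)
    return abnormal


# Cluster classification groups 1, 2 and 3 of FIG. 5A.
fig_5a = [["A", "D", "E", "B", "F"], ["G"], ["C", "H"]]
print(judge_abnormal(fig_5a))
```

For the groups of FIG. 5A this returns traffic G, C, and H, the three items the text identifies as abnormal.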

Meanwhile, the present invention may also be configured to determine abnormal traffic repeatedly, adjusting the threshold similarity until the number of traffic items determined to be abnormal falls to or below a predetermined number. FIG. 5B shows the case in which the threshold similarity has been adjusted to 0.4.

Referring to FIG. 5B, assume for example that the new threshold similarity is 0.4 and that L1 > 0.7 > L2 > 0.4 > L3. Cluster 3 of FIG. 4 is then clustered again, together with cluster 1 and cluster 2 of FIG. 4, as cluster classification group 1. Cluster 4 of FIG. 4, which contains traffic C and H, has a similarity lower than the threshold similarity of 0.4, so in FIG. 5B the abnormal traffic determination unit 130 does not cluster it into cluster classification group 1 but clusters it again as cluster classification group 2. In the case of FIG. 5B, therefore, the number of traffic items classified as abnormal falls from three to two compared with FIG. 5A, which shows that a more precise abnormal traffic analysis has been performed than in the case of FIG. 5A.
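The repeated determination with an adjusted threshold similarity can be sketched as a loop over candidate thresholds that stops once the number of abnormal traffic items is at or below a limit. This is a simplified sketch under stated assumptions, not the specification's procedure. `CLUSTERS`, `abnormal_at`, and `refine` are hypothetical names, the similarity values 0.9, 0.5, and 0.2 are assumed stand-ins for L1, L2, and L3, the membership of clusters 1 and 2 is assumed, and the candidate list [0.7, 0.4] simply mirrors FIGS. 5A and 5B. The text does not say how successive thresholds are chosen.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for FIG. 4: (similarity at which the cluster joins
# cluster 1's branch, traffic in the cluster). 0.9, 0.5, 0.2 are assumed
# values with L1 > 0.7 > L2 > 0.4 > L3; the A/D/E vs B/F split is assumed.
CLUSTERS: List[Tuple[float, List[str]]] = [
    (1.0, ["A", "D", "E"]),  # cluster 1 itself
    (0.9, ["B", "F"]),       # cluster 2, similarity L1
    (0.5, ["G"]),            # cluster 3, similarity L2
    (0.2, ["C", "H"]),       # cluster 4, similarity L3
]


def abnormal_at(threshold: float) -> List[str]:
    """Classify at one threshold; return traffic outside the largest group."""
    merged: List[str] = []
    groups: List[List[str]] = [merged]
    for similarity, traffic in CLUSTERS:
        if similarity >= threshold:
            merged.extend(traffic)       # joins cluster classification group 1
        else:
            groups.append(list(traffic))  # stays a separate group
    largest = max(groups, key=len)
    return [t for g in groups if g is not largest for t in g]


def refine(thresholds: Sequence[float], max_abnormal: int) -> Tuple[float, List[str]]:
    """Try thresholds in order until at most max_abnormal items are abnormal."""
    if not thresholds:
        raise ValueError("at least one threshold is required")
    for threshold in thresholds:
        abnormal = abnormal_at(threshold)
        if len(abnormal) <= max_abnormal:
            break
    # If no threshold satisfies the limit, the last attempt is returned.
    return threshold, abnormal


print(refine([0.7, 0.4], max_abnormal=2))
```

With a limit of two abnormal items, the threshold of 0.7 leaves three items (G, C, H) and is rejected, while 0.4 leaves two (C, H) and is accepted, as in FIG. 5B.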

Although embodiments of the present invention have been described in detail above, it should be noted that the details described above are not intended to limit the technical scope of the present invention but to illustrate preferred implementations of it.

For example, the Pearson correlation coefficient is applied as the similarity in the example described above, but this merely illustrates the case of simple correlation analysis, which analyzes the linear correlation between two traffic items or clusters. Accordingly, although not specifically described in this specification, adapting or modifying the present invention to apply a known correlation analysis method other than simple correlation analysis is no more than something a person skilled in the art could naturally derive from the technical idea presented in the present invention. Examples include multiple correlation analysis, which measures the strength of the relationship among three or more variables, and partial correlation analysis, which holds the relationships with the other variables in a multiple correlation analysis fixed and expresses the strength of the relationship between two variables alone.

Likewise, for the similarity adopted in the present invention, it is of course also possible to apply, instead of the Pearson correlation coefficient, another known correlation coefficient such as the Spearman correlation coefficient, which analyzes correlation on the basis of the ordinal scale (rank) of the data.
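To illustrate the difference between the two coefficients named above, the following sketch computes the Pearson coefficient from its standard definition and the Spearman coefficient as the Pearson coefficient of the ranks. The function names `pearson`, `ranks`, and `spearman` and the sample series are hypothetical and are not taken from the specification. Constant series, which have zero variance, are not handled.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # fails for a constant series


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical traffic volumes: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

For a monotonic but non-linear pair such as this one, the rank-based coefficient reaches 1 while the Pearson coefficient stays below 1, which is the practical difference between the two choices of similarity.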

Accordingly, the technical idea of the present invention should be limited only by the appended claims.

FIG. 1 is a diagram illustrating an abnormal traffic analysis system 400 according to the present invention.

FIG. 2 is a flowchart illustrating an abnormal traffic analysis method according to the present invention.

FIG. 3 is a diagram illustrating an exemplary plurality of traffic patterns collected by the traffic collection unit 110 according to the present invention.

FIG. 4 is a diagram for explaining a process in which the hierarchical clustering unit 120 according to the present invention clusters traffic according to the similarity (correlation coefficient) of a plurality of traffic patterns and layers the resulting clusters.

FIGS. 5A and 5B are diagrams for explaining a process in which the abnormal traffic determination unit 130 according to the present invention generates a plurality of cluster classification groups according to different threshold similarities and determines abnormal traffic.

<Description of reference numerals for the main parts of the drawings>

100 Abnormal traffic analysis apparatus

110 Traffic collection unit

120 Hierarchical clustering unit

130 Abnormal traffic determination unit

140 Database

200 NMS

300 ISP

400 Abnormal traffic analysis system

Claims

A method for analyzing abnormal traffic in a network composed of a plurality of links, the method comprising:

a first step of collecting traffic from the plurality of links;

a second step of determining a similarity between a plurality of clusters obtained by clustering patterns of the traffic, and layering the plurality of clusters according to the determined similarity; and

a third step of classifying the layered plurality of clusters according to a predetermined threshold similarity, thereby generating a plurality of cluster classification groups each including at least one cluster, and determining abnormal traffic according to the number of traffic items included in the cluster classification groups.

The method of claim 1,

wherein the network supports SNMP (Simple Network Management Protocol), and

the first step is performed by collecting the traffic directly from a router belonging to the network using the SNMP.

The method of claim 1,

wherein a system providing the network includes an NMS (Network Management System), and

the first step is performed by collecting information about the traffic provided by the NMS.

The method of claim 1,

wherein the similarity is a correlation coefficient calculated between at least two clusters.

The method of claim 4,

wherein the correlation coefficient is a Pearson correlation coefficient.

The method of claim 1,

wherein the third step is performed repeatedly while adjusting the threshold similarity until the number of traffic items determined to be abnormal traffic is equal to or less than a predetermined number.

The method of claim 1,

wherein, in the third step, traffic belonging to any cluster classification group other than the cluster classification group containing the largest number of traffic items is determined to be abnormal traffic.

An apparatus for analyzing abnormal traffic in a network composed of a plurality of links, the apparatus comprising:

a traffic collection unit that collects traffic from the plurality of links;

a hierarchical clustering unit that determines a similarity between a plurality of clusters obtained by clustering patterns of the traffic collected by the traffic collection unit, and layers the plurality of clusters according to the determined similarity; and

an abnormal traffic determination unit that classifies the plurality of clusters layered by the hierarchical clustering unit according to a predetermined threshold similarity, thereby generating a plurality of cluster classification groups each including at least one cluster, and determines abnormal traffic according to the number of traffic items included in the cluster classification groups.

The apparatus of claim 8,

wherein the network supports SNMP, and

the traffic collection unit collects the traffic directly from a router belonging to the network using the SNMP.

The apparatus of claim 8,

wherein a system providing the network includes an NMS that interworks with the abnormal traffic analysis apparatus, and

the traffic collection unit collects the traffic by receiving information about the traffic from the NMS.

The apparatus of claim 8,

wherein the similarity is a correlation coefficient calculated between at least two clusters.

The apparatus of claim 11,

wherein the correlation coefficient is a Pearson correlation coefficient.

The apparatus of claim 8,

wherein the abnormal traffic determination unit adjusts the threshold similarity until the number of traffic items determined to be abnormal traffic is equal to or less than a predetermined number.

The apparatus of claim 8,

wherein the abnormal traffic determination unit determines, as abnormal traffic, traffic belonging to any cluster classification group other than the cluster classification group containing the largest number of traffic items.