JP2012105043A

JP2012105043A - Flow classification method, system, and program

Info

Publication number: JP2012105043A
Application number: JP2010251554A
Authority: JP
Inventors: Megumi Takeshita; 恵竹下; Masayuki Tsujino; 雅之辻野; Haruhisa Hasegawa; 治久長谷川; Naohisa Komatsu; 尚久小松; Masatsugu Ichino; 将嗣市野
Original assignee: Waseda University; Nippon Telegraph and Telephone Corp
Current assignee: Waseda University; Nippon Telegraph and Telephone Corp
Priority date: 2010-11-10
Filing date: 2010-11-10
Publication date: 2012-05-31
Anticipated expiration: 2030-11-10
Also published as: JP5502703B2

Abstract

PROBLEM TO BE SOLVED: To classify the various flows included in traffic on a communication network with higher accuracy. SOLUTION: In the flow classification method, a plurality of flows generated within a predetermined period are integrated into one simultaneous flow; a plurality of packets that are contained in the simultaneous flow and arrive within a predetermined unit arrival interval are integrated into one short-time flow; a flow feature amount is calculated for each short-time flow; a flow feature vector indicating the temporal change of the flow feature amounts is calculated; and the similarity to each existing service category is determined. When the highest of the similarities to the service categories does not fall within a significant range, the flow is classified into a new service class.

Description

The present invention relates to communication management technology, and more particularly to a flow classification technique for classifying data communication traffic based on application type.

With the recent enrichment of communication services and the development of applications that use them, the traffic flowing over communication networks has become more diverse and complex. In addition, the communication facilities required differ depending on the type of application. For this reason, in order to meet these traffic demands and provide high-quality communication services, communication service providers need to add or remove, at appropriate times, communication facilities suited to the application types in high demand.

Conventionally, there is a technique that detects application traffic by focusing on an operational characteristic of each type of application, such as a specific bit string, and monitoring for occurrences of that characteristic behavior (hereinafter referred to as Prior Art 1; see, for example, Non-Patent Document 1). This technique is mainly applied to detecting traffic sent by P2P applications.

In addition, a technique has been proposed that detects specific flows by analyzing the traffic flowing over a communication network, in units of traffic called flows, through data mining using feature amounts consisting of statistics obtained from packet header information, such as the average packet size. This technique often uses a feature-level multimodal method, which is one way of integrating multiple classifiers and analyzes a plurality of feature amounts with a single classifier.

As methods for integrating multiple classifiers, a score-level multimodal method, which obtains the similarity to each application for each of a plurality of feature amounts and then feeds these similarities to a classifier again to achieve a highly accurate analysis, and its combination with a method that identifies a newly appearing service as a new service, have also been proposed (hereinafter referred to as Prior Art 2; see, for example, Non-Patent Document 2).

Yagi, Izumi, Tsunoda, Nemoto, "A Study on Evaluation Method of Payload Length Transition Pattern for Network Application Discrimination", IEICE Technical Report, TM2007-34, The Institute of Electronics, Information and Communication Engineers, 2007-11
Ichino, Yamashita, Hoshi, Komatsu, Takeshita, Tsujino, Iwashita, Yoshino, "Internet Traffic Classification Using Score Level Fusion of Multiple Classifier", ICIS2010, IEEE, 2010-08

However, these conventional techniques have the problem that none of them achieves sufficient accuracy in classifying flows.
For example, because Prior Art 1 identifies applications based on how the software behaves, it has difficulty handling cases where the same application behaves differently after a software upgrade or in new software. In addition, methods that identify applications from the content of communications raise privacy concerns.
Prior Art 2, although less affected by software changes, has the problem that its classification accuracy is low to begin with.

The present invention is intended to solve these problems, and its object is to provide a flow classification technique that can classify the various flows included in traffic on a communication network with higher accuracy.

To achieve this object, a flow classification method according to the present invention is used in a flow classification system that, based on an input packet sequence collected from a communication network, calculates for each flow included in the traffic on the communication network a feature amount indicating the features of that flow, and classifies the flows by service category of a data communication service based on these feature amounts. The method comprises a storage step, a traffic data collection step, and a classification processing step.

In the storage step, a feature amount database stores, for each service category, a representative feature vector indicating the features of flows classified into that service category.

In the traffic data collection step, a traffic data collection unit integrates into one flow a plurality of packets in the input packet sequence that have the same classification identification information, acquired from the packets, and whose arrival intervals are equal to or less than a reference arrival interval; integrates into one simultaneous flow a plurality of these flows whose packets have the same destination IP address and whose flow start intervals are equal to or less than a reference start interval; integrates, for each simultaneous flow, a plurality of packets in that simultaneous flow that arrive within a unit arrival interval into one short-time flow; calculates, for each short-time flow, a flow feature amount indicating the features of that short-time flow by statistically processing the packet count, packet size, or arrival interval of the packets it contains; and calculates, for each simultaneous flow, a flow feature vector indicating the features of the temporal change of the flow feature amount from time-series data consisting of the flow feature amounts of the short-time flows.

In the classification processing step, a classification processing unit calculates, for each simultaneous flow and for each service category, a similarity indicating how similar the simultaneous flow is to the service category, based on the flow feature vector of the simultaneous flow and the representative feature vector of the service category. When the highest similarity among these similarities falls within a predetermined significant range, the simultaneous flow is classified into the service category for which the highest similarity was obtained; when the highest similarity does not fall within the significant range, the simultaneous flow is classified into a new service category.

In this case, in the traffic data collection step, linear prediction analysis may be performed on the time-series data to obtain the linear prediction coefficients of a transfer function representing the flow feature amount of the simultaneous flow, cepstrum analysis may be performed on these linear prediction coefficients to obtain cepstrum coefficients indicating the spectral envelope characteristics of the flow feature amount of that simultaneous flow, and the flow feature vector may be generated from these cepstrum coefficients.
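As a rough illustration of the linear prediction and cepstrum analysis described here, the following Python sketch derives cepstrum coefficients from a time series of flow feature amounts. The autocorrelation method, the Levinson-Durbin recursion, the all-pole model, the default orders, and all function names are assumptions made for illustration; the text does not specify how the coefficients are computed.

```python
def autocorrelation(series, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a time series."""
    n = len(series)
    return [sum(series[t] * series[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] for x[t] ~ sum_k a[k] * x[t-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) series: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a, n_cep):
    """Cepstrum c[1..n_cep] of the all-pole model 1 / (1 - sum_k a[k] z^-k)."""
    p = len(a)
    c = []
    for n in range(1, n_cep + 1):
        value = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                value += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(value)
    return c


def flow_feature_vector(series, order=2, n_cep=3):
    """Hypothetical helper: time series of feature amounts -> cepstrum vector."""
    r = autocorrelation(series, order)
    return lpc_to_cepstrum(levinson_durbin(r, order), n_cep)
```

For example, `flow_feature_vector([3, 5, 4, 6, 5, 7, 6, 8])` returns a three-element vector that could serve as the flow feature vector of one simultaneous flow under these assumptions.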

In the storage step, a representative vector calculated from the flow feature amounts in each cluster, obtained by clustering the flow feature amounts of the short-time flows included in the service category, may be stored as the representative feature vector; and in the classification processing step, the minimum difference from the flow feature vectors may be found for each representative value constituting the representative feature vector, and the similarity may be calculated by statistically processing these minimum differences.
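The minimum-difference calculation can be sketched as follows. This is only one possible reading: the Euclidean distance as the "difference", the mean as the statistical processing, and the function names are assumptions. The resulting score is distance-like, so a smaller value means a closer match.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def min_difference_score(representatives, flow_vectors):
    """Mean, over the cluster representatives, of the smallest distance
    from each representative to any of the flow feature vectors."""
    minima = [min(euclidean(rep, fv) for fv in flow_vectors)
              for rep in representatives]
    return sum(minima) / len(minima)
```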

In the storage step, a representative feature vector may be stored for each of a plurality of different feature types for each service category; in the traffic data collection step, a flow feature vector may be calculated for each feature type for each short-time flow; and in the classification processing step, instead of calculating the similarity between the simultaneous flow and the service category directly, a type similarity may be calculated for each feature type and the similarity may be calculated by statistically processing these type similarities.
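One simple way to combine per-type similarities is a weighted mean, sketched below. The weighted mean, the optional weights, and the feature type names in the usage line are assumptions; the text only states that the type similarities are statistically processed.

```python
def fuse_type_similarities(type_similarities, weights=None):
    """Combine per-feature-type similarities into a single similarity
    using a weighted mean (equal weights by default)."""
    if weights is None:
        weights = {name: 1.0 for name in type_similarities}
    total = sum(weights[name] for name in type_similarities)
    return sum(weights[name] * s
               for name, s in type_similarities.items()) / total
```

With equal weights, `fuse_type_similarities({"packet_count": 0.8, "size_mean": 0.6})` gives the plain average of the two type similarities.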

The method may further comprise a service category creation step and a feature amount calculation step. In the service category creation step, a service category creation unit calculates, for a teacher packet sequence in which the application corresponding to each packet is known, a flow feature amount for each short-time flow generated in the same manner as for the input packet sequence; calculates, for each pair of different applications selected from among the applications, the ratio between the within-class variance and the between-class variance of the two applications based on the variance, mean, and number of elements of the flow feature amounts belonging to them; and creates the service categories by classifying the two applications into the same service category when the variance ratio is smaller than a determination threshold and into separate service categories when the variance ratio is equal to or greater than the determination threshold. In the feature amount calculation step, a feature amount calculation unit calculates, for each service category created in the service category creation step, a flow feature amount indicating the features of flows classified into that service category from the flow feature amounts belonging to the applications classified into it, and saves it in the feature amount database.
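The pairwise variance-ratio test can be sketched as below. Several points are assumptions rather than statements from the text: the ratio is taken as between-class variance divided by within-class variance (so that a small ratio means the two applications are hard to separate), population variances are used, and pairwise "same category" decisions are merged transitively with a small union-find. The application names and values in the usage note are made up.

```python
from statistics import mean, pvariance


def variance_ratio(a, b):
    """Between-class / within-class variance for two sets of scalar features."""
    n_a, n_b = len(a), len(b)
    m_a, m_b = mean(a), mean(b)
    m_all = (n_a * m_a + n_b * m_b) / (n_a + n_b)
    within = (n_a * pvariance(a) + n_b * pvariance(b)) / (n_a + n_b)
    between = (n_a * (m_a - m_all) ** 2 + n_b * (m_b - m_all) ** 2) / (n_a + n_b)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def build_categories(features_by_app, threshold):
    """Group applications whose pairwise variance ratio is below threshold."""
    apps = sorted(features_by_app)
    parent = {app: app for app in apps}

    def find(app):
        while parent[app] != app:
            app = parent[app]
        return app

    for i, a in enumerate(apps):
        for b in apps[i + 1:]:
            if variance_ratio(features_by_app[a], features_by_app[b]) < threshold:
                parent[find(b)] = find(a)

    groups = {}
    for app in apps:
        groups.setdefault(find(app), []).append(app)
    return sorted(groups.values())
```

For instance, with `{"ftp": [1, 2, 3], "mail": [1.1, 2.1, 2.9], "video": [100, 101, 102]}` and a threshold of 1.0, the two applications with overlapping feature amounts end up in one category and the third in its own.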

In this case, in the feature amount calculation step, the flow feature amounts belonging to the applications classified into the service category may be clustered; for each resulting cluster, a feature vector indicating the features of the temporal change of these flow feature amounts may be calculated based on time-series data consisting of the flow feature amounts belonging to that cluster; and the resulting feature vector may be saved in the feature amount database as a representative feature vector of the service class.

A flow classification system according to the present invention calculates, based on an input packet sequence collected from a communication network, a feature amount indicating the features of each flow included in the traffic on the communication network, and classifies the flows by service category of a data communication service based on these feature amounts. The system comprises a feature amount database, a traffic data collection unit, and a classification processing unit.

The feature amount database stores, for each service category, a representative feature vector indicating the features of flows classified into that service category.

The traffic data collection unit integrates into one flow a plurality of packets in the input packet sequence that have the same classification identification information, acquired from the packets, and whose arrival intervals are equal to or less than a reference arrival interval; integrates into one simultaneous flow a plurality of these flows whose packets have the same destination IP address and whose flow start intervals are equal to or less than a reference start interval; integrates, for each simultaneous flow, a plurality of packets in that simultaneous flow that arrive within a unit arrival interval into one short-time flow; calculates, for each short-time flow, a flow feature amount indicating the features of that short-time flow by statistically processing the packet count, packet size, or arrival interval of the packets it contains; and calculates, for each simultaneous flow, a feature vector indicating the temporal change of the flow feature amount from time-series data consisting of the flow feature amounts of the short-time flows.

The classification processing unit calculates, for each simultaneous flow and for each service category, a similarity indicating how similar the simultaneous flow is to the service category, based on the flow feature vector of the simultaneous flow and the representative feature vector of the service category. When the highest similarity among these similarities falls within a predetermined significant range, it classifies the simultaneous flow into the service category for which the highest similarity was obtained; when the highest similarity does not fall within the significant range, it classifies the simultaneous flow into a new service category.

In this case, the traffic data collection unit may perform linear prediction analysis on the time-series data to obtain the linear prediction coefficients of a transfer function representing the flow feature amount of the simultaneous flow, perform cepstrum analysis on these linear prediction coefficients to obtain cepstrum coefficients indicating the spectral envelope characteristics of the flow feature amount of that simultaneous flow, and generate the flow feature vector from these cepstrum coefficients.

A program according to the present invention is a program for causing a computer to execute each step of any one of the flow classification methods described above.

Applications are characterized by how their simultaneous flows vary over time, but when the flow feature amounts themselves are used, this variation over time cannot be handled. According to the present invention, flow similarity can be determined taking into account the gradual variation over time of the feature amounts of a simultaneous flow. Therefore, compared with determining similarity from the flow feature amounts themselves, identification can make use of the temporal variation of the traffic.
Accordingly, if this embodiment is applied to network management of a data communication service, traffic control such as giving priority to flows with strict quality requirements, for example video flows, makes it possible to provide a network with higher user satisfaction using the same amount of equipment. Moreover, because the traffic demand of each application can be grasped accurately, traffic demand can be predicted with high accuracy, and as a result user satisfaction can be improved further.

A block diagram showing the configuration of the flow classification system according to the first embodiment.
A configuration example of the feature amount database.
A block diagram showing the configuration of the traffic data collection unit.
A block diagram showing the configuration of the service category classification unit.
A flowchart showing the flow generation process.
A flowchart showing the simultaneous flow generation process.
A flowchart showing the short-time flow generation process.
A flowchart showing the feature amount calculation process.
A configuration example of an input packet sequence.
A configuration example of a simultaneous flow.
A configuration example of a short-time flow.
A sequence diagram showing an operation example of the traffic data collection unit.
A flowchart showing the feature vector calculation process.
A flowchart showing the similarity calculation process.
An explanatory diagram showing the similarity calculation process.
A flowchart showing the classification determination process.
An explanatory diagram showing the classification determination process.
A block diagram showing the configuration of the flow classification system according to the second embodiment.
A block diagram showing the configuration of the learning processing unit.
A flowchart showing the service category creation process.
A flowchart showing the representative feature vector calculation process.

Next, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
First, a flow classification system according to the first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the flow classification system according to the first embodiment.

The flow classification system 10 as a whole consists of one or more information processing apparatuses such as server apparatuses or personal computers. Based on an input packet sequence collected from the communication network 30, it calculates, for each flow included in the traffic on the communication network 30, a feature amount indicating the features of that flow, classifies the traffic by service category of the data communication service based on these feature amounts, and outputs the classification result to the management terminal 20 for display.

In this embodiment, the packets in the input packet sequence are integrated into simultaneous flows based on the destination IP address and arrival time of each packet, and these simultaneous flows are divided into short-time flows. A flow feature amount indicating the features of each short-time flow is calculated, and from time-series data consisting of these flow feature amounts, a flow feature vector indicating the temporal change of the flow feature amount is calculated for each simultaneous flow. For each combination of a simultaneous flow and a service category, a similarity indicating how similar the simultaneous flow is to the service category is calculated based on the flow feature vector of the simultaneous flow and the representative feature vector of the service category. When the highest similarity among these similarities falls within a predetermined significant range, the simultaneous flow is classified into the service category with the highest similarity; when it does not, the simultaneous flow is classified into a new service category.

[Flow classification system]
Next, the configuration of the flow classification system according to this embodiment will be described in detail with reference to FIG. 1.

The flow classification system 10 is provided with a traffic data collection unit 11, a feature amount database (hereinafter referred to as the feature amount DB) 12, and a service category classification unit 13 as its main functional units.
Of these functional units, the traffic data collection unit 11 and the service category classification unit 13 are implemented by an arithmetic processing unit that realizes each functional unit by having a microprocessor such as a CPU execute a program read from a storage unit (not shown), and the feature amount DB 12 is implemented by a storage device such as a hard disk.

The traffic data collection unit 11 has a function of integrating the flows F included in the time-series input packet sequence collected from the communication network 30 into simultaneous flows FC, based on the destination IP address and arrival time of each packet in the input packet sequence; a function of generating a plurality of short-time flows FS from these simultaneous flows FC; and a function of calculating, for each short-time flow FSij of each simultaneous flow FCi, a flow feature amount RSij indicating the features of that short-time flow FSij.

The feature amount DB 12 has a function of storing, for each service category X into which flows are classified, a representative feature vector Vm indicating the features of flows classified into that service category Xm.
FIG. 2 is a configuration example of the feature amount database. Here, for each of the various service categories X, each feature type Y and its representative feature vector Vm are registered as a set.
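One way to picture such a database is a nested mapping from service category to feature type to a list of per-cluster representative vectors. The following Python sketch is purely illustrative: the layout, the key names, and every numeric value are made up and are not taken from FIG. 2.

```python
# Hypothetical in-memory layout: category -> feature type -> one
# representative vector per cluster. All values are placeholders.
feature_db = {
    "FTP": {
        "packet_count": [[0.42, 0.10, 0.03], [0.55, 0.12, 0.01]],
        "packet_size_mean": [[0.31, 0.08, 0.02]],
    },
    "chat": {
        "packet_count": [[0.12, 0.05, 0.01]],
        "packet_size_mean": [[0.09, 0.02, 0.00]],
    },
}


def representatives(db, category, feature_type):
    """Return the cluster representative vectors registered for one
    (service category, feature type) pair, or an empty list."""
    return db.get(category, {}).get(feature_type, [])
```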

Among these, data communication services corresponding to various applications, such as FTP, e-mail sending and receiving, chat, online games, and video, are registered as the service categories X. As the feature types Y, feature types obtained by statistically processing the packet count, packet size, or arrival interval of the packets included in a short-time flow FS are registered, such as the total number of packets; the total, mean, and standard deviation of the packet sizes; and the total, mean, and standard deviation of the packet arrival intervals. As the representative feature vector Vm, a feature vector indicating the features of the temporal change of the feature amounts belonging to each cluster is registered for each of cluster 1, cluster 2, ... obtained by clustering the feature amounts of the feature type Y.

The service category classification unit 13 has a function of calculating, for each simultaneous flow FCi to be classified that was generated by the traffic data collection unit 11 and for each service category Xm registered in the feature amount DB 12, a similarity Sim indicating how similar the simultaneous flow FCi is to the service category Xm, based on the flow feature vector VSij of the simultaneous flow FCi and the representative feature vector Vm of the service category Xm acquired from the feature amount DB 12. It also has a function of classifying the simultaneous flow FCi into the service category Xm for which the highest similarity Sih, the one indicating the highest similarity among these similarities Sim, was obtained when Sih falls within a predetermined significant range, and of classifying the simultaneous flow FCi into a new service category Xnew when Sih does not fall within the significant range.
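The decision rule can be sketched in a few lines of Python. Here the significant range is modeled as "similarity at or above a lower bound" and larger values are taken to mean more similar; both are assumptions, as are the function name and the example values.

```python
def classify(similarities, lower_bound, new_label="Xnew"):
    """Pick the service category with the highest similarity, or fall back
    to a new category when that similarity is outside the significant range
    (modeled here as being below lower_bound)."""
    best_category = max(similarities, key=similarities.get)
    best = similarities[best_category]
    if best >= lower_bound:
        return best_category, best
    return new_label, best
```

For example, `classify({"FTP": 0.91, "chat": 0.40}, 0.8)` selects the existing category, while `classify({"FTP": 0.3, "chat": 0.2}, 0.8)` falls back to the new category label.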

[Traffic data collection unit]
Next, the configuration of the traffic data collection unit 11 will be described in detail with reference to FIG. 3. FIG. 3 is a block diagram showing the configuration of the traffic data collection unit.
The traffic data collection unit 11 is provided with a flow generation unit 11A, a simultaneous flow generation unit 11B, a short-time flow generation unit 11C, a feature amount calculation unit 11D, a feature vector calculation unit 11E, and a traffic information database (hereinafter referred to as the traffic information DB) 11F as its main processing units.

The flow generation unit 11A has a function of generating flows F from the input packet sequence collected from the communication network 30 by integrating into one flow F a plurality of packets in that sequence that have the same flow classification identification information, acquired from the packets, and whose packet arrival intervals are equal to or less than the reference arrival interval.
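A minimal Python sketch of this grouping is shown below. The `Packet` type, its `key` field standing in for the flow classification identification information, and the `max_gap` parameter standing in for the reference arrival interval are all hypothetical; the text does not say which packet fields make up the identification information.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    time: float   # arrival time in seconds
    key: tuple    # hypothetical flow classification identification info
    size: int = 0


def build_flows(packets, max_gap):
    """Group packets with the same key into flows, starting a new flow
    whenever the gap to the previous packet of that key exceeds max_gap."""
    flows = []
    open_flow = {}  # key -> index of the flow currently open for that key
    for pkt in sorted(packets, key=lambda p: p.time):
        idx = open_flow.get(pkt.key)
        if idx is not None and pkt.time - flows[idx][-1].time <= max_gap:
            flows[idx].append(pkt)
        else:
            open_flow[pkt.key] = len(flows)
            flows.append([pkt])
    return flows
```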

The simultaneous flow generation unit 11B has a function of generating simultaneous flows FC from the flows F generated by the flow generation unit 11A by integrating into one simultaneous flow FCi a plurality of flows whose packets have the same destination IP address and whose flow start intervals are equal to or less than the reference start interval tc.
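The merging of flows into simultaneous flows might look like the following sketch. It assumes that the flow start interval is measured between consecutive flow start times for the same destination, and it represents each flow as a hypothetical `(start_time, dst_ip, flow_id)` tuple; neither detail is given in the text. The addresses in the usage note come from the documentation range 192.0.2.0/24.

```python
def build_simultaneous_flows(flows, tc):
    """Merge flows that share a destination IP address and whose start
    times are at most tc after the previous flow to that destination.

    flows: iterable of (start_time, dst_ip, flow_id) tuples.
    Returns a list of simultaneous flows, each a list of flow_ids.
    """
    groups = []
    last = {}  # dst_ip -> (index of open group, start time of latest flow)
    for start, dst_ip, flow_id in sorted(flows, key=lambda f: f[0]):
        entry = last.get(dst_ip)
        if entry is not None and start - entry[1] <= tc:
            groups[entry[0]].append(flow_id)
            last[dst_ip] = (entry[0], start)
        else:
            last[dst_ip] = (len(groups), start)
            groups.append([flow_id])
    return groups
```

For example, three flows to 192.0.2.1 starting at 0.0 s, 0.3 s, and 5.0 s with `tc=1.0` yield two simultaneous flows: the first two flows together, and the third on its own.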

短時間フロー生成部１１Ｃは、同時フロー生成部１１Ｂで生成された同時フローＦＣごとに、当該同時フローＦＣｉに含まれるパケットのうち、単位到着間隔ｔｓ内に続けて到着したパケットを、１つの短時間フローＦＳｉｊとして統合することにより、同時フロー生成部１１Ｂで生成された同時フローＦＣｉのそれぞれから短時間フローＦＳｉｊを生成する機能を有している。 The short-time flow generation unit 11C has a function of generating short-time flows FSij from each of the simultaneous flows FCi generated by the simultaneous flow generation unit 11B by integrating, as one short-time flow FSij, those packets in the simultaneous flow FCi that arrive in succession within the unit arrival interval ts.

特徴量計算部１１Ｄは、短時間フロー生成部１１Ｃで生成された短時間フローＦＳｉｊごとに、当該短時間フローＦＳｉｊに含まれるパケットのパケット数、パケットサイズ、または到着間隔を統計処理することにより、当該短時間フローＦＳｉｊの特徴を示すフロー特徴量ＲＳｉｊを計算する機能を有している。 For each short-time flow FSij generated by the short-time flow generation unit 11C, the feature amount calculation unit 11D statistically processes the number of packets, the packet size, or the arrival interval of the packets included in the short-time flow FSij, It has a function of calculating a flow feature amount RSij indicating the feature of the short-time flow FSij.

特徴ベクトル計算部１１Ｅは、特徴量計算部１１Ｄで計算された各同時フローＦＣｉに属する短時間フローＦＳｉｊごとのフロー特徴量ＲＳｉｊからなる時系列データから、同時フローＦＣｉごとに当該フロー特徴量ＲＳｉｊの時間的変化の特徴を示すフロー特徴ベクトルＶｉを計算する機能を有している。
トラヒック情報ＤＢ１１Ｆは、同時フローＦＣｉごとに、特徴ベクトル計算部１１Ｅで計算したフロー特徴ベクトルＶｉを記憶する機能を有している。 The feature vector calculation unit 11E has a function of calculating, for each simultaneous flow FCi, a flow feature vector Vi indicating the characteristics of the temporal change of the flow feature amounts RSij, from the time-series data consisting of the flow feature amounts RSij of the short-time flows FSij belonging to that simultaneous flow FCi, as calculated by the feature amount calculation unit 11D.
The traffic information DB 11F has a function of storing the flow feature vector Vi calculated by the feature vector calculation unit 11E for each simultaneous flow FCi.

［サービスカテゴリ分類部］
次に、図４を参照して、サービスカテゴリ分類部１３の構成について詳細に説明する。図４は、サービスカテゴリ分類部の構成を示すブロック図である。
サービスカテゴリ分類部１３には、主な処理部として、類似度計算部１３Ａ−１３Ｎ、類似度統合部１３Ｐ、および分類判定部１３Ｑが設けられている。 [Service category classification unit]
Next, the configuration of the service category classification unit 13 will be described in detail with reference to FIG. 4. FIG. 4 is a block diagram showing the configuration of the service category classification unit.
The service category classification unit 13 includes, as its main processing units, similarity calculation units 13A to 13N, a similarity integration unit 13P, and a classification determination unit 13Q.

類似度計算部１３Ａ−１３Ｎは、特徴量ＤＢ１２に登録されている特徴種別Ｙｎ（ｎ＝１…Ｎ）ごとに設けられて、トラヒックデータ収集部１１で生成された同時フローＦＣｉと、特徴量ＤＢ１２に登録されているサービスカテゴリＸｍとの組み合わせごとに、当該同時フローＦＣｉに関するフロー特徴ベクトルＶｉと当該サービスカテゴリＸｍの代表特徴ベクトルＶｍとに基づいて、特徴種別Ｙｎに関する当該同時フローＦＣｉと当該サービスカテゴリＸｍとの類似性を示す類似度Ｓｉｍｎをそれぞれ計算する機能を有している。 The similarity calculation unit 13A-13N is provided for each feature type Yn (n = 1... N) registered in the feature value DB 12, and includes the simultaneous flow FCi generated by the traffic data collection unit 11 and the feature value DB 12. For each combination with the service category Xm registered in, based on the flow feature vector Vi for the simultaneous flow FCi and the representative feature vector Vm for the service category Xm, the simultaneous flow FCi and the service category for the feature type Yn It has a function of calculating similarity Simn indicating similarity to Xm.

類似度統合部１３Ｐは、各同時フローＦＣｉとサービスカテゴリＸｍとの組み合わせごとに、類似度計算部１３Ａ−１３Ｎでそれぞれ計算した特徴種別Ｙｎに関する類似度Ｓｉｍｎを、１つのカテゴリ類似度Ｓｉｍに統合する機能を有している。 For each combination of a simultaneous flow FCi and a service category Xm, the similarity integration unit 13P has a function of integrating the similarities Simn for the feature types Yn, calculated by the similarity calculation units 13A to 13N, into one category similarity Sim.

分類判定部１３Ｑは、各同時フローＦＣｉについて、類似度統合部１３Ｐで当該同時フローＦＣｉとサービスカテゴリＸｍとの組み合わせごとに得られた類似度Ｓｉｍのうち、最も高い類似性を示す最高類似度Ｓｉｈと有意範囲とを比較する機能と、最高類似度Ｓｉｈが所定の有意範囲に含まれる場合には、当該同時フローＦＣｉを当該最高類似度ＳｉｈのサービスカテゴリＸｍに分類し、最高類似度Ｓｉｈが有意範囲に含まれない場合には、当該同時フローＦＣｉを新たなサービスカテゴリＸｎｅｗに分類する機能とを有している。 For each simultaneous flow FCi, the classification determination unit 13Q has a function of comparing the highest similarity Sih, which indicates the highest similarity among the similarities Sim obtained by the similarity integration unit 13P for each combination of the simultaneous flow FCi and a service category Xm, with a significance range, and a function of classifying the simultaneous flow FCi into the service category Xm having the highest similarity Sih when Sih falls within the predetermined significance range, and into a new service category Xnew when Sih does not fall within the significance range.
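
The decision made by the classification determination unit 13Q can be sketched as follows. This is only a minimal illustration, not the implementation of the embodiment: the integration rule (a plain average of the per-feature-type similarities Simn), the reading of the significance range as "at or above a threshold", and the names `integrate`, `classify`, and `threshold` are all assumptions introduced here.

```python
from statistics import mean


def integrate(per_type_sims):
    """Assumed integration rule: average the per-feature-type similarities Simn."""
    return mean(per_type_sims)


def classify(sims_by_category, threshold):
    """Return the category with the highest integrated similarity, or "Xnew"
    when that similarity lies outside the assumed significance range
    [threshold, +inf)."""
    category_sims = {m: integrate(s) for m, s in sims_by_category.items()}
    best = max(category_sims, key=category_sims.get)
    if category_sims[best] >= threshold:
        return best
    return "Xnew"
```

With a hypothetical threshold of 0.7, a flow whose best category averages 0.85 is assigned to that category, while a flow whose best category averages 0.45 is assigned to the new category Xnew.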

［第１の実施の形態の動作］
次に、本実施の形態にかかるフロー分類システム１０の動作について詳細に説明する。 [Operation of First Embodiment]
Next, the operation of the flow classification system 10 according to this exemplary embodiment will be described in detail.

［フロー生成動作］
まず、図５を参照して、本実施の形態にかかるトラヒックデータ収集部１１におけるフロー生成動作について説明する。図５は、フロー生成処理を示すフローチャートである。
トラヒックデータ収集部１１のフロー生成部１１Ａは、通信網３０から収集した入力パケット列からフローを生成する際、図５のフロー生成処理を実行する。 [Flow generation operation]
First, the flow generation operation in the traffic data collection unit 11 according to the present embodiment will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the flow generation process.
When generating flows from the input packet sequence collected from the communication network 30, the flow generation unit 11A of the traffic data collection unit 11 executes the flow generation process of FIG. 5.

フロー生成部１１Ａは、まず、通信網３０から収集した入力パケット列を取得し（ステップ１００）、この入力パケット列の先頭から、いずれのフローにも統合していない未処理のパケットを１つ選択し（ステップ１０１）、当該パケットから取得したフロー分類用の識別情報が、統合処理中フローのいずれかと一致するか確認する（ステップ１０２）。この際、フロー分類用の識別情報としては、送信元・送信先のＩＰアドレス、送信元・送信先のポート番号、プロトコル種別などを組み合わせて用いればよい。 First, the flow generation unit 11A acquires an input packet sequence collected from the communication network 30 (step 100), and selects one unprocessed packet that is not integrated into any flow from the top of the input packet sequence. Then (step 101), it is confirmed whether the identification information for flow classification acquired from the packet matches any of the integration processing flows (step 102). At this time, the identification information for flow classification may be a combination of the source / destination IP address, the source / destination port number, the protocol type, and the like.

ここで、識別情報が一致する処理中フローが存在しない場合（ステップ１０２：ＮＯ）、フロー生成部１１Ａは、新たなフロー番号を当該選択パケットに付加して新たなフローを生成し、当該新たなフロー番号のフローを処理中フローとする（ステップ１０３）。
一方、識別情報が一致する処理中フローが存在する場合（ステップ１０２：ＹＥＳ）、フロー生成部１１Ａは、当該処理中フローの最後尾パケットと選択パケットとの到着間隔を基準到着間隔ｔｆとを比較する（ステップ１０４）。 Here, when there is no in-process flow that matches the identification information (step 102: NO), the flow generation unit 11A generates a new flow by adding a new flow number to the selected packet, and generates the new flow. The flow with the flow number is set as a processing flow (step 103).
On the other hand, when there is a process flow that matches the identification information (step 102: YES), the flow generation unit 11A compares the arrival interval between the last packet of the process flow and the selected packet with the reference arrival interval tf. (Step 104).

ここで、最後尾パケットと選択パケットとの到着間隔が基準到着間隔ｔｆより大きかった場合（ステップ１０４：ＮＯ）、フロー生成部１１Ａは、当該処理中フローが終了したと判定して処理中フローから除外した後（ステップ１０５）、ステップ１０３に移行して、新たなフロー番号を当該選択パケットに付加して新たなフローを生成し、当該新たなフロー番号のフローを処理中フローとする。
一方、最後尾パケットと選択パケットとの到着間隔が基準到着間隔ｔｆ以内であった場合（ステップ１０４：ＹＥＳ）、フロー生成部１１Ａは、当該処理中フローのフロー番号を選択パケットに付加する（ステップ１０６）。 Here, when the arrival interval between the last packet and the selected packet is larger than the reference arrival interval tf (step 104: NO), the flow generation unit 11A determines that the in-process flow has ended and excludes it from the in-process flows (step 105), and then proceeds to step 103, where it adds a new flow number to the selected packet to generate a new flow and sets the flow with the new flow number as an in-process flow.
On the other hand, when the arrival interval between the last packet and the selected packet is within the reference arrival interval tf (step 104: YES), the flow generation unit 11A adds the flow number of the in-process flow to the selected packet (step 106).

このようにして、選択パケットにフロー番号を付加することによりフローＦを生成した後、フロー生成部１１Ａは、選択パケットをフラグなどにより処理済みとし（ステップ１０７）、入力パケット列のうち全てのパケットについて処理が終了したか確認する（ステップ１０８）。
ここで、未処理のパケットが存在する場合（ステップ１０８：ＮＯ）、ステップ１０１に戻って次のパケットの処理を繰り返し実行する。また、全てのパケットについて処理が終了している場合（ステップ１０８：ＹＥＳ）、フロー生成部１１Ａは、個々のパケットにフロー番号を付加した入力パケット列を出力し（ステップ１０９）、一連のフロー生成処理を終了する。 After generating the flow F by adding the flow number to the selected packet in this way, the flow generation unit 11A marks the selected packet as processed using a flag or the like (step 107), and confirms whether processing has been completed for all packets in the input packet sequence (step 108).
Here, when an unprocessed packet exists (step 108: NO), the process returns to step 101 and repeats the processing for the next packet. When processing has been completed for all packets (step 108: YES), the flow generation unit 11A outputs the input packet sequence with a flow number added to each packet (step 109), and ends the series of flow generation processing.
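
Steps 100 to 109 can be sketched as a single pass over the packet sequence. The sketch below is an illustration under stated assumptions, not the implementation of the embodiment: the function name `assign_flows`, the packet representation as `(arrival_time, five_tuple)` pairs sorted by arrival time, and the use of a dictionary to hold the in-process flows are all introduced here.

```python
def assign_flows(packets, tf):
    """packets: list of (arrival_time, five_tuple), sorted by arrival time.
    Returns one flow number per packet, in input order."""
    active = {}        # five_tuple -> (flow number, arrival time of its last packet)
    flow_numbers = []
    next_flow = 1
    for t, key in packets:                      # step 101: next unprocessed packet
        entry = active.get(key)                 # step 102: matching in-process flow?
        if entry is not None and t - entry[1] <= tf:
            flow = entry[0]                     # step 106: continue the in-process flow
        else:
            flow = next_flow                    # steps 105/103: retire the old flow, start a new one
            next_flow += 1
        active[key] = (flow, t)
        flow_numbers.append(flow)               # step 107: packet is now processed
    return flow_numbers
```

For example, with tf = 5 seconds, three packets of one 5-tuple arriving at 0 s, 2 s, and 10 s yield two flows, because the gap of 8 s between the second and third packet exceeds tf.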

［同時フロー生成動作］
次に、図６を参照して、本実施の形態にかかるトラヒックデータ収集部１１における同時フロー生成動作について説明する。図６は、同時フロー生成処理を示すフローチャートである。 [Simultaneous flow generation operation]
Next, the simultaneous flow generation operation in the traffic data collection unit 11 according to the present embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the simultaneous flow generation process.

トラヒックデータ収集部１１の同時フロー生成部１１Ｂは、フロー生成部１１Ａで生成されたフローから同時フローを生成する際、図６の同時フロー生成処理を実行する。 The simultaneous flow generation unit 11B of the traffic data collection unit 11 executes the simultaneous flow generation process of FIG. 6 when generating a simultaneous flow from the flow generated by the flow generation unit 11A.

同時フロー生成部１１Ｂは、まず、フロー生成部１１Ａで生成されたフロー番号付き入力パケット列を取得し（ステップ１１０）、この入力パケット列から、いずれの同時フローにも統合していない未処理のフローＦｔを１つ選択する（ステップ１１１）。
次に、同時フロー生成部１１Ｂは、選択フローＦｔのパケットから送信先ＩＰアドレスを取得し、他の未処理のフローＦのうち、送信先ＩＰアドレスが選択フローＦｔと同じフローの有無を確認する（ステップ１１２）。 The simultaneous flow generation unit 11B first acquires the input packet sequence with the flow number generated by the flow generation unit 11A (step 110), and from this input packet sequence, unprocessed that has not been integrated into any simultaneous flow One flow Ft is selected (step 111).
Next, the simultaneous flow generation unit 11B acquires the destination IP address from the packet of the selected flow Ft, and checks whether there is a flow having the same destination IP address as the selected flow Ft among other unprocessed flows F. (Step 112).

ここで、送信先ＩＰアドレスが選択フローＦｔと同じフローを確認した場合（ステップ１１２：ＹＥＳ）、同時フロー生成部１１Ｂは、これら確認フローＦｃのうちから、選択フローＦｔとのフロー開始間隔（絶対値）が基準開始間隔ｔｃ以下のフローＦａを抽出し（ステップ１１３）、これら抽出フローＦａと選択フローＦｔの両方のパケットに、新たな同時フロー番号を付加して新たな同時フローＦＣｉを生成し（ステップ１１４）、これらフローＦｔ，Ｆａをフラグなどにより処理済みとする（ステップ１１５）。 Here, when flows having the same destination IP address as the selected flow Ft are found (step 112: YES), the simultaneous flow generation unit 11B extracts, from these found flows Fc, the flows Fa whose flow start interval (absolute value) from the selected flow Ft is equal to or less than the reference start interval tc (step 113), adds a new simultaneous flow number to the packets of both the extracted flows Fa and the selected flow Ft to generate a new simultaneous flow FCi (step 114), and marks the flows Ft and Fa as processed using a flag or the like (step 115).

一方、送信先ＩＰアドレスが選択フローＦｔと同じフローＦを確認できなかった場合（ステップ１１２：ＮＯ）、同時フロー生成部１１Ｂは、選択フローＦｔのパケットのそれぞれに、新たな同時フロー番号を付加して新たな同時フローＦＣｉを生成し（ステップ１１６）、選択フローＦｔをフラグなどにより処理済みとする（ステップ１１７）。 On the other hand, when the flow F having the same destination IP address as the selected flow Ft cannot be confirmed (step 112: NO), the simultaneous flow generation unit 11B adds a new simultaneous flow number to each packet of the selected flow Ft. Then, a new simultaneous flow FCi is generated (step 116), and the selected flow Ft is processed with a flag or the like (step 117).

このようにして、選択フローＦｔや抽出フローＦａに同時フロー番号を付加することにより同時フローＦＣｉを生成した後、同時フロー生成部１１Ｂは、入力パケット列のうち全てのフローＦについて処理が終了したか確認する（ステップ１１８）。
ここで、未処理のフローＦが存在する場合（ステップ１１８：ＮＯ）、ステップ１１１に戻って次のフローＦの処理を繰り返し実行する。また、全てのフローについて処理が終了している場合（ステップ１１８：ＹＥＳ）、同時フロー生成部１１Ｂは、個々のパケットに同時フロー番号を付加した入力パケット列を出力し（ステップ１１９）、一連の同時フロー生成処理を終了する。 After generating the simultaneous flow FCi by adding the simultaneous flow number to the selected flow Ft and the extracted flows Fa in this way, the simultaneous flow generation unit 11B confirms whether processing has been completed for all flows F in the input packet sequence (step 118).
Here, when there is an unprocessed flow F (step 118: NO), the process returns to step 111 to repeatedly execute the process of the next flow F. If the processing has been completed for all the flows (step 118: YES), the simultaneous flow generation unit 11B outputs an input packet sequence in which the simultaneous flow number is added to each packet (step 119), The simultaneous flow generation process ends.
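
Steps 110 to 119 can be sketched as follows. This is a minimal illustration under assumptions introduced here: each flow is represented as a hypothetical `(flow_id, dst_ip, start_time)` triple, and the function name `group_simultaneous` is not from the source. As in the flowchart, the start interval is measured against the selected flow Ft only, not chained from flow to flow.

```python
def group_simultaneous(flows, tc):
    """flows: list of (flow_id, dst_ip, start_time).
    Returns {flow_id: simultaneous flow number}."""
    assigned = {}
    next_fc = 1
    for fid, dst, start in flows:               # step 111: pick an unprocessed flow Ft
        if fid in assigned:
            continue
        assigned[fid] = next_fc                 # steps 114/116: new simultaneous flow
        for other, other_dst, other_start in flows:
            # steps 112-113: same destination IP and |start interval| <= tc
            if (other not in assigned and other_dst == dst
                    and abs(other_start - start) <= tc):
                assigned[other] = next_fc
        next_fc += 1
    return assigned
```

With tc = 5 seconds, three flows to the same destination that start at 0 s, 1 s, and 4 s fall into one simultaneous flow, while a flow to the same destination that starts at 20 s forms its own.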

［短時間フロー生成動作］
次に、図７を参照して、本実施の形態にかかるトラヒックデータ収集部１１における短時間フロー生成動作について説明する。図７は、短時間フロー生成処理を示すフローチャートである。 [Short-time flow generation operation]
Next, the short-time flow generation operation in the traffic data collection unit 11 according to the present embodiment will be described with reference to FIG. 7. FIG. 7 is a flowchart showing the short-time flow generation process.

トラヒックデータ収集部１１の短時間フロー生成部１１Ｃは、同時フロー生成部１１Ｂで生成された同時フローから短時間フローを生成する際、図７の短時間フロー生成処理を実行する。 When generating short-time flows from the simultaneous flows generated by the simultaneous flow generation unit 11B, the short-time flow generation unit 11C of the traffic data collection unit 11 executes the short-time flow generation process of FIG. 7.

短時間フロー生成部１１Ｃは、まず、同時フロー生成部１１Ｂで生成された同時フロー番号付き入力パケット列を取得し（ステップ１２０）、この入力パケット列から、短時間フローを生成していない未処理の同時フローＦＣｉを１つ選択し（ステップ１２１）、この選択同時フローＦＣｉの先頭から、短時間フロー番号を付加していない未処理のパケットを１つ選択する（ステップ１２２）。
次に、短時間フロー生成部１１Ｃは、選択同時フローＦＣｉに含まれる他のパケットのうち、選択パケットとの到着間隔が単位到着間隔ｔｓ内に到着した近接パケットが存在するか確認する（ステップ１２３）。 The short-time flow generation unit 11C first acquires the input packet sequence with the simultaneous flow number generated by the simultaneous flow generation unit 11B (step 120), and has not generated a short-time flow from this input packet sequence. One simultaneous flow FCi is selected (step 121), and one unprocessed packet to which no short flow number is added is selected from the head of the selected simultaneous flow FCi (step 122).
Next, the short-time flow generation unit 11C confirms whether there is a neighboring packet whose arrival interval with the selected packet has arrived within the unit arrival interval ts among other packets included in the selected simultaneous flow FCi (step 123). ).

ここで、近接パケットの存在を確認した場合（ステップ１２３：ＹＥＳ）、短時間フロー生成部１１Ｃは、これら近接パケットと選択パケットの両方に、同一の新たな短時間フロー番号を付加することにより新たな短時間フローＦＳｉｊを生成し（ステップ１２４）、これらパケットをフラグなどにより処理済みとする（ステップ１２５）。
一方、近接パケットの存在を確認できなかった場合（ステップ１２３：ＮＯ）、短時間フロー生成部１１Ｃは、選択パケットに新たな短時間フロー番号を付加することにより新たな短時間フローＦＳｉｊを生成し（ステップ１２６）、ステップ１２５へ移行して、選択パケットをフラグなどにより処理済みとする。 Here, when the presence of the proximity packet is confirmed (step 123: YES), the short-time flow generation unit 11C adds the same new short-time flow number to both the proximity packet and the selected packet to newly A short-time flow FSij is generated (step 124), and these packets are processed by a flag or the like (step 125).
On the other hand, when the presence of the proximity packet cannot be confirmed (step 123: NO), the short-time flow generation unit 11C generates a new short-time flow FSij by adding a new short-time flow number to the selected packet. (Step 126), the process proceeds to step 125, and the selected packet is processed by a flag or the like.

このようにして、近接パケットおよび選択パケット、または選択パケットに短時間フロー番号を付加した後、短時間フロー生成部１１Ｃは、選択同時フローのうち全てのパケットについて処理が終了したか確認する（ステップ１２７）。
ここで、選択同時フローに未処理のパケットが存在する場合（ステップ１２７：ＮＯ）、ステップ１２２に戻って次のパケットの処理を繰り返し実行する。また、選択同時フローの全てのパケットについて処理が終了している場合（ステップ１２７：ＹＥＳ）、短時間フロー生成部１１Ｃは、入力パケット列のうち全ての同時フローＣについて処理が終了したか確認する（ステップ１２８）。 After adding the short-time flow number to the proximity packet and the selection packet or the selection packet in this way, the short-time flow generation unit 11C confirms whether the processing has been completed for all the packets in the selected simultaneous flow (step) 127).
Here, when an unprocessed packet exists in the selected simultaneous flow (step 127: NO), the process returns to step 122 and repeats the processing for the next packet. When processing has been completed for all packets of the selected simultaneous flow (step 127: YES), the short-time flow generation unit 11C confirms whether processing has been completed for all simultaneous flows FC in the input packet sequence (step 128).

ここで、未処理の同時フローが存在する場合（ステップ１２８：ＮＯ）、短時間フロー生成部１１Ｃは、ステップ１２１に戻って次の同時フローＦＣの処理を繰り返し実行する。また、全ての同時フローＦＣについて処理が終了している場合（ステップ１２８：ＹＥＳ）、短時間フロー生成部１１Ｃは、個々のパケットに短時間フロー番号と同時フロー番号を付加した入力パケット列を出力し（ステップ１２９）、一連の短時間フロー生成処理を終了する。 Here, when there is an unprocessed simultaneous flow (step 128: NO), the short-time flow generation unit 11C returns to step 121 and repeatedly executes the process of the next simultaneous flow FC. If processing has been completed for all the simultaneous flows FC (step 128: YES), the short-time flow generation unit 11C outputs an input packet sequence in which the short-time flow number and the simultaneous flow number are added to each packet. Then (step 129), a series of short-time flow generation processing ends.
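
Steps 122 to 127 for a single simultaneous flow can be sketched as follows. The sketch is an illustration only, and it rests on one reading of the flowchart that should be treated as an assumption: the unit arrival interval ts is measured from the selected (first unprocessed) packet, rather than chained from packet to packet. The function name `group_short_time` and the input format are also introduced here.

```python
def group_short_time(arrival_times, ts):
    """arrival_times: arrival times of one simultaneous flow's packets, ascending.
    Returns one short-time flow number per packet."""
    numbers = [None] * len(arrival_times)
    next_fs = 1
    for i, t in enumerate(arrival_times):       # step 122: first unprocessed packet
        if numbers[i] is not None:
            continue
        numbers[i] = next_fs                    # steps 124/126: new short-time flow
        for j in range(i + 1, len(arrival_times)):
            # step 123: neighboring packets that arrived within ts of the selected packet
            if numbers[j] is None and arrival_times[j] - t <= ts:
                numbers[j] = next_fs
        next_fs += 1
    return numbers
```

The arrival times used to exercise this sketch are hypothetical and are not taken from the figures.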

［特徴量計算動作］
次に、図８を参照して、本実施の形態にかかるトラヒックデータ収集部１１における特徴量計算動作について説明する。図８は、特徴量計算処理を示すフローチャートである。 [Feature calculation]
Next, with reference to FIG. 8, the feature amount calculation operation in the traffic data collection unit 11 according to the present embodiment will be described. FIG. 8 is a flowchart showing the feature amount calculation process.

トラヒックデータ収集部１１の特徴量計算部１１Ｄは、短時間フロー生成部１１Ｃで生成された短時間フローについてフロー特徴量を計算する際、図８の特徴量計算処理を実行する。ここでは、短時間フローごとに、各特徴種別Ｙについてそれぞれフロー特徴量が計算される。 The feature value calculation unit 11D of the traffic data collection unit 11 executes the feature value calculation process of FIG. 8 when calculating the flow feature value for the short-time flow generated by the short-time flow generation unit 11C. Here, a flow feature amount is calculated for each feature type Y for each short-time flow.

特徴量計算部１１Ｄは、まず、短時間フロー生成部１１Ｃで生成された短時間フローと同時フロー番号を付加した入力パケット列を取得し（ステップ１３０）、この入力パケット列から、特徴量を計算していない未処理の同時フローＦＣｉを１つ選択し（ステップ１３１）、この選択同時フローＦＣｉの先頭から、特徴量を計算していない未処理の短時間フローＦＳｉｊを１つ選択する（ステップ１３２）。 First, the feature amount calculation unit 11D acquires an input packet sequence to which the short-time flow generated by the short-time flow generation unit 11C and the simultaneous flow number are added (step 130), and calculates the feature amount from this input packet sequence. One unprocessed simultaneous flow FCi that has not been selected is selected (step 131), and one unprocessed short-time flow FSij whose feature value has not been calculated is selected from the head of the selected simultaneous flow FCi (step 132). ).

次に、特徴量計算部１１Ｄは、選択短時間フローＦＳｉｊに含まれる各パケットについて、前述の図２に示した特徴種別Ｙごとにフロー特徴量ＲＳｉｊを計算し（ステップ１３３）、選択短時間フローＦＳｉｊをフラグなどにより処理済みとする（ステップ１３４）。
この後、特徴量計算部１１Ｄは、選択同時フローＦＣｉのうち全ての短時間フローＦＳｉｊについて処理が終了したか確認する（ステップ１３５）。 Next, the feature amount calculation unit 11D calculates a flow feature amount RSij for each feature type Y shown in FIG. 2 described above for each packet included in the selected short-time flow FSij (step 133), and the selected short-time flow. It is assumed that FSij has been processed by a flag or the like (step 134).
Thereafter, the feature amount calculation unit 11D confirms whether the processing has been completed for all the short-time flows FSij in the selected simultaneous flow FCi (step 135).

ここで、未処理の短時間フローＦＳｉｊが存在する場合（ステップ１３５：ＮＯ）、ステップ１３２へ戻って次の短時間フローＦＳｉｊの処理を繰り返し実行する。また、全ての短時間フローＦＳについて処理が終了している場合（ステップ１３５：ＹＥＳ）、特徴量計算部１１Ｄは、入力パケット列のうち全ての同時フローＦＣについて処理が終了したか確認する（ステップ１３６）。 Here, when there is an unprocessed short-time flow FSij (step 135: NO), the process returns to step 132 and the process of the next short-time flow FSij is repeatedly executed. Further, when the processing has been completed for all the short-time flows FS (step 135: YES), the feature amount calculating unit 11D confirms whether the processing has been completed for all the simultaneous flows FC in the input packet sequence (step). 136).

ここで、未処理の同時フローＦＣｉが存在する場合（ステップ１３６：ＮＯ）、ステップ１３１に戻って次の同時フローＦＣの処理を繰り返し実行する。また、全ての同時フローＦＣについて処理が終了している場合（ステップ１３６：ＹＥＳ）、特徴量計算部１１Ｄは、同時フロー番号ごとに、各特徴種別に関するフロー特徴量ＲＳｉｊを、各短時間フローＦＳｉｊに対応する時系列データとして出力し（ステップ１３７）、一連の特徴量計算処理を終了する。 Here, when an unprocessed simultaneous flow FCi exists (step 136: NO), the process returns to step 131 and repeats the processing for the next simultaneous flow FC. When processing has been completed for all simultaneous flows FC (step 136: YES), the feature amount calculation unit 11D outputs, for each simultaneous flow number, the flow feature amounts RSij for each feature type as time-series data corresponding to the short-time flows FSij (step 137), and ends the series of feature amount calculation processing.
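
The per-short-time-flow calculation of step 133 can be sketched as follows. The feature types of FIG. 2 are not listed in this section, so the three statistics below (packet count, mean packet size, mean arrival interval) are assumed examples of statistically processing the number of packets, the packet size, and the arrival interval. The function name `short_flow_features` and the `(arrival_time, size_bytes)` packet format are likewise hypothetical.

```python
from statistics import mean


def short_flow_features(packets):
    """packets: non-empty list of (arrival_time, size_bytes) for one short-time flow."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]   # intervals between consecutive packets
    return {
        "packet_count": len(packets),
        "mean_size": mean(sizes),
        "mean_interval": mean(gaps) if gaps else 0.0,  # a single packet has no interval
    }
```

Applying this function to each short-time flow FSij of a simultaneous flow FCi, in time order, gives one time series per feature type.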

［トラヒックデータ収集部の動作例］
次に、図９〜図１２を参照して、本実施の形態にかかるトラヒックデータ収集部１１の動作例について説明する。図９は、入力パケット列の構成例である。図１０は、同時フローの構成例である。図１１は、短時間フローの構成例である。図１２は、トラヒックデータ収集部の動作例を示すシーケンス図である。 [Operation example of traffic data collection unit]
Next, an operation example of the traffic data collection unit 11 according to the present embodiment will be described with reference to FIGS. 9 to 12. FIG. 9 is a configuration example of an input packet sequence. FIG. 10 is a configuration example of simultaneous flows. FIG. 11 is a configuration example of short-time flows. FIG. 12 is a sequence diagram showing an operation example of the traffic data collection unit.

トラヒックデータ収集部１１では、まず、フロー生成部１１Ａにより、通信網３０から収集した入力パケット列から、フローＦが生成される。この動作例では、送信元・送信先のＩＰアドレス、送信元・送信先のポート番号、プロトコル種別の組み合わせ（以下、５−ｔｕｐｌｅという）により、フローＦが識別されている。
ここで、入力パケット列のうち、５−ｔｕｐｌｅが同じであり、基準到着間隔ｔｆ以内のパケットＰ１，Ｐ４からなるフローＦ１、パケットＰ２からなるフローＦ２、パケットＰ３からなるフローＦ３の合わせて３つのフローが生成されたものとする。 In the traffic data collection unit 11, first, the flow F is generated from the input packet sequence collected from the communication network 30 by the flow generation unit 11A. In this operation example, the flow F is identified by a combination of a source / destination IP address, a source / destination port number, and a protocol type (hereinafter referred to as 5-tuple).
Here, assume that three flows in total have been generated from the input packet sequence, each consisting of packets that share the same 5-tuple and fall within the reference arrival interval tf: flow F1 consisting of packets P1 and P4, flow F2 consisting of packet P2, and flow F3 consisting of packet P3.

次に、同時フロー生成部１１Ｂにより、同一送信先ＩＰアドレスであるこれらフローＦ１〜Ｆ３について、同時フローＦＣへの統合可否が確認される。ここで、基準開始間隔ｔｃ＝５秒とした場合、Ｆ１の先頭パケットＰ１とＦ２の先頭パケットＰ２との到着間隔ｔ１２が１秒であり、ｔ１２＜ｔｃであることから、Ｆ１とＦ２は１つの同時フローＦＣ１に統合される。また、Ｆ３の先頭パケットＰ３との到着間隔ｔ１３が４秒であり、ｔ１３＜ｔｃであることから、Ｆ３も同時フローＦＣ１に統合される。これにより、Ｆ１〜Ｆ３がＦＣ１に統合されたことになる。 Next, the simultaneous flow generation unit 11B confirms whether these flows F1 to F3, which have the same destination IP address, can be integrated into a simultaneous flow FC. Here, when the reference start interval tc = 5 seconds, the arrival interval t12 between the leading packet P1 of F1 and the leading packet P2 of F2 is 1 second and t12 < tc, so F1 and F2 are integrated into one simultaneous flow FC1. Likewise, the arrival interval t13 between P1 and the leading packet P3 of F3 is 4 seconds and t13 < tc, so F3 is also integrated into the simultaneous flow FC1. As a result, F1 to F3 are integrated into FC1.

次に、短時間フロー生成部１１Ｃにより、同時フローＦＣ１について短時間フローＦＳの生成可否が確認される。ここで、単位到着間隔ｔｓ＝１秒とした場合、Ｐ１とＰ２の到着間隔ｔ１２が１秒であり、ｔ１２≦ｔｓであることから、Ｐ１とＰ２が１つの短時間フローＦＳ１１に統合される。また、Ｐ２とＰ３の到着間隔ｔ２３が３秒であり、ｔ２３＞ｔｓであることから、Ｐ２，Ｐ３は１つの短時間フローとして統合されず、Ｐ２は新たな短時間フローＦＳ１２となる。また、Ｐ３とＰ４の到着間隔ｔ３４が５秒であり、ｔ３４＞ｔｓであることから、Ｐ３，Ｐ４は１つの短時間フローとして統合されず、Ｐ３，Ｐ４はそれぞれ新たな短時間フローＦＳ１３，ＦＳ１４となる。 Next, whether or not the short-time flow FS can be generated for the simultaneous flow FC1 is confirmed by the short-time flow generation unit 11C. Here, when the unit arrival interval ts = 1 second, the arrival interval t12 between P1 and P2 is 1 second, and t12 ≦ ts, so that P1 and P2 are integrated into one short-time flow FS11. Further, since the arrival interval t23 between P2 and P3 is 3 seconds and t23> ts, P2 and P3 are not integrated as one short-time flow, and P2 becomes a new short-time flow FS12. In addition, since the arrival interval t34 between P3 and P4 is 5 seconds and t34> ts, P3 and P4 are not integrated as one short-time flow, and P3 and P4 are new short-time flows FS13 and FS14, respectively. It becomes.

［特徴ベクトル計算動作］
次に、図１３を参照して、本実施の形態にかかるトラヒックデータ収集部１１における特徴ベクトル計算動作について説明する。図１３は、特徴ベクトル計算処理を示すフローチャートである。 [Feature vector calculation operation]
Next, the feature vector calculation operation in the traffic data collection unit 11 according to the present embodiment will be described with reference to FIG. 13. FIG. 13 is a flowchart showing the feature vector calculation process.

トラヒックデータ収集部１１の特徴ベクトル計算部１１Ｅは、特徴量計算部１１Ｄで計算されたフロー特徴量の時系列データから、同時フローごとにフロー特徴ベクトルを計算する際、図１３の特徴ベクトル計算処理を実行する。ここでは、同時フローＦＣｉごとに、各特徴種別Ｙに関するフロー特徴ベクトルが計算される。 When calculating a flow feature vector for each simultaneous flow from the time-series data of the flow feature amounts calculated by the feature amount calculation unit 11D, the feature vector calculation unit 11E of the traffic data collection unit 11 executes the feature vector calculation process of FIG. 13. Here, a flow feature vector for each feature type Y is calculated for each simultaneous flow FCi.

特徴ベクトル計算部１１Ｅは、まず、特徴量計算部１１Ｄで計算された、同時フローＦＣごとに、各特徴種別Ｙに関するフロー特徴量の時系列データを取得して（ステップ１４０）、この時系列データから、フロー特徴ベクトルを計算していない未処理の同時フローＦＣｉを１つ選択する（ステップ１４１）。 First, the feature vector calculation unit 11E obtains time series data of the flow feature amount related to each feature type Y for each simultaneous flow FC calculated by the feature amount calculation unit 11D (step 140). Then, one unprocessed simultaneous flow FCi for which no flow feature vector is calculated is selected (step 141).

続いて、特徴ベクトル計算部１１Ｅは、選択同時フローＦＣｉについて、フロー特徴ベクトルを計算していない未処理の特徴種別Ｙｎを選択し（ステップ１４２）、これら選択同時フローＦＣｉおよび選択特徴種別Ｙｎに対応する時系列データを、所定の時間長を有するフレームＦＲごとに分割する（ステップ１４３）。フレームＦＲの時間長は、短時間フローごとに計算したフロー特徴量を、後述するケプストラム分析を実行できる数だけ含む十分な時間長に設定しておく。なお、同時フローＦＣ内に複数のフレームが存在していてもよい。また、同時フローの時間長がフレーム時間長よりも短い場合には、同時フローの時間長を使用して計算を行う。 Subsequently, the feature vector calculation unit 11E selects, for the selected simultaneous flow FCi, an unprocessed feature type Yn for which a flow feature vector has not been calculated (step 142), and divides the time-series data corresponding to the selected simultaneous flow FCi and the selected feature type Yn into frames FR each having a predetermined time length (step 143). The time length of the frame FR is set long enough that each frame contains as many per-short-time-flow flow feature amounts as are needed to execute the cepstrum analysis described later. A plurality of frames may exist within one simultaneous flow FC. When the time length of the simultaneous flow is shorter than the frame time length, the calculation is performed using the time length of the simultaneous flow.
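
The frame division of step 143 can be sketched as follows. This is an illustration under assumptions introduced here: the time series is given as `(time, feature_value)` pairs, frames are contiguous windows of length `frame_len` starting at the first sample, and frames that contain no samples are simply omitted. The function name `split_into_frames` is hypothetical.

```python
def split_into_frames(series, frame_len):
    """series: list of (time, feature_value), ascending in time.
    Returns a list of frames, each a list of feature values."""
    if not series:
        return []
    t0 = series[0][0]
    frames = {}
    for t, value in series:
        index = int((t - t0) // frame_len)      # which window this sample falls in
        frames.setdefault(index, []).append(value)
    return [frames[k] for k in sorted(frames)]  # empty windows are skipped
```

If the whole simultaneous flow is shorter than `frame_len`, every sample falls into window 0, so the function returns a single frame covering the flow, which matches the fallback described above.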

この後、特徴ベクトル計算部１１Ｅは、選択同時フローＦＣｉの選択特徴種別Ｙｎについて、フレームＦＲｒごとに得られたフロー特徴量を、それぞれのフロー特徴量の各要素に対して平均などの統計処理を行うことにより、選択同時フローＦＣの選択特徴種別Ｙｎに対応する１つのフロー特徴量を求める（ステップ１４４）。 Thereafter, the feature vector calculation unit 11E performs statistical processing such as averaging the flow feature amount obtained for each frame FRr on the selected feature type Yn of the selected simultaneous flow FCi for each element of the flow feature amount. By doing so, one flow feature amount corresponding to the selected feature type Yn of the selected simultaneous flow FC is obtained (step 144).

続いて、特徴ベクトル計算部１１Ｅは、分割して得られたフレームのうちから、フロー特徴量の統計量を計算していない未処理のフレームＦＲｉｎｒに属する時系列データを選択する（ステップ１４５）。 Subsequently, the feature vector calculation unit 11E selects time-series data belonging to an unprocessed frame FRinr for which the statistic of the flow feature value is not calculated from the frames obtained by division (step 145).

次に、特徴ベクトル計算部１１Ｅは、選択した時系列データを線形予測分析処理することにより、当該フレームの線形予測係数を計算する（ステップ１４６）
これら時系列データのうちの任意の標本値ｘｐについて、これに隣接する過去のｑ個（ｑは２以上の整数）の過去標本値ｘｐ−ｑ（ｑ＝１，２，…，Ｑ）との間に、過去標本値ｘｐ−ｑとある係数αｑをそれぞれ乗算して加え合わせた線形１次結合が成立し、次の式（１）が成り立つ。

Next, the feature vector calculation unit 11E calculates a linear prediction coefficient of the frame by performing a linear prediction analysis process on the selected time series data (step 146).
For an arbitrary sample value xp of these time-series data and the Q past sample values xp−q (q = 1, 2, ..., Q; Q is an integer of 2 or more) immediately preceding it, a linear combination holds in which each past sample value xp−q is multiplied by a coefficient αq and the products are summed, so that the following equation (1) holds.
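
Equation (1) is not reproduced in this text. Under the usual linear prediction convention, and assuming a positive sign on each coefficient (the original figure may use the opposite sign), the relation described above can be written as the following sketch, where \hat{x}_p denotes the value of x_p predicted from the Q preceding samples:

```latex
\hat{x}_p = \sum_{q=1}^{Q} \alpha_q \, x_{p-q} \tag{1}
```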

これら係数αｑは、線形予測係数と呼ばれ、線形予測残差ｘｐ−＾ｘｐの自乗平均が最小となるよう計算される。この線形予測係数の計算方法については、自己相関を求めて決定するレビンソン・ダービン法（Levinson-Durbin algorithm）などの公知の計算手法を用いればよい。 These coefficients αq are called linear prediction coefficients and are calculated so that the mean square of the linear prediction residual xp − ^xp is minimized. The linear prediction coefficients may be calculated by a known method such as the Levinson-Durbin algorithm, which determines them from the autocorrelation.
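
As an illustration of the Levinson-Durbin approach named above, the following sketch computes predictor coefficients from the autocorrelation of one frame. It assumes the convention that the prediction is the sum of αq times xp−q with positive signs, and it uses a biased autocorrelation estimate; both are assumptions, as are the function names `autocorrelation` and `levinson_durbin`.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err), where a = [a1, ..., a_order] minimizes the mean square of
    x[p] - sum(a[q] * x[p - q]) and err is the remaining prediction error power."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break                    # degenerate input (e.g. all zeros)
        # reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err
```

For a geometrically decaying sequence in which each sample is half the previous one, a second-order fit recovers a first coefficient close to 0.5 and a second coefficient close to 0.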

この後、特徴ベクトル計算部１１Ｅは、得られた線形予測係数αｑをケプストラム（cepstrum）分析することにより、当該フレームＦＲｉｎｒの時系列データ、すなわちフロー特徴量に関するスペクトル包絡特性を示すケプストラム係数Ｃｐを求め、これらケプストラム係数から当該フレームＦＲｉｎｒに関するフロー特徴ベクトルＶｉｎｒを生成する（ステップ１４７）。 Thereafter, the feature vector calculation unit 11E performs cepstrum analysis on the obtained linear prediction coefficients αq to obtain cepstrum coefficients Cp indicating the spectral envelope characteristics of the time-series data of the frame FRinr, that is, of the flow feature amounts, and generates a flow feature vector Vinr for the frame FRinr from these cepstrum coefficients (step 147).

線形予測係数αｇは、次の線形システムを決定することに用いられ、その線形システムにインパルス列または白色雑音を印加することによって、トラヒックが生成されたものと見なす。これにより、スペクトル包絡の特性関数Ｈ（ｚ）は、次の式（２）で求められる。

The linear prediction coefficients αq are used to determine the following linear system, and the traffic is regarded as having been generated by applying an impulse train or white noise to that linear system. The characteristic function H(z) of the spectral envelope is thereby obtained by the following equation (2).
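
Equation (2) is not reproduced in this text. For an all-pole linear system driven by an impulse train or white noise, and assuming that the predictor is written with positive coefficients and unit gain (both assumptions, since the original may use a different sign or include a gain term), the spectral envelope would take the standard form:

```latex
H(z) = \frac{1}{1 - \sum_{q=1}^{Q} \alpha_q \, z^{-q}} \tag{2}
```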

ケプストラム係数Ｃｐは、トラヒック信号スペクトラムのケプストラム領域の値であり、次の式（３）により、特性関数Ｈ（ｚ）をケプストラム領域に変換した特性関数Ｈ（ω）を求めた後、次の式（４）により、ケプストラム係数Ｃｐが求められる。

The cepstrum coefficient Cp is a value in the cepstrum domain of the traffic signal spectrum. After the characteristic function H(ω), which is the characteristic function H(z) converted into the cepstrum domain, is obtained by the following equation (3), the cepstrum coefficient Cp is obtained by the following equation (4).
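
Equations (3) and (4) are not reproduced in this text. One common formulation, given here only as an assumed reading and not as the original equations, evaluates H(z) on the unit circle and takes the cepstrum coefficients as the Fourier coefficients of the logarithm of the result:

```latex
\begin{align}
H(\omega) &= H(z)\,\big|_{z = e^{j\omega}} \tag{3} \\
C_p &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \log H(\omega)\, e^{j\omega p}\, d\omega \tag{4}
\end{align}
```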

このケプストラム係数の計算方法については、フーリエ変換、逆フーリエ変換、離散フーリエ変換（ＤＦＴ）など、公知の計算手法を用いればよい。
実際には、ケプストラム係数の計算時に、予め設定した規定数ｄ個だけケプストラム係数を計算し、これらｄ個のケプストラム係数を並べることによりフロー特徴ベクトルＶｉｎｒ＝（Ｃ１，Ｃ２，…，Ｃｄ）を生成する。 As a method for calculating the cepstrum coefficient, a known calculation method such as Fourier transform, inverse Fourier transform, or discrete Fourier transform (DFT) may be used.
In practice, when the cepstrum coefficients are calculated, only a preset number d of cepstrum coefficients is calculated, and the flow feature vector Vinr = (C1, C2, ..., Cd) is generated by arranging these d cepstrum coefficients.
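
The text above names Fourier-based methods for this step. As an alternative illustration, the following sketch uses the well-known recursion that converts linear prediction coefficients directly into the first d cepstrum coefficients of an all-pole model. It assumes the model H(z) = 1 / (1 − Σ αq z^−q) with unit gain, which is an assumption about the sign convention, and the function name `lpc_to_cepstrum` is hypothetical.

```python
def lpc_to_cepstrum(a, d):
    """a: predictor coefficients [a1, ..., aQ] of H(z) = 1 / (1 - sum(a_q * z**-q)).
    Returns the first d cepstrum coefficients [C1, ..., Cd]."""
    q = len(a)
    c = [0.0] * (d + 1)              # c[0] (log gain) is unused and taken as 0
    for n in range(1, d + 1):
        acc = a[n - 1] if n <= q else 0.0
        # c_n = a_n + sum_{k} (k / n) * c_k * a_{n-k}, over k with 1 <= n - k <= Q
        for k in range(max(1, n - q), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a single coefficient α1 = 0.5, the recursion gives Cn = 0.5^n / n, so the first three values are 0.5, 0.125, and about 0.0417. The returned list corresponds to the flow feature vector (C1, C2, ..., Cd) of one frame.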

この後、特徴ベクトル計算部１１Ｅは、すべてのフレームＦＲについて処理が終了したか確認する（ステップ１４８）。
ここで、未処理のフレームＦＲｉｎｒが存在する場合（ステップ１４８：ＮＯ）、ステップ１４５へ戻って次のフレームＦＲｉｎｒの処理を繰り返し実行する。 Thereafter, the feature vector calculation unit 11E confirms whether the processing has been completed for all the frames FR (step 148).
If there is an unprocessed frame FRinr (step 148: NO), the process returns to step 145 to repeatedly execute the process of the next frame FRinr.

また、全てのフレームＦＲについて処理が終了している場合（ステップ１４８：ＹＥＳ）、特徴ベクトル計算部１１Ｅは、全ての特徴種別Ｙについて処理が終了したか確認する（ステップ１４９）。
ここで、未処理の特徴種別Ｙｎが存在する場合（ステップ１４９：ＮＯ）、ステップ１４２へ戻って次の特徴種別Ｙｎの処理を繰り返し実行する。 When processing has been completed for all frames FR (step 148: YES), the feature vector calculation unit 11E confirms whether processing has been completed for all feature types Y (step 149).
Here, when an unprocessed feature type Yn exists (step 149: NO), the process returns to step 142 and repeats the processing for the next feature type Yn.

この後、特徴ベクトル計算部１１Ｅは、全ての同時フローＦＣについて処理が終了したか確認する（ステップ１５０）。
ここで、未処理の同時フローＦＣｉが存在する場合（ステップ１５０：ＮＯ）、ステップ１４１へ戻って次の同時フローＦＣｉの処理を繰り返し実行する。 Thereafter, the feature vector calculation unit 11E confirms whether the processing has been completed for all the simultaneous flows FC (step 150).
Here, when there is an unprocessed simultaneous flow FCi (step 150: NO), the process returns to step 141 and the process of the next simultaneous flow FCi is repeatedly executed.

また、全ての同時フローＦＣについて処理が終了している場合（ステップ１５０：ＹＥＳ）、特徴ベクトル計算部１１Ｅは、同時フロー番号ｉごとに、各特徴種別Ｙｎのフロー特徴ベクトルＶｉｎを出力し（ステップ１５１）、一連の特徴ベクトル計算処理を終了する。 If the processing has been completed for all the simultaneous flows FC (step 150: YES), the feature vector calculation unit 11E outputs the flow feature vector Vin of each feature type Yn for each simultaneous flow number i (step 151), and ends the series of feature vector calculation processing.

［類似度計算処理］
次に、図１４および図１５を参照して、本実施の形態にかかるサービスカテゴリ分類部１３における類似度計算動作について説明する。図１４は、類似度計算処理を示すフローチャートである。図１５は、類似度計算処理を示す説明図である。 [Similarity calculation processing]
Next, the similarity calculation operation in the service category classification unit 13 according to the present embodiment will be described with reference to FIGS. 14 and 15. FIG. 14 is a flowchart showing similarity calculation processing. FIG. 15 is an explanatory diagram showing similarity calculation processing.

サービスカテゴリ分類部１３の類似度計算部１３Ａ〜１３Ｎは、同時フローＦＣｉの種別類似度Ｓｉを計算する際、それぞれの特徴種別Ｙについて、図１４の類似度計算処理を実行する。ここでは、類似度計算部１３Ａにおいて、特徴種別Ｙのうち特徴種別Ｙｎに関する種別類似度Ｓｉｎを計算する場合を例として説明する。実際には、類似度計算部１３Ａ〜１３Ｎでは、それぞれに対応する特徴種別Ｙｎについて、図１４の類似度計算処理が並行して実行される。 The similarity calculation units 13A to 13N of the service category classification unit 13 execute the similarity calculation process of FIG. 14 for each feature type Y when calculating the type similarity Si of the simultaneous flow FCi. Here, the case where the similarity calculation unit 13A calculates the type similarity Sin related to the feature type Yn among the feature types Y will be described as an example. Actually, in the similarity calculation units 13A to 13N, the similarity calculation processing of FIG. 14 is executed in parallel for the corresponding feature type Yn.

類似度計算部１３Ａは、まず、特徴量ＤＢ１２から、各サービスカテゴリＸに関する特徴種別Ｙｎの代表特徴ベクトルＶｎをそれぞれ取得し（ステップ１６０）、分類対象となる同時フローＦＣｉのうち特徴種別Ｙｎのフロー特徴ベクトルＶｉｎをそれぞれ取得する（ステップ１６１）。 The similarity calculation unit 13A first acquires, from the feature amount DB 12, the representative feature vector Vn of the feature type Yn for each service category X (step 160), and then acquires the flow feature vector Vin of the feature type Yn for the simultaneous flow FCi to be classified (step 161).

続いて、類似度計算部１３Ａは、各サービスカテゴリＸのうちから種別類似度Ｓｉｍｎを計算していない未処理のサービスカテゴリＸｍを１つ選択し（ステップ１６２）、代表特徴ベクトルＶｍに含まれるクラスタｋごとに、分類対象となる同時フローＦＣｉのフロー特徴ベクトルＶｉｎと代表特徴ベクトルＶｍｋｎとの差分Ｄｋ＝｜Ｖｉｎ−Ｖｍｋｎ｜を計算する（ステップ１６３）。ここでは、特徴ベクトルのうち同一要素番号に対応する要素（ケプストラム係数）同士について差分を計算し、要素番号ごとに得られた差分を集計することにより差分Ｄｋを計算すればよい。 Subsequently, the similarity calculation unit 13A selects, from the service categories X, one unprocessed service category Xm for which the type similarity Simn has not yet been calculated (step 162), and, for each cluster k included in the representative feature vector Vm, calculates the difference Dk = |Vin − Vmkn| between the flow feature vector Vin of the simultaneous flow FCi to be classified and the representative feature vector Vmkn (step 163). Here, the difference Dk may be calculated by taking the difference between the elements (cepstrum coefficients) having the same element number in the two feature vectors and totaling the differences obtained for each element number.

次に、類似度計算部１３Ａは、クラスタＺｋごとの差分Ｄｋに基づいて、分類対象となる同時フローＦＣｉと選択サービスカテゴリＸｍとの、特徴種別Ｙｎに関する種別類似度Ｓｉｍｎを計算する（ステップ１６４）。この種別類似度Ｓｉｍｎについては、例えば、クラスタＺｋごとの差分Ｄｋについて統計処理を行うことにより求めた、最小値、平均値、合計値などの統計値を用いればよい。 Next, based on the difference Dk for each cluster Zk, the similarity calculation unit 13A calculates the type similarity Simn for the feature type Yn between the simultaneous flow FCi to be classified and the selected service category Xm (step 164). For this type similarity Simn, a statistic such as the minimum, average, or total obtained by statistically processing the differences Dk of the clusters Zk may be used, for example.

この後、類似度計算部１３Ａは、選択サービスカテゴリＸｍについてフラグなどにより処理済みとし（ステップ１６５）、全てのサービスカテゴリＸについて処理済みか確認する（ステップ１６６）。
ここで、未処理のサービスカテゴリＸｍが存在する場合（ステップ１６６：ＮＯ）、ステップ１６２に戻って次のサービスカテゴリＸｍの処理を繰り返し実行する。また、全てのサービスカテゴリＸについて処理が終了している場合（ステップ１６６：ＹＥＳ）、類似度計算部１３Ａは、同時フローＦＣｉと各サービスカテゴリＸｍとの、特徴種別Ｙｎに関する種別類似度Ｓｉｍｎを出力し（ステップ１６７）、一連の類似度計算処理を終了する。 Thereafter, the similarity calculation unit 13A determines that the selected service category Xm has been processed with a flag or the like (step 165), and checks whether all the service categories X have been processed (step 166).
Here, when there is an unprocessed service category Xm (step 166: NO), the process returns to step 162 and the process of the next service category Xm is repeatedly executed. If processing has been completed for all service categories X (step 166: YES), the similarity calculation unit 13A outputs the type similarity Simn regarding the feature type Yn between the simultaneous flow FCi and each service category Xm. (Step 167), and the series of similarity calculation processing ends.
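As a non-limiting illustration, steps 160 to 167 for one feature type Yn can be sketched as follows. The sketch assumes that the per-element difference is an absolute difference and that the minimum over the clusters is used as the statistic; the text above equally permits the average or the total. The dictionary layout of the feature amount DB and all function names are hypothetical.

```python
def cluster_difference(v_in, v_mkn):
    """Dk: total of the element-wise absolute differences (assumed form)."""
    return sum(abs(a - b) for a, b in zip(v_in, v_mkn))


def type_similarity(v_in, representatives, statistic=min):
    """Simn for one service category Xm from its per-cluster vectors Vmkn."""
    return statistic(cluster_difference(v_in, v) for v in representatives)


def type_similarities(v_in, feature_db):
    """Simn for every service category; a smaller value means more similar.

    feature_db maps a service category name to the list of representative
    feature vectors (one per cluster) for the feature type being processed.
    """
    return {category: type_similarity(v_in, reps)
            for category, reps in feature_db.items()}
```

Because the value is built from differences, it behaves as a distance: the service category with the smallest value is the most similar one, which is consistent with the minimum being selected as the highest similarity in the classification determination process.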

［分類判定処理］
次に、図１６および図１７を参照して、本実施の形態にかかるサービスカテゴリ分類部１３における分類判定動作について説明する。図１６は、分類判定処理を示すフローチャートである。図１７は、分類判定処理を示す説明図である。 [Classification processing]
Next, with reference to FIG. 16 and FIG. 17, the classification determination operation in the service category classification unit 13 according to the present embodiment will be described. FIG. 16 is a flowchart showing the classification determination process. FIG. 17 is an explanatory diagram illustrating the classification determination process.

サービスカテゴリ分類部１３の類似度統合部１３Ｐと分類判定部１３Ｑは、分類対象となる同時フローＦＣｉをサービスカテゴリＸのいずれかに分類する際、図１６の類似度計算処理を実行する。ここでは、類似度統合部１３Ｐと分類判定部１３Ｑにおいて、入力パケット列から生成された任意の同時フローＦＣｉの分類判定を行う場合を例として説明する。なお、類似度統合部１３Ｐと分類判定部１３Ｑでの分類判定処理は、入力パケット列から生成された、分類対象となるすべての同時フローＦＣについて、繰り返し実行される。 When classifying the simultaneous flow FCi to be classified into one of the service categories X, the similarity integration unit 13P and the classification determination unit 13Q of the service category classification unit 13 execute the process of FIG. 16. Here, a case will be described as an example in which the similarity integration unit 13P and the classification determination unit 13Q perform classification determination for an arbitrary simultaneous flow FCi generated from the input packet sequence. Note that the classification determination process in the similarity integration unit 13P and the classification determination unit 13Q is repeated for all the simultaneous flows FC to be classified that are generated from the input packet sequence.

類似度統合部１３Ｐは、まず、類似度計算部１３Ａ〜１３Ｎから、すべての特徴種別Ｙについて、サービスカテゴリＸ別の種別類似度Ｓｉｍｎを取得して（ステップ１７０）、サービスカテゴリＸｍごとに、例えば種別類似度Ｓｉｍｎの合計値など、これら種別類似度Ｓｉｍｎを統計処理することにより、すべての特徴種別Ｙを考慮した、同時フローＦＣｉに関する、サービスカテゴリＸｍごとのカテゴリ類似度Ｓｉｍを計算する（ステップ１７１）。 First, the similarity integration unit 13P acquires the type similarity Simn for each service category X, for all feature types Y, from the similarity calculation units 13A to 13N (step 170). Then, for each service category Xm, it statistically processes these type similarities Simn, for example by taking their total, to calculate the category similarity Sim of the simultaneous flow FCi for each service category Xm, taking all feature types Y into account (step 171).

次に、分類判定部１３Ｑは、類似度統合部１３Ｐで得られたサービスカテゴリＸｍごとのカテゴリ類似度Ｓｉｍのうちから、その最小値、すなわち最も類似性の高い最高類似度Ｓｉｈを選択し（ステップ１７２）、最高類似度Ｓｉｈと有意しきい値Ｓｔｈとを比較することにより、有意範囲に含まれているか確認する（ステップ１７３）。
ここで、最高類似度Ｓｉｈが有意しきい値Ｓｔｈより小さく、有意範囲に含まれている場合（ステップ１７３：ＹＥＳ）、分類判定部１３Ｑは、分類対象となる同時フローＦＣｉに対して、最高類似度Ｓｉｈと対応するサービスカテゴリＸｍを付加することにより、同時フローＦＣｉをサービスカテゴリＸｍに分類する（ステップ１７４）。 Next, the classification determination unit 13Q selects, from the category similarities Sim for the service categories Xm obtained by the similarity integration unit 13P, the minimum value, that is, the highest similarity Sih indicating the greatest similarity (step 172), and compares the highest similarity Sih with the significance threshold Sth to check whether it falls within the significance range (step 173).
Here, when the highest similarity Sih is smaller than the significance threshold Sth and thus falls within the significance range (step 173: YES), the classification determination unit 13Q attaches the service category Xm corresponding to the highest similarity Sih to the simultaneous flow FCi to be classified, thereby classifying the simultaneous flow FCi into the service category Xm (step 174).

一方、最高類似度Ｓｉｈが有意しきい値Ｓｔｈ以上で、有意範囲に含まれていない場合（ステップ１７３：ＮＯ）、分類判定部１３Ｑは、分類対象となる同時フローＦＣｉに対して、新たなサービスカテゴリＸｎｅｗを付加することにより、同時フローＦＣｉを既存のサービスカテゴリではなく、新たなサービスカテゴリＸｎｅｗに分類する（ステップ１７５）。
このようにして、分類対象となる同時フローＦＣｉをいずれかのサービスカテゴリＸに分類した後、分類判定部１３Ｑは、分類したサービスカテゴリＸに対応して登録されているサービスカテゴリ名と、入力パケット列から取得した当該同時フローＦＣｉに関する送信先アドレスや、入力パケット列から計算したトラヒック量などのトラヒック情報とを管理端末２０へ出力して画面表示し（ステップ１７６）、一連の分類判定処理を終了する。 On the other hand, when the highest similarity Sih is not less than the significance threshold value Sth and not included in the significance range (step 173: NO), the classification determination unit 13Q provides a new service to the simultaneous flow FCi to be classified. By adding the category Xnew, the simultaneous flow FCi is classified into the new service category Xnew instead of the existing service category (step 175).
After classifying the simultaneous flow FCi into one of the service categories X in this way, the classification determination unit 13Q outputs to the management terminal 20, for on-screen display, the service category name registered for the assigned service category X together with traffic information such as the destination address of the simultaneous flow FCi acquired from the input packet sequence and the traffic volume calculated from the input packet sequence (step 176), and ends the series of classification determination processing.
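As a non-limiting illustration, steps 170 to 175 can be sketched as follows. The sketch assumes that the total of the type similarities Simn is used as the category similarity Sim, which is the example given above. The names `classify` and `NEW_CATEGORY` and the nested-dictionary input layout are hypothetical.

```python
NEW_CATEGORY = "Xnew"   # placeholder label for a newly created service category


def classify(type_sims, s_th):
    """Return (category, Sih) for one simultaneous flow FCi.

    type_sims maps each feature type Yn to {service category: Simn}.
    s_th is the significance threshold Sth.
    """
    # Steps 170-171: integrate the type similarities per service category.
    category_sim = {}
    for per_category in type_sims.values():
        for category, s_imn in per_category.items():
            category_sim[category] = category_sim.get(category, 0.0) + s_imn

    # Step 172: the minimum value is the highest similarity Sih.
    best = min(category_sim, key=category_sim.get)
    s_ih = category_sim[best]

    # Steps 173-175: accept the category only inside the significance range.
    if s_ih < s_th:
        return best, s_ih
    return NEW_CATEGORY, s_ih
```

With the same input, lowering the threshold turns an accepted classification into a new service category, which is the behavior that avoids forcing an unknown application into an existing category.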

［第１の実施の形態の効果］
このように、本実施の形態では、一定期間内に発生した複数のフローを１つの同時フローとして統合し、さらに１つの同時フローに含まれる、一定の単位到着間隔内に到着した複数のパケットを、１つの短時間フローとして統合して、これら短時間フローごとにフロー特徴量を計算し、この後、これら短時間フローごとのフロー特徴量からなる時系列データから、同時フローごとにフロー特徴量の時間的変化の特徴を示すフロー特徴ベクトルを計算して、既存のサービスカテゴリとの類似性を判定するようにしたので、同時フローの特徴量を時間軸上の変動に着目してフローの類似性を判定することができる。このため、元のフロー特徴量に基づき類似性を判定する場合と比較して、トラヒックの時間的な変動を利用した識別を行うことができる。 [Effect of the first embodiment]
As described above, in the present embodiment, a plurality of flows generated within a certain period are integrated into one simultaneous flow, and a plurality of packets that are included in one simultaneous flow and arrive within a certain unit arrival interval are integrated into one short-time flow. A flow feature amount is calculated for each short-time flow, a flow feature vector indicating the characteristics of the temporal change of the flow feature amount is then calculated for each simultaneous flow from the time-series data consisting of the flow feature amounts of the short-time flows, and its similarity to the existing service categories is determined. The similarity of flows can therefore be determined with attention to how the feature amount of a simultaneous flow varies along the time axis. For this reason, compared with the case where similarity is determined based on the original flow feature amounts, identification that exploits the temporal variation of traffic becomes possible.

また、サービスカテゴリごとの類似性のうち、最も高い類似性を示す類似度が有意範囲に含まれない場合には、新たなサービスクラスに分類するようにしたので、既存サービスクラスでは含まれていない新たなアプリケーションのフローについても、正確に分類することができる。このため、いずれか類似性の低いサービスクラスに分類してしまうような誤分類の発生を抑止することが可能となる。 In addition, when the similarity indicating the highest similarity among the service categories does not fall within the significance range, the flow is classified into a new service class, so that flows of new applications not covered by the existing service classes can also be classified accurately. This makes it possible to suppress misclassification in which such a flow is forced into an existing service class with low similarity.

したがって、本実施の形態をデータ通信サービスのネットワーク管理に適用すれば、映像フローのような品質条件が厳しいフローに対して優先制御を行う等のトラヒック制御を用いることで、同じ設備量でよりユーザ満足度の高いネットワークを提供することができる。また、アプリケーションごとのトラヒック需要量を正確に把握できるため、高精度なトラヒック需要の予測を行うことができ、結果として、ユーザ満足度の更なる向上を実現することが可能となる。 Therefore, if this embodiment is applied to the network management of a data communication service, traffic control such as priority control for flows with strict quality requirements, for example video flows, can be used to provide a network with higher user satisfaction using the same amount of equipment. Further, since the traffic demand of each application can be grasped accurately, traffic demand can be predicted with high accuracy, and as a result user satisfaction can be improved further.

また、本実施の形態によれば、フロー特徴ベクトルを計算する際、時系列データを線形予測分析を行うことにより同時フローのフロー特徴量を示す伝達関数の線形予測係数を求め、これら線形予測係数をケプストラム分析することにより、当該同時フローのフロー特徴量に関するスペクトル包絡特性を示すケプストラム係数を求め、これらケプストラム係数からフロー特徴ベクトルを生成するようにしたので、極めて高い精度で、フロー特徴量の時間的変化の特徴を示すフロー特徴ベクトルを計算することができる。 Further, according to the present embodiment, when the flow feature vector is calculated, linear prediction analysis is performed on the time-series data to obtain the linear prediction coefficients of a transfer function representing the flow feature amount of the simultaneous flow, cepstrum analysis is performed on these linear prediction coefficients to obtain cepstrum coefficients indicating the spectral envelope characteristics of the flow feature amount of the simultaneous flow, and the flow feature vector is generated from these cepstrum coefficients. A flow feature vector indicating the characteristics of the temporal change of the flow feature amount can therefore be calculated with extremely high accuracy.

また、本実施の形態によれば、各フローを同時フローへ統合する際、パケットの送信先ＩＰアドレスが同一のフローを同時フローへ統合するようにしたので、映像データ、メタデータ、プレイヤーを異なるサーバから受信するような映像配信サービスであっても、当該フローを、対応するサービスカテゴリへ正確に分類できる。 In addition, according to the present embodiment, when integrating each flow into a simultaneous flow, a flow having the same packet destination IP address is integrated into the simultaneous flow, so that video data, metadata, and players are different. Even for a video distribution service received from a server, the flow can be accurately classified into a corresponding service category.

また、本実施の形態によれば、各短時間フローのフロー特徴量を計算する際、パケットのヘッダ情報に含まれるパケットのパケット数、パケットサイズ、または到着間隔を統計処理することによりフロー特徴量を計算するようにしたので、パケットのペイロード部に格納されているユーザ情報を解析する必要がない。このため、プライバシーやセキュリティの面でも高い安全性が得られるとともに、ペイロード部が暗号化されているフローについても精度良く分類することが可能となる。 Further, according to the present embodiment, when calculating the flow feature amount of each short-time flow, the flow feature amount is calculated by statistically processing the number of packets, the packet size, or the arrival interval of the packets included in the packet header information. Therefore, it is not necessary to analyze the user information stored in the payload portion of the packet. For this reason, it is possible to obtain high safety in terms of privacy and security, and it is possible to classify the flow in which the payload portion is encrypted with high accuracy.
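As a non-limiting illustration, the calculation of flow feature amounts from header information alone can be sketched as follows. The sketch assumes that each packet is represented only by its arrival time and size, and that short-time flows are formed by cutting the simultaneous flow into fixed windows of length ts measured from the first packet; the embodiment may delimit short-time flows differently. The function names and the dictionary keys are hypothetical.

```python
from statistics import mean


def short_time_flows(packets, ts):
    """Group (arrival_time, size) records into windows of length ts.

    packets must be sorted by arrival time. Windows that contain no
    packet are skipped in this sketch.
    """
    if not packets:
        return []
    start = packets[0][0]
    windows = {}
    for arrival, size in packets:
        index = int((arrival - start) // ts)
        windows.setdefault(index, []).append((arrival, size))
    return [windows[j] for j in sorted(windows)]


def flow_features(flow):
    """Header-only feature amounts of one short-time flow."""
    times = [t for t, _ in flow]
    sizes = [s for _, s in flow]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": len(flow),
        "mean_size": mean(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }
```

No payload byte is read at any point, so the same calculation applies unchanged to flows whose payload is encrypted.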

［第２の実施の形態］
次に、図１８を参照して、本発明の第２の実施の形態にかかるフロー分類システム１０について説明する。図１８は、第２の実施の形態にかかるフロー分類システムの構成を示すブロック図である。 [Second Embodiment]
Next, the flow classification system 10 according to the second embodiment of the present invention will be described with reference to FIG. 18. FIG. 18 is a block diagram showing the configuration of the flow classification system according to the second embodiment.

本実施の形態は、第１の実施の形態と比較して、学習処理部１４が追加されている。
学習処理部１４は、サービスカテゴリＸが既知である、複数のフレームＦＲに関するフロー特徴ベクトルＶや、サービスカテゴリ分類部１３で得られた分類判定後のフロー特徴ベクトルＶからなる学習データに基づいて、各サービスカテゴリＸｍごとに、特徴種別Ｙｎの代表特徴ベクトルＶｍｎを計算して、特徴量ＤＢ１２へ登録する機能を有している。 In the present embodiment, a learning processing unit 14 is added as compared with the first embodiment.
The learning processing unit 14 has a function of calculating, for each service category Xm, the representative feature vector Vmn of the feature type Yn and registering it in the feature amount DB 12, based on learning data consisting of flow feature vectors V of a plurality of frames FR whose service category X is known and flow feature vectors V obtained after classification determination by the service category classification unit 13.

図１９は、学習処理部の構成を示すブロック図である。
学習処理部１４には、主な処理部として、教師データ記憶部１４Ａ、サービスカテゴリ作成部１４Ｂ、および代表特徴ベクトル計算部１４Ｃが設けられている。 FIG. 19 is a block diagram illustrating a configuration of the learning processing unit.
The learning processing unit 14 includes a teacher data storage unit 14A, a service category creation unit 14B, and a representative feature vector calculation unit 14C as main processing units.

教師データ記憶部１４Ａは、ハードディスクなどの記憶装置からなり、短時間フローＦＳｉｊを構成する教師パケット列と、当該短時間フローＦＳｉｊの分類先となるアプリケーションとの組が、教師データとして登録されている。
この教師データとしては、サービスカテゴリ分類部１３の分類判定部１３Ｑで分類した分類判定後の短時間フローとそのサービスカテゴリとを用いてもよい。 The teacher data storage unit 14A includes a storage device such as a hard disk, and a set of a teacher packet sequence constituting the short-time flow FSij and an application that is a classification destination of the short-time flow FSij is registered as teacher data. .
As the teacher data, the short-time flow after the classification determination classified by the classification determination unit 13Q of the service category classification unit 13 and the service category may be used.

サービスカテゴリ作成部１４Ｂは、トラヒックデータ収集部１１で通信網３０からの入力パケット列から短時間フローＦＳを生成するのと同様にして、教師データ記憶部１４Ａから読み出した教師パケットから短時間フローＦＳを生成するとともに、当該短時間フローＦＳのフロー特徴量ＲＳを、特徴種別Ｙごとに計算する機能と、これら教師データに含まれるアプリケーションのうちから選択した２つのアプリケーションごとに、これらアプリケーションに関するフロー特徴量の分散、平均値、および要素数に基づいて、当該アプリケーション間のクラス内分散σｗ²とクラス間分散σｂ²との分散比Ｊを、特徴種別Ｙごとに計算する機能と、得られた分散比Ｊの最小値Ｊｍｉｎが判定しきい値Ｊｔｈより小さい場合には、これらアプリケーションを同一サービスカテゴリに分類し、得られた分散比が判定しきい値以上の場合には、これらアプリケーションを別個のサービスカテゴリに分類することにより、サービスカテゴリＸを作成する機能とを有している。 The service category creation unit 14B has the following functions: a function of generating short-time flows FS from the teacher packets read from the teacher data storage unit 14A, in the same manner as the traffic data collection unit 11 generates short-time flows FS from the input packet sequence from the communication network 30, and of calculating the flow feature amount RS of each short-time flow FS for each feature type Y; a function of calculating, for each pair of applications selected from the applications included in the teacher data and for each feature type Y, the variance ratio J between the within-class variance σw² and the between-class variance σb² of the two applications, based on the variance, average value, and number of elements of their flow feature amounts; and a function of creating the service categories X by classifying the two applications into the same service category when the minimum value Jmin of the obtained variance ratios J is smaller than the determination threshold Jth, and into separate service categories when the obtained variance ratio is equal to or greater than the determination threshold.

代表特徴ベクトル計算部１４Ｃは、サービスカテゴリ作成部１４Ｂで作成された各サービスカテゴリＸｍについて、当該サービスカテゴリＸｍに分類されたアプリケーションのフロー特徴量ＲＳから、これらフロー特徴量ＲＳの時間的変化の特徴を示すフロー特徴ベクトルＶｍを、特徴種別Ｙごとに計算して、特徴量ＤＢ１２へ保存する機能とを有している。 The representative feature vector calculation unit 14C has a function of calculating, for each service category Xm created by the service category creation unit 14B and for each feature type Y, the flow feature vector Vm indicating the characteristics of the temporal change of the flow feature amounts RS of the applications classified into that service category Xm, and of storing it in the feature amount DB 12.

［サービスカテゴリ作成動作］
次に、図２０を参照して、本実施の形態にかかる学習処理部１４におけるサービスカテゴリ作成動作について説明する。図２０は、サービスカテゴリ作成処理を示すフローチャートである。 [Service category creation operation]
Next, a service category creation operation in the learning processing unit 14 according to the present embodiment will be described with reference to FIG. 20. FIG. 20 is a flowchart showing the service category creation process.

学習処理部１４のサービスカテゴリ作成部１４Ｂは、教師データからサービスカテゴリを作成する際、図２０のサービスカテゴリ作成処理を実行する。ここでは、サービスカテゴリ作成部１４Ｂにおいて、教師データ記憶部１４Ａの教師パケット列からの短時間フローＦＳの生成とそのフロー特徴量ＲＳの計算がすでに実行されて、教師データ記憶部１４Ａに登録されていることを前提として、特徴種別Ｙのうち特徴種別ＹｎについてサービスカテゴリＸを作成する場合を例として説明する。 The service category creation unit 14B of the learning processing unit 14 executes the service category creation process of FIG. 20 when creating a service category from teacher data. Here, in the service category creation unit 14B, the generation of the short-time flow FS from the teacher packet sequence in the teacher data storage unit 14A and the calculation of the flow feature quantity RS have already been executed and registered in the teacher data storage unit 14A. The case where the service category X is created for the feature type Yn among the feature types Y will be described as an example.

サービスカテゴリ作成部１４Ｂは、まず、教師データ記憶部１４Ａから、アプリケーション別で、各短時間フローＦＳのフロー特徴量ＲＳを取得して（ステップ２００）、サービスカテゴリ作成処理を実行していない未処理の特徴種別Ｙｎを１つ選択し（ステップ２０１）、これらアプリケーションの２つの組のうちから、サービスカテゴリ作成処理を実行していない未処理のアプリケーション組を１つ選択する（ステップ２０２）。 The service category creation unit 14B first acquires the flow feature amount RS of each short-time flow FS, grouped by application, from the teacher data storage unit 14A (step 200), selects one unprocessed feature type Yn for which the service category creation process has not yet been executed (step 201), and selects, from among the pairs of these applications, one unprocessed application group (a pair of two applications) for which the service category creation process has not yet been executed (step 202).

次に、サービスカテゴリ作成部１４Ｂは、選択アプリケーション組であるそれぞれのアプリケーションについて、当該アプリケーションに分類されている各短時間フローＦＳのフロー特徴量ＲＳから、その分散、平均値、および要素数に基づいて、当該アプリケーション間のクラス内分散σｗ²とクラス間分散σｂ²とその分散比Ｊを計算する（ステップ２０３）。 Next, for the two applications in the selected application group, the service category creation unit 14B calculates the within-class variance σw², the between-class variance σb², and their variance ratio J, based on the variance, average value, and number of elements of the flow feature amounts RS of the short-time flows FS classified into each application (step 203).

この際、アプリケーションｘに関するフロー特徴量ＲＳの分散をσｘ、要素数をωｘ、平均値をｍｘとし、アプリケーションｙに関するフロー特徴量ＲＳの分散をσｙ、要素数をωｙ、平均値をｍｙとした場合、アプリケーションｘ，ｙのクラス内分散σｗ²、クラス間分散σｂ²、およびこれらクラス内分散σｗ²とクラス間分散σｂ²の分散比Ｊは、次の式（５）、式（６）、および式（７）で計算される。

Here, let σx, ωx, and mx be the variance, number of elements, and average value of the flow feature amounts RS of application x, and let σy, ωy, and my be the variance, number of elements, and average value of the flow feature amounts RS of application y. The within-class variance σw² and the between-class variance σb² of applications x and y, and the variance ratio J between them, are then calculated by the following equations (5), (6), and (7), respectively.
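Equations (5) to (7) are not reproduced in this text. As a non-limiting illustration, the sketch below assumes the commonly used two-class definitions, namely σw² = (ωx·σx + ωy·σy) / (ωx + ωy), σb² = ωx·ωy·(mx − my)² / (ωx + ωy)², and J = σb² / σw², with σx and σy treated as population variances. These forms are an assumption and may differ from the equations of the embodiment. The function name is hypothetical.

```python
from statistics import mean, pvariance


def variance_ratio(rs_x, rs_y):
    """Variance ratio J between two applications (assumed two-class form).

    rs_x and rs_y are the flow feature amounts RS of applications x and y
    for one feature type. Returns (sigma_w2, sigma_b2, J).
    """
    w_x, w_y = len(rs_x), len(rs_y)              # numbers of elements
    m_x, m_y = mean(rs_x), mean(rs_y)            # average values
    s_x, s_y = pvariance(rs_x), pvariance(rs_y)  # variances
    total = w_x + w_y

    sigma_w2 = (w_x * s_x + w_y * s_y) / total            # within-class
    sigma_b2 = w_x * w_y * (m_x - m_y) ** 2 / total ** 2  # between-class

    if sigma_w2 == 0.0:
        # Both applications are constant: separable unless the means agree.
        j = float("inf") if sigma_b2 > 0.0 else 0.0
    else:
        j = sigma_b2 / sigma_w2
    return sigma_w2, sigma_b2, j
```

Under this assumed form, a small J means that the two applications overlap heavily in the feature amount, which is why a value below the determination threshold Jth leads to integration.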

この後、サービスカテゴリ作成部１４Ｂは、選択アプリケーション組についてフラグなどにより処理済みとし（ステップ２０４）、全てのアプリケーション組について処理済みか確認する（ステップ２０５）。
ここで、未処理のアプリケーション組が存在する場合（ステップ２０５：ＮＯ）、ステップ２０２に戻って次のサービスカテゴリＸｍの処理を繰り返し実行する。 Thereafter, the service category creation unit 14B determines that the selected application group has been processed with a flag or the like (step 204), and checks whether all application groups have been processed (step 205).
Here, when there is an unprocessed application group (step 205: NO), the process returns to step 202 and the process of the next service category Xm is repeatedly executed.

また、全てのアプリケーション組について処理が終了している場合（ステップ２０５：ＹＥＳ）、サービスカテゴリ作成部１４Ｂは、アプリケーション組ごとに計算した分散比Ｊのうちから最小分散比Ｊｍｉｎ選択し、予め設定されている判定しきい値Ｊｔｈと比較する（ステップ２０６）。
ここで、最小分散比Ｊｍｉｎが判定しきい値Ｊｔｈより小さい場合（ステップ２０６：ＹＥＳ）、サービスカテゴリ作成部１４Ｂは、最小分散比Ｊｍｉｎが得られたアプリケーション組の２つのアプリケーションを１つのアプリケーションに統合し（ステップ２０７）、ステップ２０２へ移行して、次のアプリケーション組の処理を繰り返し実行する。 If processing has been completed for all application groups (step 205: YES), the service category creation unit 14B selects the minimum dispersion ratio Jmin from the distribution ratios J calculated for each application group, and is set in advance. Is compared with the determination threshold value Jth (step 206).
Here, when the minimum variance ratio Jmin is smaller than the determination threshold value Jth (step 206: YES), the service category creation unit 14B integrates two applications of the application group from which the minimum variance ratio Jmin is obtained into one application. (Step 207), the process proceeds to Step 202, and the process of the next application set is repeatedly executed.

一方、最小分散比Ｊｍｉｎが判定しきい値Ｊｔｈ以上の場合（ステップ２０６：ＮＯ）、サービスカテゴリ作成部１４Ｂは、選択アプリケーション組の２つのアプリケーションを別個のサービスクラスと判定し、アプリケーションの統合は行わない。 On the other hand, when the minimum variance ratio Jmin is equal to or greater than the determination threshold Jth (step 206: NO), the service category creation unit 14B determines that the two applications of the selected application group belong to separate service classes and does not integrate them.

このようにして、各アプリケーション組についてアプリケーションの統合可否を判定した後、サービスカテゴリ作成部１４Ｂは、選択特徴種別Ｙｎについてフラグなどにより処理済みとし（ステップ２０７）、全ての特徴種別Ｙについて処理済みか確認する（ステップ２０８）。
ここで、未処理の特徴種別Ｙｎが存在する場合（ステップ２０８：ＮＯ）、ステップ２０１に戻って次の特徴種別Ｙｎの処理を繰り返し実行する。 After determining whether or not application integration is possible for each application group in this way, the service category creation unit 14B determines that the selected feature type Yn has been processed with a flag or the like (step 207), and all feature types Y have been processed. Confirm (step 208).
If there is an unprocessed feature type Yn (step 208: NO), the process returns to step 201 to repeatedly execute the process for the next feature type Yn.

また、全ての特徴種別Ｙｎについて処理が終了している場合（ステップ２０８：ＹＥＳ）、サービスカテゴリ作成部１４Ｂは、統合されたアプリケーションおよび統合できなかったアプリケーションごとに、サービスカテゴリをそれぞれ作成する（ステップ２０９）。この後、サービスカテゴリ作成部１４Ｂは、これらサービスカテゴリ別で、各短時間フローＦＳのフロー特徴量ＲＳを出力し（ステップ２１０）、一連のサービスカテゴリ作成処理を終了する。 If processing has been completed for all feature types Yn (step 208: YES), the service category creation unit 14B creates a service category for each integrated application and for each application that could not be integrated (step 209). Thereafter, the service category creation unit 14B outputs the flow feature amount RS of each short-time flow FS for each of these service categories (step 210), and ends the series of service category creation processing.
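As a non-limiting illustration, the repeated integration of applications in steps 202 to 207 can be sketched for a single feature type as follows. The sketch assumes the same two-class form of the variance ratio J as a stand-in for equations (5) to (7), and it handles one feature type only, whereas the embodiment repeats the procedure for every feature type Yn. The function names and the tuple-keyed dictionary are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def variance_ratio(x, y):
    """Assumed two-class variance ratio J (stand-in for equations (5)-(7))."""
    n = len(x) + len(y)
    within = (len(x) * pvariance(x) + len(y) * pvariance(y)) / n
    between = len(x) * len(y) * (mean(x) - mean(y)) ** 2 / n ** 2
    if within == 0.0:
        return float("inf") if between > 0.0 else 0.0
    return between / within


def build_categories(app_features, j_th):
    """Merge the closest pair of applications while Jmin < Jth.

    app_features maps an application name to its flow feature amounts RS.
    Returns a dict whose keys are tuples of the merged application names.
    """
    groups = {(name,): list(values) for name, values in app_features.items()}
    while len(groups) > 1:
        # Steps 202-205: evaluate J for every application pair.
        j_min, (a, b) = min(
            ((variance_ratio(groups[a], groups[b]), (a, b))
             for a, b in combinations(groups, 2)),
            key=lambda item: item[0],
        )
        # Step 206: stop once even the closest pair is separable.
        if j_min >= j_th:
            break
        # Step 207: integrate the two applications into one.
        groups[a + b] = groups.pop(a) + groups.pop(b)
    return groups
```

Each key of the returned dictionary corresponds to one service category, so applications that were never merged remain as single-element tuples.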

［代表特徴ベクトル計算動作］
次に、図２１を参照して、本実施の形態にかかる学習処理部１４における代表特徴ベクトル計算動作について説明する。図２１は、代表特徴ベクトル計算処理を示すフローチャートである。 [Representative feature vector calculation operation]
Next, a representative feature vector calculation operation in the learning processing unit 14 according to the present embodiment will be described with reference to FIG. 21. FIG. 21 is a flowchart showing the representative feature vector calculation process.

学習処理部１４の代表特徴ベクトル計算部１４Ｃは、サービスカテゴリ作成部１３Ｂで作成された各サービスカテゴリの代表特徴ベクトルを計算する際、図２１の代表特徴ベクトル計算処理を実行する。 The representative feature vector calculation unit 14C of the learning processing unit 14 executes the representative feature vector calculation process of FIG. 21 when calculating the representative feature vector of each service category created by the service category creation unit 13B.

代表特徴ベクトル計算部１４Ｃは、まず、サービスカテゴリ作成部１３Ｂから、サービスカテゴリ別で、各短時間フローＦＳのフロー特徴量ＲＳを取得し（ステップ２２０）、代表特徴ベクトル計算処理を実行していない未処理のサービスカテゴリＸｍを１つ選択し（ステップ２２１）、代表特徴ベクトル計算処理を実行していない未処理の特徴種別Ｙｎを１つ選択する（ステップ２２２）。 First, the representative feature vector calculation unit 14C acquires the flow feature quantity RS of each short-time flow FS for each service category from the service category creation unit 13B (step 220), and does not execute the representative feature vector calculation process. One unprocessed service category Xm is selected (step 221), and one unprocessed feature type Yn for which the representative feature vector calculation processing is not executed is selected (step 222).

次に、代表特徴ベクトル計算部１４Ｃは、取得したフロー特徴量ＲＳのうちから、選択サービスカテゴリＸｍで選択特徴種別Ｙｎに関するフロー特徴量ＲＳｍｎを選択し（ステップ２２３）、これらフロー特徴量ＲＳｍｎからなる時系列データから、選択サービスカテゴリＸｍで選択特徴種別Ｙｎに関するフロー特徴ベクトルＶｍｎを計算する（ステップ２２４）。フロー特徴ベクトルＶｍｎの計算については、前述した図１３の特徴ベクトル計算動作で用いたケプストラム分析を用いればよい。 Next, the representative feature vector calculation unit 14C selects the flow feature amount RSmn related to the selected feature type Yn in the selected service category Xm from the acquired flow feature amounts RS (step 223), and includes the flow feature amounts RSmn. From the time series data, the flow feature vector Vmn for the selected feature type Yn is calculated in the selected service category Xm (step 224). For the calculation of the flow feature vector Vmn, the cepstrum analysis used in the above-described feature vector calculation operation of FIG. 13 may be used.

続いて、代表特徴ベクトル計算部１４Ｃは、これらフロー特徴ベクトルＶｍｎを、例えばＬＢＧ（Linde-Buzo-Gray）＋ｓｐｌｉｔｔｉｎｇアルゴリズムなどのベクトル量子化処理を用いて、Ｋ個のクラスタにクラスタリングした後、これらクラスタｋごとに、当該クラスタに属するフロー特徴ベクトルＶｍｋｎから、例えば当該クラスタの中心値などからなる代表特徴ベクトルＶｍｋｎを計算する（ステップ２２５）。 Subsequently, the representative feature vector calculation unit 14C clusters these flow feature vectors Vmn into K clusters using a vector quantization process such as an LBG (Linde-Buzo-Gray) + splitting algorithm, for example. For each k, a representative feature vector Vmkn including, for example, the center value of the cluster is calculated from the flow feature vector Vmkn belonging to the cluster (step 225).

この後、代表特徴ベクトル計算部１４Ｃは、選択特徴種別Ｙｎについてフラグなどにより処理済みとし（ステップ２２６）、全ての特徴種別Ｙについて処理済みか確認する（ステップ２２７）。
ここで、未処理の特徴種別Ｙｎが存在する場合（ステップ２２７：ＮＯ）、ステップ２２２に戻って次の特徴種別Ｙｎの処理を繰り返し実行する。また、全ての特徴種別Ｙｎについて処理が終了している場合（ステップ２２７：ＹＥＳ）、代表特徴ベクトル計算部１４Ｃは、選択サービスカテゴリＸｍについてフラグなどにより処理済みとし（ステップ２２８）、全てのサービスカテゴリＸについて処理済みか確認する（ステップ２２９）。 Thereafter, the representative feature vector calculation unit 14C determines that the selected feature type Yn has been processed with a flag or the like (step 226), and checks whether all the feature types Y have been processed (step 227).
Here, when there is an unprocessed feature type Yn (step 227: NO), the process returns to step 222 to repeatedly execute the process of the next feature type Yn. If the processing has been completed for all the feature types Yn (step 227: YES), the representative feature vector calculation unit 14C has processed the selected service category Xm with a flag or the like (step 228), and all the service categories Whether X has been processed is checked (step 229).

ここで、未処理のサービスカテゴリＸｍが存在する場合（ステップ２２９：ＮＯ）、ステップ２２１に戻って次のサービスカテゴリＸｍの処理を繰り返し実行する。また、全てのサービスカテゴリＸｍについて処理が終了している場合（ステップ２２９：ＹＥＳ）、代表特徴ベクトル計算部１４Ｃは、サービスカテゴリＸｍ別で、各特徴種別Ｙｎに関する代表特徴ベクトルＶｍｎを特徴量ＤＢ１２へ登録し（ステップ２３０）、一連の代表特徴ベクトル計算処理を終了する。 If there is an unprocessed service category Xm (step 229: NO), the process returns to step 221 to repeatedly execute the process for the next service category Xm. If the processing has been completed for all service categories Xm (step 229: YES), the representative feature vector calculation unit 14C sends the representative feature vectors Vmn related to each feature type Yn to the feature amount DB 12 for each service category Xm. Registration is performed (step 230), and a series of representative feature vector calculation processing ends.
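As a non-limiting illustration, the clustering of step 225 can be sketched with a simplified LBG procedure with splitting. The sketch assumes that K is a power of two, perturbs each code vector additively by a small eps when splitting, and runs a fixed number of refinement passes instead of testing a distortion threshold. The cluster centroid is used as the representative feature vector Vmkn. All names and default values are hypothetical.

```python
def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg_codebook(vectors, k, eps=0.01, passes=20):
    """Return k representative vectors (k is assumed to be a power of two)."""
    codebook = [centroid(vectors)]
    while len(codebook) < k:
        # Splitting: every code vector becomes two slightly shifted copies.
        codebook = [[c + shift for c in code]
                    for code in codebook for shift in (eps, -eps)]
        for _ in range(passes):
            # Assign each flow feature vector to its nearest code vector.
            clusters = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)),
                              key=lambda i: squared_distance(v, codebook[i]))
                clusters[nearest].append(v)
            # Move each code vector to the centroid of its cluster;
            # an empty cluster keeps its previous code vector.
            codebook = [centroid(members) if members else code
                        for members, code in zip(clusters, codebook)]
    return codebook
```

The resulting code vectors play the role of the per-cluster representative feature vectors that are compared with the flow feature vector Vin in the similarity calculation process.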

［第２の実施の形態の効果］
このように、本実施の形態によれば、各種データ通信サービスのうち、ＦＴＰとファイル転送や、メール受信とメール送信、あるいはチャットとメッセンジャーなど、ほぼ同様の機能を持つデータ通信サービスを提供する複数のアプリケーションが１つのサービスカテゴリに統合されるため、元々トラヒック特性が近いアプリケーションを、１つのサービスカテゴリとして分類することができ、通信網の設計・運用・管理において、極めて有用な分類結果を得ることができる。 [Effect of the second embodiment]
As described above, according to the present embodiment, applications that provide data communication services with substantially the same function, such as FTP and file transfer, mail reception and mail transmission, or chat and messenger, are integrated into a single service category. Applications whose traffic characteristics are inherently similar can therefore be classified as one service category, which yields classification results that are extremely useful for the design, operation, and management of communication networks.

また、本実施の形態では、サービスカテゴリ作成部１４Ｂで、２つのアプリケーションごとに、これらアプリケーションのフロー特徴量から、当該アプリケーション間のクラス内分散とクラス間分散との分散比を計算し、得られた分散比に応じてアプリケーションの統合可否を判定するようにしたので、極めて高い精度でアプリケーションを統合することができる。 In the present embodiment, the service category creation unit 14B obtains the distribution ratio between the intra-class distribution and the inter-class distribution between the applications from the flow feature quantities of these applications for each of the two applications. Since the application integration is determined according to the distribution ratio, the application can be integrated with extremely high accuracy.

Also, in the present embodiment, the representative feature vector calculation unit 14C clusters the flow feature amounts of the applications classified into a service category and, for each resulting cluster, uses as a representative feature amount a representative value calculated to represent the flow feature amounts belonging to that cluster. The representative feature vector calculation unit 14D then calculates the representative feature vector from these representative feature amounts. Each flow included in the input packet sequence can therefore be classified into a service application with extremely high accuracy.
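The clustering step above might look like the following sketch. The clustering algorithm and the choice of representative value are not specified in this section, so one-dimensional k-means with the cluster mean as the representative value is assumed here purely for illustration; `kmeans_1d` is a hypothetical name.

```python
from statistics import fmean

def kmeans_1d(values, k, iterations=20):
    """Cluster scalar flow feature amounts and return the sorted cluster means.

    Assumption: k-means with the mean as representative value; the source
    does not name the algorithm.
    """
    data = sorted(values)
    # Deterministic start: k evenly spaced samples of the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [fmean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)
```

Each returned mean plays the role of one representative feature amount for the service category.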

[Extensions of the Embodiments]
The present invention has been described above with reference to the embodiments, but it is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

DESCRIPTION OF SYMBOLS: 10 … flow classification system, 11 … traffic data collection unit, 11A … flow generation unit, 11B … simultaneous flow generation unit, 11C … short-time flow generation unit, 11D … feature amount calculation unit, 11E … feature vector calculation unit, 11F … traffic information DB, 12 … feature amount DB, 13 … service category classification unit, 13A to 13N … similarity calculation units, 13P … similarity integration unit, 13Q … classification determination unit, 14 … learning processing unit, 14A … training data storage unit, 14B … service category creation unit, 14C … representative feature vector calculation unit, 20 … management terminal, 30 … communication network, F … flow, FC, FCi … simultaneous flow, FS, FSij … short-time flow, tf … reference arrival interval, tc … reference start interval, ts … unit arrival interval, RS, RSij, RSijn … flow feature amount, V, Vinr … flow feature vector, X, Xm, Xnew … service category, Y, Yn … feature type, V, Vmn, Vmkn … representative feature vector, D, Dk … difference, S, Sim, Simn … similarity, Sih … highest similarity, Sth … significance threshold, σx, σy … variance (flow feature amount), ωx, ωy … number of elements (flow feature amount), mx, my … mean value (flow feature amount), σw² … within-class variance, σb² … between-class variance, J … variance ratio, Jth … determination threshold.

Claims

A flow classification method used in a flow classification system that, based on an input packet sequence collected from a communication network, calculates for each flow included in traffic on the communication network a feature amount indicating a feature of the flow, and classifies the flows by service category of data communication services based on the feature amounts, the method comprising:
a storage step in which a feature amount database stores, for each service category, a representative feature vector indicating a feature of the flows classified into the service category;
a traffic data collection step in which a traffic data collection unit integrates into one flow a plurality of packets in the input packet sequence that have the same classification identification information acquired from the packets and whose arrival intervals are equal to or less than a reference arrival interval; integrates into one simultaneous flow a plurality of the flows whose packets have the same destination IP address and whose flow start intervals are equal to or less than a reference start interval; integrates, for each simultaneous flow, a plurality of packets in the simultaneous flow that arrive within a unit arrival interval into one short-time flow; calculates, for each short-time flow, a flow feature amount indicating a feature of the short-time flow by statistically processing the number of packets, the packet size, or the arrival interval of the packets included in the short-time flow; and calculates, for each simultaneous flow, from time-series data consisting of the flow feature amounts of the short-time flows, a flow feature vector indicating a feature of the temporal change of the flow feature amount; and
a classification processing step in which a classification processing unit calculates, for each simultaneous flow and for each service category, a similarity indicating the similarity between the simultaneous flow and the service category based on the flow feature vector of the simultaneous flow and the representative feature vector of the service category; classifies the simultaneous flow into the service category for which the highest similarity was obtained when the highest similarity, which indicates the greatest similarity among the similarities, falls within a predetermined significant range; and classifies the simultaneous flow into a new service category when the highest similarity does not fall within the significant range.
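As a non-limiting illustration of the traffic data collection step recited above, the three levels of aggregation can be sketched as follows. The packet layout `(time, key, dst_ip, size)`, the function names, and the use of fixed-width windows of length ts measured from the start of each simultaneous flow are assumptions made for this sketch; windows that contain no packets are simply skipped.

```python
from collections import defaultdict
from statistics import fmean

def split_by_gap(items, time_of, max_gap):
    """Split items (sorted by time) into runs whose consecutive gaps are <= max_gap."""
    runs = []
    for item in sorted(items, key=time_of):
        if runs and time_of(item) - time_of(runs[-1][-1]) <= max_gap:
            runs[-1].append(item)
        else:
            runs.append([item])
    return runs

def build_simultaneous_flows(packets, tf, tc):
    """packets: (time, key, dst_ip, size) tuples; returns one packet list per simultaneous flow."""
    # Flows: same identification key, packet arrival gaps <= tf.
    by_key = defaultdict(list)
    for p in packets:
        by_key[p[1]].append(p)
    flows = [f for ps in by_key.values()
             for f in split_by_gap(ps, lambda p: p[0], tf)]
    # Simultaneous flows: same destination IP, flow start gaps <= tc.
    by_dst = defaultdict(list)
    for f in flows:
        by_dst[f[0][2]].append(f)
    simultaneous = []
    for fs in by_dst.values():
        for group in split_by_gap(fs, lambda f: f[0][0], tc):
            merged = [p for f in group for p in f]
            simultaneous.append(sorted(merged, key=lambda p: p[0]))
    return simultaneous

def short_time_features(sim_flow, ts):
    """Per ts-wide window: (packet count, mean packet size). Empty windows are skipped."""
    start = sim_flow[0][0]
    windows = defaultdict(list)
    for t, _key, _dst, size in sim_flow:
        windows[int((t - start) // ts)].append(size)
    return [(len(windows[i]), fmean(windows[i])) for i in sorted(windows)]
```

The list returned by `short_time_features` is the kind of time-series data from which a flow feature vector would then be derived.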

The flow classification method according to claim 1,
wherein the traffic data collection step obtains linear prediction coefficients of a transfer function representing the flow feature amount of the simultaneous flow by performing linear prediction analysis on the time-series data, obtains cepstrum coefficients indicating a spectral envelope characteristic of the flow feature amount of the simultaneous flow by performing cepstrum analysis on the linear prediction coefficients, and generates the flow feature vector from the cepstrum coefficients.
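One conventional way to realize the linear prediction and cepstrum analysis recited above is sketched below. The claim does not fix the estimation procedure, so the autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion for an all-pole model 1/A(z), with A(z) = 1 + a1 z^-1 + ... + ap z^-p, are assumed; the prediction order, the number of cepstrum coefficients, and all function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a feature time series."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err) with a[0] = 1 for A(z) = 1 + sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:
            break  # degenerate series: stop early, remaining coefficients stay 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of 1/A(z); the gain term c[0] is omitted."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = (-a[n] if n <= p else 0.0) - acc / n
    return c[1:]

def flow_feature_vector(series, order=2, n_ceps=4):
    """Cepstrum-based feature vector for one simultaneous flow's time series."""
    a, _err = levinson_durbin(autocorr(series, order), order)
    return lpc_to_cepstrum(a, n_ceps)
```

For A(z) = 1 - 0.5 z^-1 the recursion reproduces the series expansion of -ln(1 - 0.5 z^-1), whose first coefficients are 0.5, 0.125, and 1/24.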

The flow classification method according to claim 1 or claim 2,
wherein the storage step stores, as the representative feature vector, for each cluster obtained by clustering the flow feature amounts of the short-time flows included in the service category, a representative feature vector calculated from the flow feature amounts included in the cluster, and
the classification processing step obtains, for each of the representative values constituting the representative feature vector, a difference from each flow feature vector, and calculates the similarity by statistically processing the differences.
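The difference-based similarity recited above could be computed along the following lines. The claim leaves the difference measure and the statistical processing open, so Euclidean distance, the mean of the distances, and the mapping 1 / (1 + mean distance) onto a score where larger means more similar are all assumptions of this sketch, as is the name `similarity`.

```python
import math
from statistics import fmean

def similarity(flow_vector, representatives):
    """Score in (0, 1]; 1.0 means the flow vector matches every representative.

    Assumed: Euclidean differences D_k, averaged, then mapped to a score.
    """
    diffs = [math.dist(flow_vector, rep) for rep in representatives]
    return 1.0 / (1.0 + fmean(diffs))
```

Other reductions, such as taking the smallest difference instead of the mean, would fit the claim wording equally well.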

The flow classification method according to any one of claims 1 to 3,
wherein the storage step stores, for each service category, the representative feature vector for each of a plurality of different feature types,
the traffic data collection step calculates, for each short-time flow, the flow feature vector for each of the feature types, and
the classification processing step, instead of calculating the similarity between the simultaneous flow and the service category directly, calculates a type similarity for each feature type and calculates the similarity by statistically processing the type similarities.
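The integration of type similarities recited above can be illustrated with a simple average. The claim only says the type similarities are statistically processed, so the arithmetic mean, the dictionary layout, the feature type labels, and the name `integrate_type_similarities` are assumptions.

```python
from statistics import fmean

def integrate_type_similarities(type_similarities):
    """Combine per-feature-type similarities into one similarity (assumed: mean).

    type_similarities maps a feature type label to its type similarity.
    """
    return fmean(type_similarities.values())
```

For example, type similarities of 0.8 for packet count, 0.6 for packet size, and 0.4 for arrival interval would be combined into a single similarity of about 0.6 for that service category.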

The flow classification method according to any one of claims 1 to 4, further comprising:
a service category creation step in which a service category creation unit calculates, for a training packet sequence in which the corresponding application is known for each packet, the flow feature amount for each short-time flow generated in the same manner as for the input packet sequence; calculates, for each pair of different applications selected from among the applications, the variance ratio between the within-class variance and the between-class variance of the two applications based on the variance, mean value, and number of elements of the flow feature amounts belonging to those applications; and creates the service categories by classifying the two applications into the same service category when the resulting variance ratio is smaller than a determination threshold and into separate service categories when the resulting variance ratio is equal to or greater than the determination threshold; and
a feature amount calculation step in which a feature amount calculation unit calculates, for each service category created in the service category creation step, from the flow feature amounts belonging to the applications classified into the service category, a flow feature amount indicating a feature of the flows classified into the service category, and stores it in the feature amount database.

The flow classification method according to claim 5,
wherein the feature amount calculation step clusters the flow feature amounts belonging to the applications classified into the service category; calculates, for each resulting cluster, based on time-series data consisting of the flow feature amounts belonging to the cluster, a feature vector indicating a feature of the temporal change of these flow feature amounts; and stores the resulting feature vector in the feature amount database as the representative feature vector of the service class.

A flow classification system that, based on an input packet sequence collected from a communication network, calculates for each flow included in traffic on the communication network a feature amount indicating a feature of the flow, and classifies the flows by service category of data communication services based on the feature amounts, the system comprising:
a feature amount database that stores, for each service category, a representative feature vector indicating a feature of the flows classified into the service category;
a traffic data collection unit that integrates into one flow a plurality of packets in the input packet sequence that have the same classification identification information acquired from the packets and whose arrival intervals are equal to or less than a reference arrival interval; integrates into one simultaneous flow a plurality of the flows whose packets have the same destination IP address and whose flow start intervals are equal to or less than a reference start interval; integrates, for each simultaneous flow, a plurality of packets in the simultaneous flow that arrive within a unit arrival interval into one short-time flow; calculates, for each short-time flow, a flow feature amount indicating a feature of the short-time flow by statistically processing the number of packets, the packet size, or the arrival interval of the packets included in the short-time flow; and calculates, for each simultaneous flow, from time-series data consisting of the flow feature amounts of the short-time flows, a flow feature vector indicating a feature of the temporal change of the flow feature amount; and
a classification processing unit that calculates, for each simultaneous flow and for each service category, a similarity indicating the similarity between the simultaneous flow and the service category based on the flow feature vector of the simultaneous flow and the representative feature vector of the service category; classifies the simultaneous flow into the service category for which the highest similarity was obtained when the highest similarity, which indicates the greatest similarity among the similarities, falls within a predetermined significant range; and classifies the simultaneous flow into a new service category when the highest similarity does not fall within the significant range.
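By way of a non-limiting sketch, the decision made by the classification processing unit can be written as follows. It is assumed here that a larger value means greater similarity and that the significant range is "at least the significance threshold Sth"; the function name `classify` and the label `"X_new"` for a new service category are hypothetical.

```python
def classify(similarities, s_th, new_label="X_new"):
    """Pick the service category with the highest similarity, or a new one.

    similarities maps a service category label to the similarity S_im.
    Assumed: the significant range is S >= s_th.
    """
    best = max(similarities, key=similarities.get)
    if similarities[best] >= s_th:
        return best
    return new_label
```

Returning a new category instead of forcing the closest match is what lets the system flag traffic from services that were absent from the training data.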

The flow classification system according to claim 7,
wherein the traffic data collection unit obtains linear prediction coefficients of a transfer function representing the flow feature amount of the simultaneous flow by performing linear prediction analysis on the time-series data, obtains cepstrum coefficients indicating a spectral envelope characteristic of the flow feature amount of the simultaneous flow by performing cepstrum analysis on the linear prediction coefficients, and generates the flow feature vector from the cepstrum coefficients.

A program for causing a computer to execute each step of the flow classification method according to any one of claims 1 to 6.