JP4398907B2

JP4398907B2 - Feature sequence pattern finding device and method of operating feature sequence pattern finding device

Info

Publication number: JP4398907B2
Application number: JP2005188453A
Authority: JP
Inventors: 茂明櫻井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2005-06-28
Filing date: 2005-06-28
Publication date: 2010-01-13
Anticipated expiration: 2025-06-28
Also published as: JP2007011488A

Description

The present invention relates to an apparatus, and a method of operating it, that supports user decision making by discovering characteristic sequence patterns inherent in sequence data accumulated over time on a computer. Examples of such sequence data include daily sales data and daily business reports in the retail field; daily biological data such as blood pressure and pulse, together with records of an individual's behavior, in the health management field; and daily stock price data and news reported in newspapers and the like in the financial field.

The GSP (Generalized Sequential Patterns) algorithm takes as input sequence data composed of many elements and can discover sequence patterns that occur frequently in the set of sequence data. However, the frequent sequence patterns discovered by this method are often already known to the analyst and do not necessarily give the analyst new knowledge. Moreover, when frequent sequence patterns are discovered with a low frequency threshold, a large number of them are found. As a result, not only does it take a long time to discover all of them, but characteristic sequence patterns that would give new knowledge also risk being buried among the many frequent sequence patterns.

The "apparatus for evaluating state sequence patterns based on unexpectedness" described in Patent Document 1 below defines unexpectedness for sequence patterns and can thereby discover characteristic sequence patterns among them. However, that apparatus presupposes that the candidate sequence patterns to be judged for unexpectedness have already been discovered, so candidates must first be found with a method such as GSP. In addition, because there is no monotonic relationship between changes in the frequency of a candidate sequence pattern and changes in its unexpectedness value, not all unexpected sequence patterns can be discovered unless low-frequency sequence patterns are also found as candidates. Discovering the candidate sequence patterns therefore takes a long time.

The "information management apparatus for event data" described in Patent Document 2 below uses events that occur in time series in a target problem domain to update the importance of cases in that domain, and can extract highly important cases that fit the current time. However, that apparatus merely takes time into account when extracting cases corresponding to the problem domain, and it cannot discover time-series patterns.

JP 2004-178515 A; JP 2002-207755 A

In recent years, environments in which events accompanied by time information and attribute information can be easily collected and accumulated have become established, and there is a growing need to analyze such event data and put it to use in human decision making.

In the conventional approach to analyzing event data, events are first grouped on the basis of the time information and attribute information attached to each event, and sequence data is generated. Next, partial sequences that appear frequently in the set of sequence data are extracted as frequent sequence patterns. Although such frequent sequence patterns are representative of the given set of sequence data, they are often commonplace to the analyst. To discover characteristic sequence patterns that give the analyst new knowledge, characteristic sequence patterns must therefore be selected from the discovered frequent sequence patterns on the basis of some other criterion (for example, confidence).

If the criterion for judging a pattern "frequent" is set too high, so that too few frequent sequence patterns are discovered in the first place, characteristic sequence patterns may be overlooked. If the criterion is set too low, characteristic sequence patterns may be buried among a large number of frequent sequence patterns. A new method that resolves this trade-off and discovers characteristic sequence patterns efficiently is therefore desired.

The present invention has been made in view of these circumstances. Its object is to provide an apparatus, and a method of operating it, that can discover, from sequence data accumulated over time on a computer, characteristic sequence patterns that are likely to be of high interest to the user even if they do not appear very frequently.

A feature sequence pattern finding device according to one aspect of the present invention comprises: a sequence data storage unit that stores sequence data each consisting of a plurality of events; a feature sequence pattern storage unit that stores feature sequence patterns that have already been discovered; a candidate sequence pattern generation unit that generates a candidate sequence pattern by adding an event or event set taken from the sequence data storage unit to the events or event sets that match within a pair of feature sequence patterns stored in the feature sequence pattern storage unit; a sequence pattern frequency calculation unit that calculates the frequency of the candidate sequence pattern, which corresponds to the number of sequence data items stored in the sequence data storage unit that contain the candidate sequence pattern; a partial sequence pattern frequency storage unit that stores the frequencies of partial sequence patterns of the feature sequence patterns; a candidate sequence pattern evaluation unit that calculates an evaluation value of the candidate sequence pattern from the frequency calculated by the sequence pattern frequency calculation unit and the partial sequence pattern frequencies stored in the partial sequence pattern frequency storage unit, in accordance with an evaluation formula whose value decreases monotonically as a candidate sequence pattern contains more events; and a candidate sequence pattern determination unit that determines whether the evaluation value exceeds a threshold. A candidate sequence pattern whose evaluation value exceeds the threshold is stored in the feature sequence pattern storage unit as a new feature sequence pattern.

According to the present invention, it is possible to provide an apparatus, and a method of operating it, that can efficiently discover, from sequence data accumulated over time on a computer, characteristic sequence patterns that are likely to be of high interest to the user even if they do not appear very frequently.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing a feature sequence pattern finding device according to one embodiment of the present invention. The device supports user decision making by discovering characteristic sequence patterns inherent in sequence data accumulated over time on a computer, such as daily sales data and daily business reports in the retail field, daily biological data (blood pressure, pulse, and the like) and records of an individual's behavior in the health management field, and daily stock price data and news reported in newspapers and the like in the financial field. As shown in FIG. 1, the device comprises a sequence data storage unit B1, a candidate sequence pattern generation unit B2, a sequence pattern frequency calculation unit B3, a partial sequence pattern frequency storage unit B4, a candidate sequence pattern evaluation unit B5, a candidate sequence pattern determination unit B6, and a feature sequence pattern storage unit B7.

The present invention can be implemented as a program that causes a computer to function as a feature sequence pattern finding device having this configuration. In that case, the program is stored in a program storage device in the computer, such as a nonvolatile semiconductor storage device or a magnetic disk device. The program is loaded into random access memory (RAM) under the control of a CPU (not shown) and executed by that CPU, whereby the computer functions as the feature sequence pattern finding device according to the present invention. An operating system that manages the computer's resources and provides a graphical user interface (GUI) and the like is also installed on the computer.

The processing flow of the feature sequence pattern finding device according to the present embodiment is described below with reference to the flowcharts shown in FIGS. 2, 3, and 4.

Assume that the sequence data storage unit B1 of the feature sequence pattern finding device according to the present embodiment stores sequence data such as that shown in FIG. 5. "Sequence data" here means data consisting of a plurality of events; more specifically, it is data in which event sets, each collecting the events that occur in the same time period, are arranged in time order. In the sequence data shown in FIG. 5, the event IDs a to e that make up each row correspond to the event names shown in FIG. 6. For example, the sequence data with ID D2 is one sequence data item (segment) consisting of four elements, a, (be), d, and b, and its second element consists of two events, b and e, that occur in the same time period. In other words, one sequence data item (segment) identified by an ID consists of one or more elements, and each element consists of one or more events.

Before the processing flow of the device is described with reference to FIGS. 2 to 4, terms other than "sequence data" are defined.

- A "sequence pattern" is a series of event sets arranged in time order, and is extracted from sequence data.

- "Sequence data contains a sequence pattern" means that the following condition holds. Let the sequence data be ed_1, ed_2, ..., ed_m and the sequence pattern be ep_1, ep_2, ..., ep_n, where each ed_i and ep_j is an event set. The condition is that there exists a sequence of integers {i_1, i_2, ..., i_n} with 0 < i_1 < i_2 < ... < i_n ≤ m such that ep_k ⊆ ed_{i_k} for k = 1, 2, ..., n.

- When a sequence pattern p1 contains a sequence pattern p2, p2 is called a "partial sequence pattern" of p1.

- The "frequency of a sequence pattern" is the number of sequence data items that contain the sequence pattern.
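The containment condition and the frequency definition above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `seq`, `contains`, `frequency`, and `TOY` are hypothetical, and only `D2` follows the example a, (be), d, b given in the text. The other two sequences are made-up data, since FIG. 5 is not reproduced here.

```python
from typing import FrozenSet, List

EventSet = FrozenSet[str]
Sequence_ = List[EventSet]


def seq(*elements: str) -> Sequence_:
    """Build a sequence from strings, e.g. seq("a", "be") -> [{a}, {b, e}]."""
    return [frozenset(e) for e in elements]


def contains(data: Sequence_, pattern: Sequence_) -> bool:
    """True if there are indices i_1 < ... < i_n with pattern[k] a subset of data[i_k]."""
    pos = 0
    for ep in pattern:
        # Greedy leftmost match: advance to the first element that covers ep.
        while pos < len(data) and not ep <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1  # the next pattern element must match strictly later
    return True


def frequency(dataset: List[Sequence_], pattern: Sequence_) -> int:
    """Number of sequence data items that contain the pattern."""
    return sum(contains(data, pattern) for data in dataset)


# D2 is the example from the text; the other two sequences are hypothetical.
D2 = seq("a", "be", "d", "b")
TOY = [D2, seq("a", "c"), seq("b", "a")]

print(contains(D2, seq("a", "be", "b")))  # a, then (be), then b
print(frequency(TOY, seq("a", "b")))
```

Matching each pattern element at the earliest possible position is sufficient here, because taking an earlier match never rules out a later one.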

First, as shown in FIG. 2, in step S1 the candidate sequence pattern generation unit B2 reads one sequence data item at a time, in order, from the sequence data stored in the sequence data storage unit B1.

In step S2, the candidate sequence pattern generation unit B2 determines whether there is still sequence data to be read from the sequence data storage unit B1. If there is none, the process proceeds to step S4; otherwise, it proceeds to step S3. The sequence data of FIG. 5 comprises six sequence data items, so the process proceeds to step S3 on the first six passes and to step S4 on the seventh.

In step S3, the candidate sequence pattern generation unit B2 decomposes the sequence data item it has read into events, removes duplicate events, and stores the result in a buffer or the like. In the example of FIG. 5, six executions of this step extract the five events shown in FIG. 6.

In step S4, the candidate sequence pattern generation unit B2 takes events out of the buffer one at a time and determines whether an event is available. If no event remains, the process proceeds to step S8; if an event is available, it proceeds to step S5. In the case of FIG. 6 there are five events, so the process proceeds to step S5 on the first five passes and to step S8 on the sixth.

In step S5, the sequence pattern frequency calculation unit B3 first calculates the frequency of the event that was taken out. For the sequence data of FIG. 5, the event with ID a is contained in every sequence data item, so its frequency is calculated as 6. Performing the same calculation on each pass gives frequencies of 5, 3, 4, and 3 for the events with IDs b, c, d, and e, respectively.
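Steps S1 to S5 up to this point (collecting the distinct events and counting, for each event, the sequence data items that contain it) might be sketched as below. The function name `event_frequencies` and the three-sequence `toy` dataset are hypothetical; the dataset does not reproduce FIG. 5, so the counts differ from the 6, 5, 3, 4, 3 quoted above.

```python
from collections import Counter
from typing import Dict, List


def event_frequencies(dataset: List[List[str]]) -> Dict[str, int]:
    """Count, for every event, how many sequences contain it at least once.

    Each sequence is a list of elements, and each element is a string whose
    characters are the event IDs occurring in the same time period.
    """
    counts: Counter = Counter()
    for data in dataset:
        # Decompose the sequence into events and drop duplicates (step S3),
        # so that an event is counted once per sequence (step S5).
        distinct_events = set().union(*data)
        counts.update(distinct_events)
    return dict(counts)


# Hypothetical data; the first sequence is D2 from the text.
toy = [["a", "be", "d", "b"], ["a", "c"], ["b", "a"]]
print(sorted(event_frequencies(toy).items()))
```

Counting per sequence rather than per occurrence matters: event b appears twice in the first sequence but contributes only 1 to its frequency.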

Next, the candidate sequence pattern evaluation unit B5 calculates the degree of interest (evaluation value) of the event on the basis of the following equation (1).

[Equation (1): image not reproduced in this text]

Here, s is a sequence pattern, sp is a partial sequence pattern of s, fs() is the frequency of a sequence pattern or partial sequence pattern, and N is the number of sequence data items. An event corresponds to the case in which the sequence pattern contains exactly one event, so that s = sp. Calculating the evaluation value of a sequence pattern as in equation (1) ensures that the degree of interest of a sequence pattern is no greater than the degree of interest of any of its partial sequence patterns.

When the degree of interest of the event with ID a is calculated for the sequence data of FIG. 5, its value is as follows.

[Equation image not reproduced in this text]

The degree of interest of the event with ID b is calculated as follows.

[Equation image not reproduced in this text]

Similarly, the degrees of interest of the events with IDs c, d, and e can be calculated as shown in FIG. 7.

In step S6, the candidate sequence pattern determination unit B6 compares the degree of interest of the event with a preset minimum degree of interest (Th1; threshold). If the degree of interest is equal to or greater than the minimum degree of interest, the process proceeds to step S7; if it is less, the process returns to step S4. When the minimum degree of interest for the sequence data of FIG. 5 is set to 0.2, the degree of interest of every event is 0.2 or more, so the process proceeds to step S7 for every event.

In step S7, the candidate sequence pattern determination unit B6 stores each event whose degree of interest is equal to or greater than the minimum degree of interest, together with its frequency, in the feature sequence pattern storage unit B7 and the partial sequence pattern frequency storage unit B4 as a first-order feature event set and its maximum frequency. When the minimum degree of interest for the sequence data of FIG. 5 is set to 0.2, the event column and the maximum frequency column of FIG. 7 are stored in the feature sequence pattern storage unit B7 and the partial sequence pattern frequency storage unit B4 as the first-order sequence patterns and their maximum frequencies.

Next, as shown in FIG. 3, in step S8 the candidate sequence pattern generation unit B2 extracts, from the (L-1)-th order feature event sets stored in the feature sequence pattern storage unit B7, two event sets whose leading (L-2) events match. The events in each event set are assumed to be arranged in a specified order (for example, lexicographic order of event ID). An (L-1)-th order feature event set is a characteristic event set consisting of (L-1) events. The candidate sequence pattern generation unit B2 then generates an L-th order event set by combining the two extracted (L-1)-th order feature event sets.

That is, the L-th order event set is generated by adding, to the (L-2) common events, the one remaining event from each of the two (L-1)-th order feature event sets. The L-th order event set is then sorted into the specified order. The L-th order event set generated in this way is called an "L-th order feature event set candidate".

As an example, consider generating second-order feature event set candidates from first-order feature event sets. The candidate sequence pattern generation unit B2 first selects a and b as first-order feature event sets. When L = 2 there are no leading (L-2) events, so the event sets are extracted without applying the matching condition. The second-order event set (ab) is then generated from these two event sets, with the events arranged in alphabetical order of event ID.

Similarly, consider generating third-order feature event set candidates from second-order feature event sets. As described in detail later, in the example of FIG. 5 the second-order feature event sets are (bc) and (be). These two sets share their one leading event, b, so a third-order feature event set candidate can be generated from them; namely, the event set (bce) is generated as a third-order feature event set candidate.
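The join described in step S8 resembles the candidate generation used in Apriori-style itemset mining. A minimal sketch under that reading is shown below; the function name `join_event_sets` and the tuple representation of event sets are assumptions made for illustration, not names from the patent.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

EventSet = Tuple[str, ...]  # events kept sorted by event ID


def join_event_sets(feature_sets: Iterable[EventSet]) -> List[EventSet]:
    """Generate L-th order candidates from (L-1)-th order feature event sets.

    Two sets are joined when their leading (L-2) events match; the candidate
    is the common prefix plus the one remaining event of each set.
    """
    ordered = sorted(feature_sets)
    candidates = []
    for x, y in combinations(ordered, 2):
        if x[:-1] == y[:-1]:  # for L = 2 both prefixes are empty, so this always holds
            # x sorts before y and they share a prefix, so x[-1] < y[-1]
            # and the result is already in the specified order.
            candidates.append(x + (y[-1],))
    return candidates


first_order = [("a",), ("b",), ("c",), ("d",), ("e",)]
print(len(join_event_sets(first_order)))           # second-order candidates
print(join_event_sets([("b", "c"), ("b", "e")]))   # third-order candidate
```

With the five first-order event sets a to e this yields ten second-order candidates, and joining (bc) with (be) yields (bce), matching the examples in the text.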

In step S9, the candidate sequence pattern generation unit B2 determines whether an L-th order feature event set candidate has been generated. If not, the process proceeds to step S13; if so, it proceeds to step S10. When second-order feature event set candidates are generated in the example of FIG. 5, there are ten candidates, as shown in the second-order event set column of FIG. 8, so the process proceeds to step S10 on the first ten passes and to step S13 on the eleventh.

In step S10, the sequence pattern frequency calculation unit B3 calculates the frequency of the generated L-th order feature event set candidate. For the second-order feature event set candidates in the example of FIG. 5, the frequency of each event set is given in the frequency column of FIG. 8. For the third-order feature event set candidates, the frequency of each event set is given in the frequency column of FIG. 9.

Next, the candidate sequence pattern evaluation unit B5 calculates the degree of interest of the generated L-th order feature event set candidate by applying to equation (1) the frequency calculated by the sequence pattern frequency calculation unit B3 and the maximum frequencies of the (L-1)-th order feature event sets stored in the feature sequence pattern storage unit B7. As an example, consider calculating the degree of interest of the second-order feature event set (bc). The maximum frequencies of b and c are 5 and 3, and the frequency of (bc) is 3, so the degree of interest is given as follows.

[Equation image not reproduced in this text]

Similarly, consider calculating the degree of interest of the third-order feature event set (bce). According to the method of calculating the maximum frequency described later for step S12, the maximum frequencies of (bc) and (be) are 5 and 5, and the degree of interest is obtained from them in the same way.

[Equation image not reproduced in this text]

Note that in the above calculation examples, the frequencies of the partial sequence patterns corresponding to (bc) in the second-order case and to b, c, e, (ce), and (bce) in the third-order case are not evaluated. This is possible because the calculation exploits a property of the way the maximum frequency of a sequence pattern is set; the reason is explained where the maximum frequency is set in step S12.

In step S11, the candidate sequence pattern determination unit B6 determines whether the degree of interest of the L-th order feature event set candidate is equal to or greater than the minimum degree of interest. If it is, the process proceeds to step S12; if it is less, the process returns to step S8. In the determination of the second-order feature event set candidates, (bc) and (be) reach the minimum degree of interest of 0.2, so the process proceeds to step S12 for them, whereas for (ab), (ac), (ad), (bd), (cd), (ce), and (de) the process returns to step S8.

In step S12, the candidate sequence pattern determination unit B6 stores each L-th order feature event set candidate whose degree of interest is equal to or greater than the minimum degree of interest in the feature sequence pattern storage unit B7 as an L-th order feature event set. As the corresponding maximum frequency, the larger of the maximum frequencies of the two (L-1)-th order feature event sets from which the L-th order feature event set was generated is stored in the partial sequence pattern frequency storage unit B4. The frequency of a sequence pattern is never greater than the frequency of any of its partial sequence patterns. Consequently, when the reciprocals of the partial sequence pattern frequencies are considered, the smallest reciprocal is found among the shorter partial sequence patterns, and therefore among the individual events, which are the shortest partial sequence patterns. By storing the maximum of the frequencies of the events that make up a sequence pattern (the maximum frequency) in the partial sequence pattern frequency storage unit B4 as the partial sequence pattern frequency, the candidate sequence pattern evaluation unit B5 can calculate the degree of interest by evaluating only the two maximum frequencies of the (L-1)-th order event sets used to construct the L-th order feature event set.
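The argument in step S12 can be checked on a small example: the largest frequency among all partial patterns of an event set is always attained by one of its single events, so it can be propagated from the two parent sets instead of being recomputed. The sketch below is illustrative only. The function names and the four-sequence `dataset` are hypothetical, and it deliberately covers only the maximum-frequency term, not equation (1) itself, which is not reproduced in this text.

```python
from itertools import combinations
from typing import List, Sequence


def set_frequency(dataset: List[List[str]], events: Sequence[str]) -> int:
    """Number of sequences in which some element contains every given event."""
    target = set(events)
    return sum(any(target <= set(element) for element in data) for data in dataset)


def max_subset_frequency_bruteforce(dataset: List[List[str]], events: Sequence[str]) -> int:
    """Largest frequency over all non-empty proper subsets (needs >= 2 events)."""
    subsets = (c for r in range(1, len(events)) for c in combinations(events, r))
    return max(set_frequency(dataset, s) for s in subsets)


def max_frequency_from_parents(parent_a: int, parent_b: int) -> int:
    """Shortcut of step S12: take the larger of the two parents' maximum frequencies."""
    return max(parent_a, parent_b)


# Hypothetical data (not FIG. 5): each string is one element of a sequence.
dataset = [["a", "bce", "d"], ["bc", "a"], ["b", "be"], ["b"]]

single = {ev: set_frequency(dataset, ev) for ev in "bce"}
max_bc = max_frequency_from_parents(single["b"], single["c"])
max_be = max_frequency_from_parents(single["b"], single["e"])
max_bce = max_frequency_from_parents(max_bc, max_be)

print(max_bce, max_subset_frequency_bruteforce(dataset, "bce"))
```

The shortcut needs two stored values per candidate, whereas the brute-force check scans the dataset once for every subset. In the text's own example the same rule turns the maximum frequencies 5 and 3 of b and c into 5 for (bc).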

In step S13, the candidate sequence pattern generation unit B2 generates the first-order feature sequence patterns by merging all of the feature event sets stored in the feature sequence pattern storage unit B7. In the example of FIG. 5, the five first-order feature event sets listed in FIG. 7 and the two second-order event sets listed with a maximum frequency in FIG. 8 become the first-order feature sequence patterns.

Next, as shown in FIG. 4, in step S14 the candidate sequence pattern generation unit B2 extracts two (L-1)-th order sequence patterns whose leading (L-2) event sets match, and generates an L-th order feature sequence pattern candidate from them, taking into account the order in which they were extracted. As an example, consider the case in which the first-order sequence patterns a and (be) are extracted in that order and a second-order feature sequence pattern candidate is generated from them. When second-order candidates are generated there are no leading (L-2) event sets, so the matching condition is not applied, and the second-order feature sequence pattern a(be) can be generated from these first-order feature sequence patterns. Next, consider the case in which the second-order sequence patterns a(be) and ab are extracted in that order and a third-order feature sequence pattern candidate is generated from them. These two feature sequence patterns share their one leading event set, a, so the third-order feature sequence pattern candidate a(be)b can be generated.
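The sequence-level join of step S14 differs from the event-set join in that the result depends on the order in which the two patterns are taken. A minimal sketch, assuming a pattern is represented as a tuple of strings with one string per event set; the function name `join_sequence_patterns` is hypothetical.

```python
from typing import Optional, Tuple

Pattern = Tuple[str, ...]  # one string per event set, e.g. ("a", "be")


def join_sequence_patterns(first: Pattern, second: Pattern) -> Optional[Pattern]:
    """Join two (L-1)-th order patterns into an L-th order candidate.

    The leading (L-2) event sets must match; the last event set of `second`
    is appended after the last event set of `first`, so order matters.
    Returns None when the patterns cannot be joined.
    """
    if len(first) != len(second) or not first:
        return None
    if first[:-1] != second[:-1]:
        return None
    return first + (second[-1],)


print(join_sequence_patterns(("a",), ("be",)))          # a(be)
print(join_sequence_patterns(("a", "be"), ("a", "b")))  # a(be)b
print(join_sequence_patterns(("a", "b"), ("a", "be")))  # ab(be), the reverse order
```

Because both extraction orders are meaningful, a complete candidate generator would call this function for every ordered pair of patterns that share a prefix.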

In step S15, the candidate sequence pattern generation unit B2 determines whether an L-th order feature sequence pattern candidate could be generated. If no candidate could be generated, the process proceeds to step S19; if a candidate could be generated, the process proceeds to step S16.

In step S16, the sequence pattern frequency calculation unit B3 calculates the frequency of the feature sequence pattern candidate. In the example of FIG. 5, the values shown in the frequency columns of FIGS. 10, 11, and 12 are calculated as the frequencies of the secondary, tertiary, and quaternary sequence pattern candidates, respectively.

Note, however, that each figure shows the calculated frequency only for those sequence pattern candidates whose degree of interest, calculated later, is equal to or greater than the minimum degree of interest.
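The frequency calculation of step S16 counts how many items of sequence data contain the candidate. A minimal sketch follows. The containment rule used here (each event set of the pattern is a subset of a distinct element of the sequence, in order, with no time-gap constraint) is an assumption, as are the names `contains` and `frequency` and the toy database, which is not the data of FIG. 5.

```python
from typing import FrozenSet, List, Tuple

EventSet = FrozenSet[str]
Pattern = Tuple[EventSet, ...]


def pat(*event_sets: str) -> Pattern:
    """Build a pattern or sequence from strings, e.g. pat("a", "be")."""
    return tuple(frozenset(events) for events in event_sets)


def contains(sequence: Pattern, pattern: Pattern) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (assumed rule).

    Greedy matching: each event set of the pattern is matched against the
    earliest remaining element of the sequence that is a superset of it.
    """
    pos = 0
    for event_set in pattern:
        while pos < len(sequence) and not event_set <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next event set must match a later element
    return True


def frequency(database: List[Pattern], pattern: Pattern) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for sequence in database if contains(sequence, pattern))


# Hypothetical toy database of three sequences.
database = [pat("a", "be", "b"), pat("a", "b"), pat("be", "a")]
freq_a_be = frequency(database, pat("a", "be"))
freq_a_b = frequency(database, pat("a", "b"))
```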

The candidate sequence pattern evaluation unit B5 then calculates the degree of interest by applying to formula (1) the frequency of the L-th order feature sequence pattern candidate calculated by the sequence pattern frequency calculation unit B3 and the corresponding maximum frequency stored in the feature sequence pattern storage unit B7. As an example, the degree of interest of the secondary feature sequence pattern candidate a(be) is obtained by substituting its frequency and the corresponding maximum frequency into formula (1).

The degree of interest of the tertiary feature sequence pattern candidate a(be)b is calculated from formula (1) in the same way. Likewise, degree-of-interest values are calculated for each of the secondary, tertiary, and quaternary sequence pattern candidates, as shown in FIGS. 10, 11, and 12. Here too, results are shown only for feature sequence pattern candidates whose degree of interest is equal to or greater than the minimum degree of interest. In the above calculation, the degree of interest can be calculated from the maximum frequency used when generating the L-th order feature sequence pattern candidate for the same reason as explained for step S12.

In step S17, the candidate sequence pattern determination unit B6 determines whether the degree of interest of the L-th order feature sequence pattern candidate is equal to or greater than the minimum degree of interest. If it is, the process proceeds to step S18; if it is less than the minimum degree of interest, the process returns to step S14. For example, the tertiary feature sequence pattern candidate abb has a degree of interest equal to or greater than the minimum, so the process proceeds to step S18, whereas the quaternary feature sequence pattern candidate abbb shown in FIG. 12 has a degree of interest below the minimum, so the process returns to step S14.

In step S18, the candidate sequence pattern determination unit B6 stores each L-th order feature sequence pattern candidate whose degree of interest is equal to or greater than the minimum degree of interest in the feature sequence pattern storage unit B7 as an L-th order feature sequence pattern. In addition, the larger of the maximum frequencies of the two (L-1)-th order feature sequence patterns used to generate that feature sequence pattern is stored in the partial sequence pattern frequency storage unit B4 as the maximum frequency for the pattern. In the example of FIG. 5, the values listed in the maximum frequency columns of FIGS. 10 and 11 are stored as the corresponding maximum frequencies.
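The bookkeeping of steps S17 and S18 can be sketched as follows. This is an illustrative sketch only: patterns are written as plain strings, the function name `store_if_interesting` is hypothetical, the degree of interest is passed in already computed (formula (1) is not reproduced here), and the numeric values are made up rather than taken from FIGS. 10 and 11.

```python
from typing import Dict, List, Tuple


def store_if_interesting(
    candidate: str,
    interest: float,
    min_interest: float,
    parents: Tuple[str, str],
    feature_patterns: List[str],
    max_frequency: Dict[str, int],
) -> bool:
    """Apply the threshold test (S17) and, if passed, store the pattern (S18).

    The maximum frequency recorded for the new pattern is the larger of the
    maximum frequencies of the two (L-1)-th order parent patterns.
    """
    if interest < min_interest:
        return False  # below the minimum degree of interest: not stored
    feature_patterns.append(candidate)
    max_frequency[candidate] = max(max_frequency[p] for p in parents)
    return True


# Hypothetical stores standing in for units B7 and B4.
feature_patterns = ["a(be)", "ab"]
max_frequency = {"a(be)": 4, "ab": 6}

accepted = store_if_interesting(
    "a(be)b", 0.25, 0.2, ("a(be)", "ab"), feature_patterns, max_frequency
)
rejected = store_if_interesting(
    "abb", 0.1, 0.2, ("ab", "ab"), feature_patterns, max_frequency
)
```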

In step S15, the candidate sequence pattern generation unit B2 determines whether one or more L-th order feature sequence pattern candidates are stored in the feature sequence pattern storage unit B7. If one or more are stored, the process proceeds to step S14; if none is stored, the processing of this flow ends. In the example of FIG. 5, no quaternary feature sequence pattern can be found at the stage where the quaternary feature sequence pattern candidates have been generated, so the processing of this flow ends.

In step S19, if the sequence can be extended, the candidate sequence pattern generation unit B2 increases the sequence size by 1, and the process returns to step S14.

Ultimately, all the events shown in FIG. 7, the event sets (bc) and (be) shown in FIG. 8, and all the patterns shown in FIGS. 10 and 11 are extracted as feature sequence patterns.

As described above, this embodiment makes it possible to discover, without omission, useful and characteristic sequence patterns that give the analyst new knowledge. Moreover, because the evaluation value of a sequence pattern is defined to decrease monotonically with respect to the events the pattern contains, the evaluation value of every partial sequence pattern contained in a feature sequence pattern is equal to or greater than the evaluation value of that feature sequence pattern, and every such partial sequence pattern is itself a feature sequence pattern. Consequently, a sequence pattern that has a partial sequence pattern which is not a feature sequence pattern need not be evaluated as a candidate sequence pattern, and feature sequence patterns can be discovered efficiently.

The present invention is not limited to the embodiment exactly as described above; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be deleted from the full set shown in the embodiment, and constituent elements from different embodiments may be combined as appropriate.

For example, although all the feature sequence patterns extracted for each sequence size are stored in the feature sequence pattern storage unit B7, feature sequence patterns that are contained in a feature sequence pattern of a longer sequence size may instead be deleted from the feature sequence pattern storage unit B7.
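One way to read this variation is sketched here: a stored pattern is dropped when some longer stored pattern contains it. The tuple-of-frozensets representation, the containment rule in `contains`, and the name `keep_uncontained` are assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

Pattern = Tuple[FrozenSet[str], ...]


def pat(*event_sets: str) -> Pattern:
    """Build a pattern from strings, e.g. pat("a", "b") stands for ab."""
    return tuple(frozenset(events) for events in event_sets)


def contains(longer: Pattern, shorter: Pattern) -> bool:
    """True if `shorter` occurs in `longer` in order (assumed rule)."""
    pos = 0
    for event_set in shorter:
        while pos < len(longer) and not event_set <= longer[pos]:
            pos += 1
        if pos == len(longer):
            return False
        pos += 1
    return True


def keep_uncontained(patterns: List[Pattern]) -> List[Pattern]:
    """Drop every pattern that is contained in a longer stored pattern."""
    return [
        p
        for p in patterns
        if not any(len(q) > len(p) and contains(q, p) for q in patterns)
    ]


stored = [pat("a"), pat("b"), pat("a", "b"), pat("c")]
remaining = keep_uncontained(stored)  # a and b are contained in ab
```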

In addition, when the frequency of an L-th order feature sequence pattern candidate is calculated, the sequence data containing the (L-1)-th order feature sequence pattern from which the candidate was generated may be stored in advance, so that the frequency of the candidate is calculated by accessing only that subset of the sequence data.
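A minimal sketch of this variation, assuming that what is stored for each pattern is the set of indices of the sequences containing it. Any sequence that contains a candidate also contains the (L-1)-th order pattern it was extended from, so scanning only the parent's indices gives the same count as a full scan. The names `supporting_ids` and `restrict_to`, the containment rule, and the toy database are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Pattern = Tuple[FrozenSet[str], ...]


def pat(*event_sets: str) -> Pattern:
    """Build a pattern or sequence from strings, e.g. pat("a", "be")."""
    return tuple(frozenset(events) for events in event_sets)


def contains(sequence: Pattern, pattern: Pattern) -> bool:
    """True if `pattern` occurs in `sequence` in order (assumed rule)."""
    pos = 0
    for event_set in pattern:
        while pos < len(sequence) and not event_set <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def supporting_ids(
    database: List[Pattern],
    pattern: Pattern,
    restrict_to: Optional[Iterable[int]] = None,
) -> Set[int]:
    """Indices of sequences containing the pattern, scanning only a subset if given."""
    ids = range(len(database)) if restrict_to is None else restrict_to
    return {i for i in ids if contains(database[i], pattern)}


database = [pat("a", "be", "b"), pat("a", "b"), pat("c", "b")]
parent_ids = supporting_ids(database, pat("a"))  # stored once for the parent
child_ids = supporting_ids(database, pat("a", "b"), restrict_to=parent_ids)
child_frequency = len(child_ids)
```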

Furthermore, the analyst may specify a partial sequence pattern of interest, so that only the feature sequence patterns containing that partial sequence pattern are finally extracted.
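This final filtering might look like the following sketch, which keeps only the discovered patterns that contain the partial sequence pattern named by the analyst. The representation, the containment rule, and the name `filter_by_subpattern` are assumptions for illustration.

```python
from typing import FrozenSet, List, Tuple

Pattern = Tuple[FrozenSet[str], ...]


def pat(*event_sets: str) -> Pattern:
    """Build a pattern from strings, e.g. pat("a", "be") stands for a(be)."""
    return tuple(frozenset(events) for events in event_sets)


def contains(pattern: Pattern, sub: Pattern) -> bool:
    """True if `sub` occurs in `pattern` in order (assumed rule)."""
    pos = 0
    for event_set in sub:
        while pos < len(pattern) and not event_set <= pattern[pos]:
            pos += 1
        if pos == len(pattern):
            return False
        pos += 1
    return True


def filter_by_subpattern(patterns: List[Pattern], required: Pattern) -> List[Pattern]:
    """Keep only the patterns that contain the analyst-specified subpattern."""
    return [p for p in patterns if contains(p, required)]


discovered = [pat("a", "be"), pat("a", "b"), pat("c")]
with_b = filter_by_subpattern(discovered, pat("b"))
with_a_then_e = filter_by_subpattern(discovered, pat("a", "e"))
```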

Furthermore, formula (1) above can be rewritten as formula (2). In this form, a sequence pattern is discovered as a characteristic sequence pattern when its frequency is relatively high (corresponding to the second term of formula (2)) and when there is a strong association between a specific event contained in the sequence pattern and the sequence pattern itself (corresponding to the first term of formula (2)).

FIG. 1 is a block diagram showing a feature sequence pattern finding device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the frequent event extraction procedure executed by the feature sequence pattern finding device.
FIG. 3 is a flowchart showing the feature event set extraction procedure executed by the feature sequence pattern finding device.
FIG. 4 is a flowchart showing the feature event pattern extraction procedure executed by the feature sequence pattern finding device.
FIG. 5 is a diagram showing an example of a sequence data set.
FIG. 6 is a diagram showing the correspondence between event names and event IDs.
FIG. 7 is a diagram showing the primary feature event set candidates extracted from the sequence data of FIG. 5, together with their frequencies, degrees of interest, and maximum frequencies.
FIG. 8 is a diagram showing the secondary feature event set candidates extracted from the sequence data of FIG. 5, together with their frequencies, degrees of interest, and maximum frequencies.
FIG. 9 is a diagram showing the tertiary feature event set candidates generated from the secondary feature event sets, extracted from the sequence data of FIG. 5, whose degree of interest is 0.2 or more, together with their frequencies, degrees of interest, and maximum frequencies.
FIG. 10 is a diagram showing the secondary sequence patterns, extracted from the sequence data of FIG. 5, whose degree of interest is 0.2 or more, together with their frequencies, degrees of interest, and maximum frequencies.
FIG. 11 is a diagram showing the tertiary sequence patterns, extracted from the sequence data of FIG. 5, whose degree of interest is 0.2 or more, together with their frequencies, degrees of interest, and maximum frequencies.
FIG. 12 is a diagram showing the quaternary sequence pattern candidates generated from the tertiary sequence patterns, extracted from the sequence data of FIG. 5, whose degree of interest is 0.2 or more, together with their frequencies, degrees of interest, and maximum frequencies.

Explanation of symbols

B1: Sequence data storage unit
B2: Candidate sequence pattern generation unit
B3: Sequence pattern frequency calculation unit
B4: Partial sequence pattern frequency storage unit
B5: Candidate sequence pattern evaluation unit
B6: Candidate sequence pattern determination unit
B7: Feature sequence pattern storage unit

Claims

A feature sequence pattern finding device comprising:
a sequence data storage unit that stores sequence data consisting of a plurality of events;
a feature sequence pattern storage unit that stores feature sequence patterns that have already been discovered;
a candidate sequence pattern generation unit that generates a candidate sequence pattern by adding an event or event set extracted from the sequence data storage unit to the events or event sets that match within a pair of feature sequence patterns stored in the feature sequence pattern storage unit;
a sequence pattern frequency calculation unit that calculates, as the frequency of the candidate sequence pattern, the number of items of sequence data stored in the sequence data storage unit that contain the candidate sequence pattern;
a partial sequence pattern frequency storage unit that stores the maximum frequency of the partial sequence patterns of the feature sequence pattern;
a candidate sequence pattern evaluation unit that calculates an evaluation value of the candidate sequence pattern according to an evaluation formula, the evaluation value being calculated from the frequency of the candidate sequence pattern calculated by the sequence pattern frequency calculation unit and the maximum frequency of the partial sequence patterns stored in the partial sequence pattern frequency storage unit, and the evaluation formula giving an evaluation value that decreases monotonically for candidate sequence patterns containing more events; and
a candidate sequence pattern determination unit that determines whether the evaluation value exceeds a threshold and stores the frequency of a candidate sequence pattern exceeding the threshold in the partial sequence pattern frequency storage unit as the maximum frequency of the partial sequence patterns,
wherein the candidate sequence pattern evaluation unit calculates a degree of interest corresponding to the evaluation value according to formula (1), where s is a sequence pattern, sp is a partial sequence pattern of the sequence pattern s, fs() is the frequency of a sequence pattern, and N is the number of items of sequence data, and
wherein the candidate sequence pattern exceeding the threshold is stored in the feature sequence pattern storage unit as a new feature sequence pattern.

A feature sequence pattern finding device comprising:
a sequence data storage unit that stores sequence data consisting of a plurality of events;
a feature sequence pattern storage unit that stores feature sequence patterns that have already been discovered;
a candidate sequence pattern generation unit that generates a candidate sequence pattern by adding an event or event set extracted from the sequence data storage unit to the events or event sets that match within a pair of feature sequence patterns stored in the feature sequence pattern storage unit;
a sequence pattern frequency calculation unit that calculates, as the frequency of the candidate sequence pattern, the number of items of sequence data stored in the sequence data storage unit that contain the candidate sequence pattern;
a partial sequence pattern frequency storage unit that stores the maximum frequency of the partial sequence patterns of the feature sequence pattern;
a candidate sequence pattern evaluation unit that calculates an evaluation value of the candidate sequence pattern according to an evaluation formula, the evaluation value being calculated from the frequency of the candidate sequence pattern calculated by the sequence pattern frequency calculation unit and the maximum frequency of the partial sequence patterns stored in the partial sequence pattern frequency storage unit, and the evaluation formula giving an evaluation value that decreases monotonically for candidate sequence patterns containing more events; and
a candidate sequence pattern determination unit that determines whether the evaluation value exceeds a threshold and stores the frequency of a candidate sequence pattern exceeding the threshold in the partial sequence pattern frequency storage unit as the maximum frequency of the partial sequence patterns,
wherein the candidate sequence pattern evaluation unit calculates a degree of interest corresponding to the evaluation value according to formula (2), where s is a sequence pattern, sp is a partial sequence pattern of the sequence pattern s, fs() is the frequency of a sequence pattern, and N is the number of items of sequence data, and
wherein the candidate sequence pattern exceeding the threshold is stored in the feature sequence pattern storage unit as a new feature sequence pattern.

The feature sequence pattern finding device according to claim 1 or 2, wherein the feature sequence pattern consists of event sets that are extracted from the sequence data and arranged in time order.

A method of operating a feature sequence pattern finding device that discovers a new feature sequence pattern from sequence data consisting of a plurality of events stored in a sequence data storage unit and from already discovered feature sequence patterns stored in a feature sequence pattern storage unit, the method comprising:
a step in which a candidate sequence pattern generation unit generates a candidate sequence pattern by adding an event or event set extracted from the sequence data storage unit to the events or event sets that match within a pair of feature sequence patterns stored in the feature sequence pattern storage unit;
a step in which a sequence pattern frequency calculation unit calculates, as the frequency of the candidate sequence pattern, the number of items of sequence data stored in the sequence data storage unit that contain the candidate sequence pattern;
a step in which a partial sequence pattern frequency storage unit stores the maximum frequency of the partial sequence patterns of the feature sequence pattern;
a step in which a candidate sequence pattern evaluation unit calculates an evaluation value of the candidate sequence pattern according to an evaluation formula, the evaluation value being calculated from the frequency of the candidate sequence pattern calculated by the sequence pattern frequency calculation unit and the maximum frequency of the partial sequence patterns stored in the partial sequence pattern frequency storage unit, and the evaluation formula giving an evaluation value that decreases monotonically for candidate sequence patterns containing more events; and
a step in which a candidate sequence pattern determination unit determines whether the evaluation value exceeds a threshold and stores the frequency of a candidate sequence pattern exceeding the threshold in the partial sequence pattern frequency storage unit as the maximum frequency of the partial sequence patterns,
wherein the candidate sequence pattern evaluation unit calculates a degree of interest corresponding to the evaluation value according to formula (3), where s is a sequence pattern, sp is a partial sequence pattern of the sequence pattern s, fs() is the frequency of a sequence pattern, and N is the number of items of sequence data, and
wherein the candidate sequence pattern exceeding the threshold is stored in the feature sequence pattern storage unit as the new feature sequence pattern.

A method of operating a feature sequence pattern finding device that discovers a new feature sequence pattern from sequence data consisting of a plurality of events stored in a sequence data storage unit and from already discovered feature sequence patterns stored in a feature sequence pattern storage unit, the method comprising:
a step in which a candidate sequence pattern generation unit generates a candidate sequence pattern by adding an event or event set extracted from the sequence data storage unit to the events or event sets that match within a pair of feature sequence patterns stored in the feature sequence pattern storage unit;
a step in which a sequence pattern frequency calculation unit calculates, as the frequency of the candidate sequence pattern, the number of items of sequence data stored in the sequence data storage unit that contain the candidate sequence pattern;
a step in which a partial sequence pattern frequency storage unit stores the maximum frequency of the partial sequence patterns of the feature sequence pattern;
a step in which a candidate sequence pattern evaluation unit calculates an evaluation value of the candidate sequence pattern according to an evaluation formula, the evaluation value being calculated from the frequency of the candidate sequence pattern calculated by the sequence pattern frequency calculation unit and the maximum frequency of the partial sequence patterns stored in the partial sequence pattern frequency storage unit, and the evaluation formula giving an evaluation value that decreases monotonically for candidate sequence patterns containing more events; and
a step in which a candidate sequence pattern determination unit determines whether the evaluation value exceeds a threshold and stores the frequency of a candidate sequence pattern exceeding the threshold in the partial sequence pattern frequency storage unit as the maximum frequency of the partial sequence patterns,
wherein the candidate sequence pattern evaluation unit calculates a degree of interest corresponding to the evaluation value according to formula (4), where s is a sequence pattern, sp is a partial sequence pattern of the sequence pattern s, fs() is the frequency of a sequence pattern, and N is the number of items of sequence data, and
wherein the candidate sequence pattern exceeding the threshold is stored in the feature sequence pattern storage unit as the new feature sequence pattern.