JP2014164618A - Frequent pattern extraction device, frequent pattern extraction method, and program

Info

Publication number: JP2014164618A
Application number: JP2013036332A
Authority: JP
Inventors: Takayuki Kawabata; 貴幸川端
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-02-26
Filing date: 2013-02-26
Publication date: 2014-09-08

Abstract

PROBLEM TO BE SOLVED: To extract a versatile workflow when a plurality of users work cooperatively by operating a plurality of files. SOLUTION: A frequent pattern extraction device comprises: a center cluster extraction unit 3273 which extracts a center file cluster that becomes the center of a workflow; a subordinate cluster identification unit 3275 which identifies subordinate file clusters included in the same workflow as the center file cluster; a sequence extraction unit 3276 which extracts, for each user, a set of operation sequences related to the center file cluster and the subordinate file clusters on the basis of operation histories concerning files of the center file cluster and files of the subordinate file clusters; and a frequent pattern extraction unit 3277 which extracts a frequent pattern as a workflow on the basis of the set of operation sequences extracted by the sequence extraction unit 3276.

Description

The present invention relates to a frequent pattern extraction device and a frequent pattern extraction method for extracting time-series frequent patterns from the operation history of each user for each of a plurality of items, and to a program for causing a computer to execute the frequent pattern extraction method.

Many methods have been proposed that analyze a user's item operation history, extract characteristic frequent patterns, and use the extracted patterns to improve the user's operating efficiency. For example, one technique analyzes web access logs and extracts patterns such as "users who view page A often view page F afterward", and then recommends page F to a user who has viewed page A.

Methods have also been proposed that extract the flow of work (a workflow) by analyzing users' file operation histories in an office.

For example, Patent Document 1 below proposes a method for estimating a work procedure from a history, including image information, of processing performed by image processing apparatuses such as printers and copiers. The characteristic feature of this method is that it performs form determination based on the similarity of document image feature amounts and classifies the accumulated logs of many document images into sets, one per form template type. As a result, a workflow such as "for a Form A document, Nakamura (section staff) prints it, stamps it, and scans it; next, Suzuki (section manager) stamps it and copies it; finally, Tanaka (department manager) stamps it and scans it" can be extracted.

Patent Document 2 below proposes a method for generating a workflow that includes branches from the history of operations on documents in an office. In this method, the operation history records of each document are arranged in chronological order as nodes to form a tree, and nodes in different trees that share some attributes (file name, operator, operation type, and so on) are joined, so that a workflow containing branches and joins is extracted.

Patent Document 1: JP 2009-224958 A
Patent Document 2: JP 2010-191709 A

Agrawal, R. and Srikant, R., "Fast Algorithms for Mining Association Rules", Proceedings of the 20th VLDB Conference, 1994, pp. 487-499
J. Pei, J. Han, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.C. Hsu, "PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth", Proceedings of ICDE, 2001, pp. 215-224
Jian Pei, Haixun Wang, Jian Liu, Ke Wang, Jianyong Wang, and Philip S. Yu, "Discovering Frequent Closed Partial Orders from Strings", IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 11, November 2006, pp. 1467-1481

However, the conventional methods described above have the following two problems.

The first problem is that only workflows limited to a single document (a single item) can be extracted. Both Patent Document 1 and Patent Document 2 collect the operation history for each single document and thereby estimate which users perform which operations on that document, and in what order. However, many office workflows involve multiple users working cooperatively on multiple documents (multiple items), and the conventional methods cannot extract such workflows.

The second problem is that workflows in which multiple users work in parallel cannot be extracted. An example is a workflow in which, after A finishes working, B and C can work independently and in parallel, and D can start working only after both B and C have finished. Patent Document 2 can handle workflows that include branches and joins, but those branches and joins differ from the branches and joins of parallel work discussed here. In Patent Document 2, a branch is an IF-THEN rule: for example, in a workflow for creating a request for quotation, the person in charge at the request destination, who handles the next step, is switched according to the type of article being quoted. In other words, the actual flow of work is a single path rather than a flow in which multiple users work independently and in parallel, so such a parallel workflow cannot be extracted.

That is, because of these two problems, the conventional methods can extract only limited workflows when multiple users cooperate to operate multiple items.

The present invention has been made in view of these problems, and its object is to provide a mechanism that enables more versatile workflows to be extracted when multiple users cooperate to operate multiple items.

The frequent pattern extraction device of the present invention extracts time-series frequent patterns from the operation history of each user for each of a plurality of items, and comprises: clustering means for clustering the plurality of items into a plurality of item clusters based on the similarity between the items; center cluster extraction means for extracting, from the plurality of item clusters, a center item cluster that is the center of the frequent pattern; subordinate cluster identification means for identifying, from the plurality of item clusters, subordinate item clusters included in the same frequent pattern as the center item cluster; sequence extraction means for extracting, for each user, a set of operation sequences of the center item cluster and the subordinate item clusters, based on the operation histories of the items belonging to the center item cluster and of the items belonging to the subordinate item clusters; and frequent pattern extraction means for extracting the frequent pattern based on the set of operation sequences extracted by the sequence extraction means.

The present invention also includes a frequent pattern extraction method performed by the frequent pattern extraction device described above, and a program for causing a computer to execute the frequent pattern extraction method.

According to the present invention, when multiple users cooperate to operate multiple items, more versatile workflows can be extracted, including patterns in which users work independently and in parallel. The extracted workflow can then be used to improve work efficiency, for example by guiding users through their item operations.

FIG. 1 is a schematic diagram showing an example of the apparatus configuration of a frequent pattern extraction system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the internal configuration of each apparatus shown in FIG. 1.
FIG. 3 is a block diagram showing an example of the functional configuration of the file management system built in the file management server shown in FIG. 1.
FIG. 4 is a diagram showing an example of the file operation information stored as file operation history in the operation history database shown in FIG. 3.
FIG. 5 is a flowchart showing an example of the procedure of the workflow extraction processing performed by the workflow extraction unit shown in FIG. 3.
FIG. 6 is a flowchart showing an example of the procedure of the workflow extraction processing performed by the workflow extraction unit shown in FIG. 3.
FIG. 7 is a diagram showing an example of a workflow to be extracted by the workflow extraction unit shown in FIG. 3.
FIG. 8 is a diagram showing an example of the file operation information stored as file operation history in the operation history database shown in FIG. 3.
FIG. 9 is a diagram showing an example of the similarity between files based on file copy relationships.
FIG. 10 is a diagram showing an example of hierarchical clustering of the files in the file operation information shown in FIG. 8.
FIG. 11 is a diagram showing an example in which users' file operations in the file operation information shown in FIG. 8 are mapped in time series for each file.
FIG. 12 is a diagram showing an example in which users' file operations in the file operation information shown in FIG. 8 are mapped in time series for each file cluster.
FIG. 13 is a diagram showing an example in which users' file operations in the file operation information shown in FIG. 8 are mapped in time series for each file belonging to FC1 and FC2.
FIG. 14 is a diagram showing an example of a file operation sequence and a file cluster operation sequence.
FIG. 15 is a diagram for explaining sequential pattern mining.
FIG. 16 is a diagram showing an example of the workflow extracted in step S604 of FIG. 6.

Embodiments for carrying out the present invention are described below with reference to the drawings.

In the embodiment of the present invention, the items are files organized in a folder (or directory) structure. Although this embodiment targets files as items, the present invention is not limited to this.

FIG. 1 is a schematic diagram showing an example of the apparatus configuration of the frequent pattern extraction system according to the embodiment of the present invention.
The frequent pattern extraction system is realized as a client-server model. Specifically, as shown in FIG. 1, the frequent pattern extraction system according to this embodiment comprises a network 101, a terminal A 102, a terminal B 103, a terminal C 104, and a file management server 105.

The terminal A 102, the terminal B 103, the terminal C 104, and the file management server 105 are connected via the network 101 and exchange various kinds of information with one another. Each user performs file operations such as registering, viewing, and deleting files using a dedicated client tool on terminal A 102, terminal B 103, or terminal C 104.

FIG. 2 is a block diagram showing an example of the internal configuration of each apparatus shown in FIG. 1.
As shown in FIG. 2, each apparatus shown in FIG. 1 includes a control unit 201, a bus 202, a memory unit 203, a large-scale storage unit 204, a display unit 205, an input unit 206, an output unit 207, and a network connection unit 208.

The control unit 201 is composed of, for example, a CPU and controls the overall operation of the apparatus.

The bus 202 connects the control unit 201, the memory unit 203, the large-scale storage unit 204, the display unit 205, the input unit 206, the output unit 207, and the network connection unit 208 so that they can communicate with one another. The control unit 201 controls the overall operation of the apparatus by controlling each of its units (203 to 208) via the bus 202.

The memory unit 203 is an electronic storage device composed of, for example, RAM and ROM. The control unit 201 operates according to the programs and data stored in the memory unit 203 and controls each unit of the apparatus connected via the bus 202.

The large-scale storage unit 204 is a storage device composed of, for example, a hard disk or an optical disk.

The display unit 205 is a display device that presents documents, images, and the like to users of the system.

The input unit 206 is, for example, a pointing device such as a mouse, stick, or pad for entering instructions linked to the content displayed on the display unit 205. A device that serves as both the display unit 205 and the input unit 206, such as a display with a touch panel function, may also be used.

The output unit 207 is, for example, a printer device that outputs electronic data on paper.

The network connection unit 208 is a network interface for importing electronic data from outside the apparatus and transmitting electronic data to outside the apparatus.

Components 201 to 208 shown in FIG. 2 may be configured as a single general-purpose computer such as a PC, or may be built into an electronic device such as an MFP. They may also be built from a set of interconnected computers and servers together with peripheral devices such as displays and PDAs.

FIG. 3 is a block diagram showing an example of the functional configuration of the file management system 320 built in the file management server 105 shown in FIG. 1. In FIG. 3, the user terminal 310 corresponds to terminal A 102, terminal B 103, or terminal C 104, and the client tool is installed on it. The file management server 105, on which the file management system 320 is built, constitutes the frequent pattern extraction device according to the embodiment of the present invention (a frequent pattern extraction device that extracts time-series frequent patterns from the operation history of each user for each of a plurality of items).

The file management system 320 includes an operation acquisition unit 321, a file management unit 322, a database 323, an operation history management unit 324, an operation history database 325, an information transmission unit 326, and a workflow extraction unit 327.

In this embodiment, the workflow extraction unit 327, which provides the workflow extraction function, is part of the file management system 320, but the present invention is not limited to this form. For example, the file management function and the workflow extraction function may each be built as stand-alone components, or the workflow extraction function may be incorporated into a system other than the file management system 320. Likewise, although this embodiment implements the file management system 320 as a client-server model, the present invention is not limited to this form and can also be implemented, for example, on a single client.

An example of the correspondence between the components shown in FIG. 3 (321 to 327) and the components shown in FIG. 2 is as follows.
For example, the operation acquisition unit 321 and the information transmission unit 326 shown in FIG. 3 are constituted by the control unit 201 and the programs stored in the memory unit 203 shown in FIG. 2, together with the network connection unit 208.
The file management unit 322, the operation history management unit 324, and the workflow extraction unit 327 shown in FIG. 3 are constituted by the control unit 201 and the programs stored in the memory unit 203 shown in FIG. 2.
The database 323 and the operation history database 325 are constituted by the large-scale storage unit 204 shown in FIG. 2.

The operation acquisition unit 321 acquires the file operation information entered through the client tool on the user terminal 310 and sends it to the file management unit 322 and the operation history management unit 324.

The file management unit 322 receives the file operation information sent from the operation acquisition unit 321 and, based on it, performs the specified file operation processing in cooperation with the database 323. File operations here include, for example, registering a new file, opening, copying, and deleting a file, and operations on folders; the processing itself is the same as in a general file management system. The result of the processing is sent to the user terminal 310 through the information transmission unit 326 and displayed in the client tool on the user terminal 310.

The database 323 stores information on the files and folders managed by the file management system 320, user information on the users of the file management system 320, and the like. The user information includes not only information on individual users, such as user names and user IDs, but also information on user groups, such as the groups to which a user belongs and the list of users belonging to a group.

The operation history management unit 324 receives the file operation information sent from the operation acquisition unit 321 and stores and manages it in the operation history database 325 as file operation history.

The operation history database 325 stores the file operation information from the operation history management unit 324 as file operation history.

FIG. 4 is a diagram showing an example of the file operation information stored as file operation history in the operation history database 325 shown in FIG. 3.
In FIG. 4, the log ID 401 is a code that uniquely identifies a piece of file operation information. The time 402 indicates when the file operation was performed. The user ID 403 is a code that identifies the user who performed the file operation. The file ID 404 is a code that identifies the file that was operated on. The operation event 405 indicates the type of file operation event that was executed. The file operation information shown in FIG. 4 is only an example, and the present invention is not limited to it. In the following description, the operation performed on a file is sometimes omitted for simplicity, but in practice a file and its operation are handled as a pair, and two file operations are said to match only when both the file and the operation match.
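As a rough illustration of the record layout in FIG. 4, the following sketch models one history entry and the matching rule described above. The class name, field names, event strings, and sample values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileOperationRecord:
    """One row of the operation history (field names are assumptions)."""
    log_id: int       # log ID 401: uniquely identifies the record
    time: datetime    # time 402: when the operation was performed
    user_id: str      # user ID 403: who performed the operation
    file_id: str      # file ID 404: which file was operated on
    event: str        # operation event 405: kind of operation


def same_file_operation(a: FileOperationRecord, b: FileOperationRecord) -> bool:
    """Two file operations match only if both the file and the event match."""
    return (a.file_id, a.event) == (b.file_id, b.event)


# Hypothetical sample records for illustration only.
r1 = FileOperationRecord(1, datetime(2013, 2, 26, 9, 0), "A", "File1", "open")
r2 = FileOperationRecord(2, datetime(2013, 2, 26, 9, 5), "B", "File1", "open")
r3 = FileOperationRecord(3, datetime(2013, 2, 26, 9, 9), "B", "File1", "copy")
print(same_file_operation(r1, r2))  # same file, same event
print(same_file_operation(r2, r3))  # same file, different event
```

Keeping the file and the event together in the comparison key reflects the statement that a file and its operation are handled as a pair.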

The description now returns to FIG. 3.
The information transmission unit 326 transmits information from the file management unit 322 to the user terminal 310.

The workflow extraction unit 327 performs the processing that extracts workflows. A workflow here is the flow of work for achieving a certain purpose, and it can be represented as a graph structure whose nodes combine a user and a file operation.

FIG. 7 is a diagram showing an example of a workflow to be extracted by the workflow extraction unit 327 shown in FIG. 3.
In FIG. 7, node 701 indicates that user A performs an operation on a file included in FC1 (FileCluster1). As shown in FIG. 7, FC1 (708) is the file cluster to which File1 and File6 belong. Each node of the workflow is expressed as an operation on a file cluster rather than on an individual file because, even within the same workflow, the files actually handled often differ from one execution to the next. For example, in a workflow for creating quotations, a different quotation file is created for each customer, so representing the work as a single workflow requires treating those per-customer quotation files as one group. In FC1 (708), File1 and File6 are quotations for different customers, and FC1 (708) represents the set of those quotations. Replacing files with their file clusters in this way is called file abstraction.
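File abstraction can be sketched as a simple lookup that replaces each file ID with its cluster ID. The cluster memberships below follow FIG. 7 as described in the text (FC1 holds File1 and File6; FC2 holds File2 and File8), while the function name, the `(user, file)` tuple layout, and the two sample runs are assumptions made for illustration.

```python
# Cluster membership as described for FIG. 7.
FILE_TO_CLUSTER = {
    "File1": "FC1", "File6": "FC1",
    "File2": "FC2", "File8": "FC2",
}


def abstract_files(operations, file_to_cluster):
    """Replace the file in each (user, file) pair with its file cluster.

    Files that belong to no known cluster are skipped in this sketch.
    """
    return [
        (user, file_to_cluster[file_id])
        for user, file_id in operations
        if file_id in file_to_cluster
    ]


# Two hypothetical runs of the same quotation workflow for different customers.
run_customer_1 = [("A", "File1"), ("B", "File1"), ("D", "File2")]
run_customer_2 = [("A", "File6"), ("B", "File6"), ("D", "File8")]

seq_1 = abstract_files(run_customer_1, FILE_TO_CLUSTER)
seq_2 = abstract_files(run_customer_2, FILE_TO_CLUSTER)
print(seq_1 == seq_2)  # both runs collapse to the same cluster-level sequence
```

After abstraction the two runs become identical sequences, which is what allows them to be counted as occurrences of one pattern.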

In FIG. 7, branch 702 represents a branching of the work: once the work of node 701 is complete, the work of node 703 and the work of node 704 can proceed independently and in parallel. In the example of FIG. 7, after user A has operated on FC1, user B and user C operate on FC1 in parallel. There is no order between the operation of user B and that of user C; either may go first, or both may be performed at the same time.

In FIG. 7, join 705 represents a joining of the work. A join may be synchronous or asynchronous. With a synchronous join, the work after the join can be performed only when all of the work before the join is complete; with an asynchronous join, the work after the join can be performed once part of the work before the join is complete. For example, if join 705 in FIG. 7 is synchronous, user D can operate on FC2 at node 706 only when the operations on FC1 by both user B and user C are complete.
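The difference between a synchronous and an asynchronous join can be sketched as a readiness check. The function name and the use of node numbers as string identifiers are assumptions; only the all-versus-some rule comes from the description above.

```python
def join_ready(predecessors, completed, synchronous=True):
    """Return True if the work after a join may start.

    predecessors: nodes feeding the join (e.g. nodes 703 and 704 in FIG. 7)
    completed:    nodes whose work has finished so far
    synchronous:  True  -> every predecessor must be complete
                  False -> at least one predecessor must be complete
    """
    preds = set(predecessors)
    done = preds & set(completed)
    return done == preds if synchronous else bool(done)


before_join_705 = {"703", "704"}  # user B and user C operating on FC1

print(join_ready(before_join_705, {"703"}))                     # sync, B only
print(join_ready(before_join_705, {"703", "704"}))              # sync, B and C
print(join_ready(before_join_705, {"703"}, synchronous=False))  # async, B only
```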

In FIG. 7, node 706 indicates that user D performs an operation on a file belonging to FC2 (FileCluster2). As shown in FIG. 7, FC2 (709) is the file cluster to which File2 and File8 belong.

In FIG. 7, node 707 indicates that user E performs an operation on a file belonging to FC1.

The workflows handled in the embodiment of the present invention are thus characterized by multiple users operating on multiple items and by branch and join patterns in which users perform operations independently and in parallel.

The description now returns to FIG. 3.
As shown in FIG. 3, the workflow extraction unit 327 includes a similarity calculation unit 3271, a clustering unit 3272, a center cluster extraction unit 3273, a co-occurrence probability calculation unit 3274, a subordinate cluster identification unit 3275, a sequence extraction unit 3276, and a frequent pattern extraction unit 3277.

Next, the workflow extraction processing (frequent pattern extraction processing) performed by the workflow extraction unit 327 is described.
FIGS. 5 and 6 are flowcharts showing an example of the procedure of the workflow extraction processing performed by the workflow extraction unit 327 shown in FIG. 3. The processing in these flowcharts is carried out by the control unit 201 shown in FIG. 2 executing a program stored in the memory unit 203. More specifically, it is carried out by the components (3271 to 3277) of the workflow extraction unit 327 shown in FIG. 3.

The flowcharts of FIGS. 5 and 6 are described using the example of file operation information shown in FIG. 8. To keep the description simple, the file operations themselves are omitted.
FIG. 8 is a diagram showing an example of the file operation information stored as a file operation history in the operation history database 325 shown in FIG. 3, according to the embodiment of the present invention. FIG. 8 shows the log ID, time, user ID, and file ID of each entry.
FIG. 11 is a diagram showing an example in which the users' file operations in the file operation information of FIG. 8 are mapped in time series for each file, according to the embodiment of the present invention. In FIG. 11, for example, event 1101 indicates that user A operated on file 1 (File1). File 1 is subsequently operated on by user B at event 1102.

First, the flowchart of FIG. 5 will be described.
In step S501, the similarity calculation unit 3271 of the workflow extraction unit 327 calculates the similarity between all files in the database 323 in order to abstract the files. Rather than the commonly used measure based on the similarity of the words contained in documents, the similarity between files is preferably an indicator that rates files highly similar when they are used for a similar purpose in the work. For example, the following indicators can be used.
・File derivation relationships
・File structure information (XML structure)
・File co-occurrence frequency information
・File attribute information
The similarity from each indicator may be used alone or in combination with others as necessary, and the indicators are not limited to these. The method of calculating the similarity between files for each indicator is described in detail below.

First, the similarity between files based on file derivation relationships will be described.
For example, when file A and file B were both created by copying a certain template, it is highly likely that file A and file B were used for work with the same purpose. Based on this idea, the similarity between files can be defined using their derivation relationships. In a simple method, when files in a copy relationship are represented as a tree structure as in FIG. 7, the similarity of a file to itself is set to 1, and the similarity to another file is obtained by multiplying by an attenuation coefficient for each step away from the file itself.
FIG. 9 is a diagram showing an example of the similarity between files based on file copy relationships, according to the embodiment of the present invention. For the copy relationships shown in FIG. 9(a), the similarities between the files with an attenuation coefficient of 0.9 are as shown in FIG. 9(b). For example, FileAAA was created by copying FileAA, which was itself created by copying FileA, so the similarity between FileA and FileAAA is 1 × 0.9 × 0.9 = 0.81.
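The attenuation rule can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `derivation_similarity`, the `parent` mapping, and the copy tree (including FileAB) are hypothetical, and the sketch assumes that "steps away" means the number of copy edges on the path between two files in the copy tree.

```python
from collections import deque

def derivation_similarity(parent, a, b, decay=0.9):
    """Similarity = decay ** (number of copy edges between a and b)."""
    # Build an undirected adjacency list from the child -> parent copy links.
    adj = {}
    for child, par in parent.items():
        adj.setdefault(child, set())
        if par is not None:
            adj.setdefault(par, set()).add(child)
            adj[child].add(par)
    # Breadth-first search yields the number of edges on the path from a to b.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return decay ** dist[node]
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return 0.0  # assumption: files with no copy relationship get similarity 0

# Hypothetical copy tree: FileAA and FileAB copied from FileA, FileAAA from FileAA.
parent = {"FileA": None, "FileAA": "FileA", "FileAB": "FileA", "FileAAA": "FileAA"}
print(round(derivation_similarity(parent, "FileA", "FileAAA"), 2))  # 0.81
```

With the attenuation coefficient of 0.9 used in the text, the FileA to FileAAA case reproduces the value 0.81.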

Next, the similarity between files based on file structure information (XML structure) will be described.
In recent years, many document files have moved from proprietary formats to XML formats. In an XML format the document content is tagged, which makes it easy to process the structure and the content of a document separately. It is therefore easy to find document files with similar structure regardless of their content. For example, file A and file B created from the same template differ in content but have similar structure inherited from that template, so similarity based on document structure is an effective indicator.

Next, the similarity between files based on file co-occurrence frequency information will be described.
For example, if FileA has a high probability of being used together with FileB and FileC, and another file, FileX, also has a high probability of being used together with FileB and FileC, it can be inferred that FileA and FileX are used in a similar way in work with the same purpose. Based on this idea, the similarity between files can be defined using file co-occurrence frequency information. As a simple calculation, the number of co-occurring files that two files have in common may be divided by the average of the numbers of co-occurring files of the two files. The co-occurring files counted as common need not be identical files only; similar files may also be included.
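The simple calculation described above can be sketched as follows. The function name, the `cooccurring` mapping, and the extra file FileD are hypothetical, and the sketch counts only identical co-occurring files as common, although the text allows similar files to be counted as well.

```python
def cooccurrence_similarity(cooccurring, a, b):
    """Shared co-occurring files divided by the mean co-occurring-file count of a and b."""
    set_a = cooccurring.get(a, set())
    set_b = cooccurring.get(b, set())
    mean_size = (len(set_a) + len(set_b)) / 2
    if mean_size == 0:
        return 0.0  # neither file co-occurs with anything
    return len(set_a & set_b) / mean_size

# Hypothetical data: the files each file is frequently used together with.
cooccurring = {
    "FileA": {"FileB", "FileC"},
    "FileX": {"FileB", "FileC", "FileD"},
}
print(cooccurrence_similarity(cooccurring, "FileA", "FileX"))  # 0.8
```

Here two shared files divided by an average of 2.5 co-occurring files gives 0.8.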

Next, the similarity between files based on file attribute information will be described.
Information useful for calculating the similarity between files includes file names and path names. In work with the same purpose, file names tend to share a common part and differ only in part. For example, the file names of meeting minutes may differ only in the date, and the file names of survey documents may differ only in the user name. Files whose names share a common part in this way may be treated as highly similar. For example, the file-name similarity sim(fileA, fileB) between fileA and fileB can be defined simply as in equation (1).

In equation (1), len(fileA) denotes the length of the file name of fileA, and min(len(fileA), len(fileB)) denotes the shorter of the file-name lengths of fileA and fileB. LCS(fileA, fileB) denotes the longest common subsequence (LCS) of the file names of fileA and fileB. A subsequence is a sequence obtained by taking some of the elements of a sequence in order. A subsequence common to two sequences is called a common subsequence, and the longest of the common subsequences is called the longest common subsequence (LCS).
As another example of file-name similarity, the edit distance can be used, which is a numerical value used in information theory to indicate how different two character strings are. Specifically, it is the minimum number of character insertions, deletions, and substitutions required to transform one character string into the other.
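Equation (1) itself is not reproduced in this text. As a sketch only, one form consistent with the symbols defined above would be sim(fileA, fileB) = len(LCS(fileA, fileB)) / min(len(fileA), len(fileB)); this is an assumption rather than the equation as filed. The helper names and the example file names below are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def name_similarity(name_a, name_b):
    """Assumed form of equation (1): LCS length over the shorter name length."""
    shorter = min(len(name_a), len(name_b))
    return lcs_length(name_a, name_b) / shorter if shorter else 0.0

# Hypothetical meeting-minutes file names that differ only in the date.
print(name_similarity("minutes_0301.doc", "minutes_0308.doc"))  # 0.9375
```

The two 16-character names share a 15-character common subsequence, so the assumed measure gives 15/16.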

Four indicators have been described above. Any one of them may be used as the similarity between files, or any number of them may be combined. The indicators listed here are only examples; any other indicator may be used as long as it rates files highly similar when they are used for a similar purpose in the work.

The description now returns to FIG. 5.
When the processing of step S501 is completed, the process proceeds to step S502.
In step S502, the clustering unit 3272 of the workflow extraction unit 327 clusters the files using the similarity between the files (between the items) obtained by the calculation in step S501. That is, the plurality of files (plurality of items) stored in the database 323 are clustered into a plurality of file clusters (plurality of item clusters). Clustering methods are broadly divided into hierarchical and non-hierarchical methods; here, a hierarchical clustering method is used because it does not require the number of clusters to be determined in advance. Typical hierarchical clustering methods include the shortest distance method, the longest distance method, the group average method, and Ward's method, and any of them may be used in this embodiment. As a result, this step outputs file clusters, each of which groups files that are used in a similar way in the work. A file cluster is a group of one or more files, and a file that has no similar file forms a file cluster by itself.
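As an illustration of hierarchical clustering without a preset number of clusters, the following sketch merges clusters bottom-up using the shortest distance method (single linkage) and stops when the best remaining link falls below a similarity threshold. The stopping threshold, the function name, and the pairwise similarity values are assumptions made for this example; the text does not specify how the hierarchy is cut.

```python
def single_linkage_clusters(items, similarity, min_similarity):
    """Merge clusters while the closest pair is at least min_similarity similar."""
    clusters = [{item} for item in items]  # every file starts as its own cluster
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: a cluster pair is as similar as its most similar members.
                link = max(similarity(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or link > best[0]:
                    best = (link, i, j)
        if best[0] < min_similarity:
            break  # no remaining pair is similar enough to merge
        _, i, j = best
        clusters[i] |= clusters.pop(j)  # j > i, so index i stays valid
    return clusters

# Hypothetical similarities: three pairs of files are similar, everything else is not.
similar_pairs = {frozenset((1, 6)), frozenset((2, 8)), frozenset((3, 7))}
sim = lambda a, b: 0.9 if frozenset((a, b)) in similar_pairs else 0.1

clusters = single_linkage_clusters(range(1, 9), sim, min_similarity=0.5)
print(sorted(sorted(c) for c in clusters))  # [[1, 6], [2, 8], [3, 7], [4], [5]]
```

Files 4 and 5 have no similar file and remain single-file clusters, as the text describes.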

FIG. 10 is a diagram showing an example in which the files in the file operation information of FIG. 8 are hierarchically clustered, according to the embodiment of the present invention. In the example of FIG. 10, FC1 is the file cluster to which file 1 and file 6 belong, FC4 is the file cluster to which only file 4 belongs, FC5 is the file cluster to which only file 5 belongs, FC2 is the file cluster to which file 2 and file 8 belong, and FC3 is the file cluster to which file 3 and file 7 belong.

Subsequently, in step S503, the central cluster extraction unit 3273 of the workflow extraction unit 327 extracts, from the plurality of file clusters obtained in step S502, a central file cluster (central item cluster) that is the center of a workflow. A file that is the center of a workflow is a file operated on by multiple users within the workflow, or a file that becomes the final product of the workflow; a file cluster containing many such files is extracted as a central file cluster. For example, the central cluster extraction unit 3273 extracts the central file cluster based on usage information for each file, which includes at least one of the number of users who use the file and the way the file is used (for example, as the final product of the workflow described above).

Here, the case where the central file cluster is extracted based on the number of users who use each file will be described.
In this case, the number of users who performed file operations such as editing is first obtained for each file, and these user counts are averaged over the files contained in each file cluster. A file cluster whose average is equal to or greater than a prescribed value is then extracted as a central file cluster that is the center of a workflow. For example, in the example of FIG. 8, file 1 and file 6 contained in FC1 are each operated on by four users, so the average number of users is four. If the prescribed value is three, FC1 is extracted as a central file cluster. In the example of FIG. 8, only FC1 is extracted as a central file cluster, but this embodiment is not limited to a single cluster: every file cluster whose average is equal to or greater than the prescribed value is extracted as a central file cluster.
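The averaging step can be sketched as follows. The log entries here are hypothetical stand-ins rather than the contents of FIG. 8; they are chosen only so that File1 and File6 each have four distinct users, matching the FC1 example in the text. The function and variable names are also illustrative.

```python
from collections import defaultdict

def central_clusters(log, clusters, min_avg_users):
    """Return clusters whose files are operated on by at least min_avg_users users on average."""
    users_per_file = defaultdict(set)
    for user, file_id in log:
        users_per_file[file_id].add(user)  # count distinct users per file
    result = []
    for name, files in clusters.items():
        avg = sum(len(users_per_file[f]) for f in files) / len(files)
        if avg >= min_avg_users:
            result.append(name)
    return result

# Hypothetical (user, file) log entries.
log = [("A", "File1"), ("B", "File1"), ("C", "File1"), ("D", "File1"),
       ("A", "File6"), ("B", "File6"), ("C", "File6"), ("E", "File6"),
       ("D", "File2"), ("D", "File8"), ("A", "File5"), ("B", "File5")]
clusters = {"FC1": ["File1", "File6"], "FC2": ["File2", "File8"], "FC5": ["File5"]}

print(central_clusters(log, clusters, min_avg_users=3))  # ['FC1']
```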

When the processing of step S503 is completed, the processing in the flowchart of FIG. 5 ends.

Next, for each central file cluster extracted in step S503 of FIG. 5, a workflow (frequent pattern) is extracted according to the flowchart of FIG. 6.

First, in step S601, the co-occurrence probability calculation unit 3274 of the workflow extraction unit 327 calculates the co-occurrence probability between the central file cluster (central item cluster) and each of the other file clusters (other item clusters). Normally, the co-occurrence probability of A and B is (A∩B)/(A∪B). Here, however, A is fixed to the central file cluster, B is one of the other file clusters, and (A∩B)/B is used as the co-occurrence probability between the central file cluster and that file cluster. Various conditions can be used to decide whether two file clusters co-occur. For example, the two hours before and after the time at which a file cluster was operated on may be taken as the operation period for that file cluster, and two file clusters may be regarded as co-occurring when their operation periods overlap. Alternatively, sessions may be created by dividing the file operation history at fixed intervals, for example every three hours, and the file clusters contained in the same session may be regarded as co-occurring. The two hours and three hours given here are parameters and can be chosen freely.

Here, an example of calculating the co-occurrence probability between FC1, the central file cluster, and each of the other file clusters is described using the file operation information of FIG. 8.
FIG. 12 is a diagram showing an example in which the users' file operations in the file operation information of FIG. 8 are mapped in time series for each file cluster, according to the embodiment of the present invention.
Events 1201 and 1202 were originally operations by user A on different files, but abstracting the files allows them to be treated as events on the same axis. This makes it easier to discover the co-occurrence of files used for the same work. Suppose that file clusters co-occur when the periods of two hours before and after their operation times overlap. FC2 has two operations, events 1211 and 1212, and both co-occur with operations on FC1, so its co-occurrence probability is 2/2 = 1.0. Similarly, FC5 has four operations, events 1221, 1222, 1223, and 1224, of which only events 1221 and 1224 co-occur with FC1, so its co-occurrence probability is 2/4 = 0.5. In the same way, the co-occurrence probabilities of FC3 and FC4 are calculated as 0.5 and 0.3, respectively.
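The window-overlap calculation can be sketched as follows. The timestamps are hypothetical and are not the events of FIG. 12; they are chosen so that the ratios 2/2 and 2/4 from the text appear. The sketch also assumes that two windows that merely touch count as overlapping, which the text does not specify.

```python
from datetime import datetime, timedelta

def cooccurrence_probability(center_times, other_times, half_window=timedelta(hours=2)):
    """Fraction of the other cluster's operations whose window overlaps a central-cluster operation."""
    if not other_times:
        return 0.0
    # Windows [t - h, t + h] and [c - h, c + h] overlap exactly when |t - c| <= 2h.
    hits = sum(
        any(abs(t - c) <= 2 * half_window for c in center_times)
        for t in other_times
    )
    return hits / len(other_times)

day = datetime(2013, 2, 26)  # arbitrary date for the hypothetical log
fc1 = [day + timedelta(hours=h) for h in (9, 14)]
fc2 = [day + timedelta(hours=h) for h in (10, 15)]
fc5 = [day + timedelta(hours=h) for h in (3, 10, 16, 22)]

print(cooccurrence_probability(fc1, fc2))  # 1.0
print(cooccurrence_probability(fc1, fc5))  # 0.5
```

Dividing by the number of operations of the other cluster, rather than by the union, corresponds to the (A∩B)/B form used in step S601.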

The description now returns to FIG. 6.
When the processing of step S601 is completed, the process proceeds to step S602.
In step S602, the dependent cluster specifying unit 3275 of the workflow extraction unit 327 uses the co-occurrence probabilities obtained in step S601 to specify the dependent file clusters (dependent item clusters) that belong to the same workflow as the central file cluster (central item cluster). For example, the dependent file clusters may simply be the file clusters whose co-occurrence probability is equal to or greater than a predetermined value. In the example of the file operation information of FIG. 8, if the predetermined value is 0.7 (handled together about 70% of the time), FC2 is specified as a dependent file cluster belonging to the same workflow as the central file cluster FC1. In FIG. 12, FC5 may at first glance also appear likely to be handled together with FC1, but FC5 appears throughout the timeline and is therefore not especially likely to be handled together with FC1.
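The threshold selection is small enough to show directly. The probabilities below are the values given in the text for FC2 to FC5; the function name is illustrative.

```python
def dependent_clusters(probabilities, threshold):
    """Clusters whose co-occurrence probability with the central cluster meets the threshold."""
    return sorted(name for name, p in probabilities.items() if p >= threshold)

# Co-occurrence probabilities with FC1, as calculated in the text.
probabilities = {"FC2": 1.0, "FC3": 0.5, "FC4": 0.3, "FC5": 0.5}
print(dependent_clusters(probabilities, 0.7))  # ['FC2']
```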

Subsequently, in step S603, the sequence extraction unit 3276 of the workflow extraction unit 327 extracts a set of file cluster operation sequences, which are operation sequences of item clusters and serve as workflow candidates. A file cluster operation sequence is extracted for each file contained in the central file cluster that is the center of the workflow. In this step, the work centered on that file is first extracted as a file operation sequence, and the files are then replaced by their file clusters to obtain a file cluster operation sequence.

The specific processing is described using the example of the file operation information of FIG. 8.
First, the operation histories of the files contained in FC1, the central file cluster of the workflow, and in FC2, the dependent file cluster belonging to that workflow, are retrieved.

FIG. 13 is a diagram showing an example in which the users' file operations in the file operation information of FIG. 8 are mapped in time series for each file belonging to FC1 and FC2, according to the embodiment of the present invention.
In FIG. 13, file 1 and file 6 belong to FC1, and file 2 and file 8 belong to FC2. For each file belonging to the central file cluster FC1, a file operation sequence is extracted that includes the files co-occurring with that file. As with file clusters, files are regarded as co-occurring when the periods of two hours before and after their operation times overlap. The two hours is of course an arbitrary parameter, and the definition of co-occurrence is not limited to this.

FIG. 14 is a diagram showing an example of file operation sequences and file cluster operation sequences according to the embodiment of the present invention.
FIG. 14(a) is an example of the file operation sequences extracted by the extraction process described above. In FIG. 14(a), sequence 1 was extracted according to the co-occurrence relationship with file 1, and sequence 2 according to the co-occurrence relationship with file 6. The files in the extracted file operation sequences are then abstracted back into file clusters to obtain the file cluster operation sequences shown in FIG. 14(b). Each file cluster operation sequence represents one flow of work, and by collecting flows of work with similar purposes, a workflow that is the typical pattern of that work can be extracted.
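The abstraction from files to file clusters can be sketched as a simple mapping. The two sequences below are hypothetical and do not reproduce FIG. 14; representing each operation as a (user, file) pair is also an assumption made for this example.

```python
def to_cluster_sequence(file_sequence, cluster_of):
    """Replace each file in a (user, file) operation sequence by its file cluster."""
    return [(user, cluster_of[file_id]) for user, file_id in file_sequence]

cluster_of = {"File1": "FC1", "File6": "FC1", "File2": "FC2", "File8": "FC2"}

# Hypothetical file operation sequences centered on File1 and File6.
sequence1 = [("A", "File1"), ("B", "File1"), ("D", "File2"), ("E", "File1")]
sequence2 = [("A", "File6"), ("B", "File6"), ("D", "File8"), ("E", "File6")]

cluster_sequence1 = to_cluster_sequence(sequence1, cluster_of)
cluster_sequence2 = to_cluster_sequence(sequence2, cluster_of)
print(cluster_sequence1)  # [('A', 'FC1'), ('B', 'FC1'), ('D', 'FC2'), ('E', 'FC1')]
print(cluster_sequence1 == cluster_sequence2)  # True
```

Two sequences that touch entirely different files become identical once the files are replaced by their clusters, which is what allows them to be counted as the same pattern.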

The description now returns to FIG. 6.
When the processing of step S603 is completed, the process proceeds to step S604.
In step S604, the frequent pattern extraction unit 3277 of the workflow extraction unit 327 extracts a workflow (that is, a frequent pattern) from the set of file cluster operation sequences extracted in step S603.

The processing of step S604 is described in detail below.
The set of file cluster operation sequences extracted by the processing so far is a set of flows of work with similar purposes. The workflow is extracted as frequent "Closed Partial Orders", taking this set of file cluster operation sequences as input. "Closed Partial Orders" are obtained by summarizing the set of frequent subsequences extracted from a set of sequence data by a technique called sequential pattern mining.

Sequential pattern mining is a process defined as follows.
Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items. A non-empty subset of $I$ is called an element. Given a threshold $\xi > 0$, an item that appears $\xi$ or more times in the set $I$ is called a frequent item. An ordered list of elements is called a sequence. For a sequence $\alpha = (a_1, a_2, \ldots, a_n)$ and a sequence $\beta = (b_1, b_2, \ldots, b_m)$, if there exist integers $1 \le j_1 < j_2 < \cdots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$, then $\alpha$ is called a subsequence of $\beta$, written $\alpha \subseteq \beta$. A set $S = \{(sid_1, s_1), (sid_2, s_2), \ldots, (sid_n, s_n)\}$ of tuples $(sid, s)$, each consisting of a sequence ID $sid$ and a sequence $s$, is called a sequence database. The support of a sequence $\alpha$ in the sequence database $S$ is defined as the number of tuples in $S$ whose sequence contains $\alpha$. A sequence contained in at least $\xi$ tuples $(sid, s)$, where $\xi$ is called the minimum support, is called a sequential pattern in the sequence database. Sequential pattern mining is the task of finding all sequential patterns in $S$, given the sequence database $S$ and the minimum support $\xi$. Typical sequential pattern mining techniques include the Apriori algorithm described in Non-Patent Document 1 and PrefixSpan described in Non-Patent Document 2.
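The subsequence and support definitions can be checked with a short sketch. It only tests whether a given candidate is a sequential pattern; it does not enumerate all patterns as Apriori or PrefixSpan would. The database contents, the function names, and the use of Python sets for elements are assumptions made for this example.

```python
def is_subsequence(alpha, beta):
    """True if the elements of alpha are subsets of elements of beta at increasing positions."""
    pos = 0
    for element in alpha:
        # Greedily match each element at the earliest remaining position in beta.
        while pos < len(beta) and not element <= beta[pos]:
            pos += 1
        if pos == len(beta):
            return False
        pos += 1  # the next element must match strictly later
    return True

def support(alpha, database):
    """Number of (sid, s) tuples in the sequence database whose sequence contains alpha."""
    return sum(is_subsequence(alpha, s) for _, s in database)

# Hypothetical sequence database of (sid, sequence) tuples; each element is a set of items.
database = [
    (1, [{"a"}, {"a", "b", "c"}, {"d"}]),
    (2, [{"a", "d"}, {"c"}]),
    (3, [{"a"}, {"c"}, {"d"}]),
]
min_support = 2
print(support([{"a"}, {"c"}], database))  # 3
print(support([{"a"}, {"d"}], database))  # 2
```

With a minimum support of 2, both candidates qualify as sequential patterns in this hypothetical database.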

FIG. 15 is a diagram for explaining sequential pattern mining according to the embodiment of the present invention.
For example, when the sequence database shown in FIG. 15(a) is given, applying sequential pattern mining extracts the four sequential patterns shown in FIG. 15(b). These, however, are fragments of the original pattern that can be read from the sequence database. The original pattern is the one shown in FIG. 15(c): after item A, items B and C appear, then item D appears, and finally items E and F appear. A pattern such as that of FIG. 15(c) is called "Closed Partial Orders", and several methods for extracting "Closed Partial Orders" have already been proposed, for example the method described in Non-Patent Document 3.
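To give a feel for how fragmented sequences summarize into one partial order, the following toy sketch keeps the "x precedes y" relations that hold in every input sequence and removes the relations implied by transitivity. This is a deliberate simplification and not the method of Non-Patent Document 3: it assumes that each item appears at most once per sequence and that every sequence supports the pattern, and the two input sequences are hypothetical rather than taken from FIG. 15(a).

```python
from itertools import combinations

def common_partial_order(sequences):
    """Transitive reduction of the relation 'x precedes y in every sequence'."""
    def precedes(seq):
        # combinations() preserves input order, so x appears before y in seq.
        return set(combinations(seq, 2))
    order = set.intersection(*(precedes(s) for s in sequences))
    items = {x for pair in order for x in pair}
    # Drop x -> y when some z gives x -> z and z -> y, because the edge is implied.
    return sorted(
        (x, y) for x, y in order
        if not any((x, z) in order and (z, y) in order for z in items)
    )

# Hypothetical sequences in which B/C and E/F occur in either order.
sequences = ["ABCDEF", "ACBDFE"]
print(common_partial_order(sequences))
# [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E'), ('D', 'F')]
```

The remaining edges describe A followed by B and C in parallel, then D, then E and F in parallel, which has the same shape as the pattern described for FIG. 15(c).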

FIG. 16 is a diagram showing an example of the workflow extracted in step S604 of FIG. 6, according to the embodiment of the present invention. Specifically, FIG. 16 shows the "Closed Partial Orders" extracted from the set of file cluster operation sequences shown in FIG. 14(b), which was extracted in step S603 of FIG. 6. These "Closed Partial Orders" constitute a workflow, that is, a typical operation pattern extracted from a set of flows of work with similar purposes. FIG. 16 shows nodes 1601 to 1605. Through the processing of the workflow extraction unit 327, the workflow shown in FIG. 7 is finally extracted from the file operation information shown in FIG. 8.

When the processing of step S604 is completed, the processing in the flowchart of FIG. 6 ends.

In the embodiment of the present invention, a set of operation sequences is extracted based on the operation histories of the files belonging to the central file cluster and the files belonging to the dependent file cluster, and a frequent pattern serving as a workflow is extracted based on that set of operation sequences.
With this configuration, when a plurality of users work cooperatively by operating a plurality of files, a more versatile workflow can be extracted, including patterns in which the users work independently and in parallel. The workflow can then be used to improve work efficiency, for example by navigating users' item operations. It has a wide range of uses: visualizing the workflow to help review business processes, using it as a reference when building a workflow system, or using it for file recommendations that navigate users' file operations.

(Other Embodiments)
The present invention can also be realized by executing the following processing.
That is, software (a program) that realizes the functions of the embodiment described above is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads and executes the program.
The program and a computer-readable recording medium storing the program are included in the present invention.

The embodiments of the present invention described above are merely specific examples of carrying out the present invention, and the technical scope of the present invention should not be construed as limited by them. That is, the present invention can be carried out in various forms without departing from its technical idea or its main features.

310: user terminal; 320: file management system; 321: operation acquisition unit; 322: file management unit; 323: database; 324: operation history management unit; 325: operation history database; 326: information transmission unit; 327: workflow extraction unit; 3271: similarity calculation unit; 3272: clustering unit; 3273: central cluster extraction unit; 3274: co-occurrence probability calculation unit; 3275: dependent cluster specifying unit; 3276: sequence extraction unit; 3277: frequent pattern extraction unit

Claims

1. A frequent pattern extraction device that extracts a time-series frequent pattern from each user's operation history for each item among a plurality of items, the device comprising:
clustering means for clustering the plurality of items into a plurality of item clusters based on the similarity between the items of the plurality of items;
central cluster extraction means for extracting, from the plurality of item clusters, a central item cluster that is the center of the frequent pattern;
subordinate cluster specifying means for specifying, from the plurality of item clusters, a subordinate item cluster included in the same frequent pattern as the central item cluster;
sequence extraction means for extracting, for each user, a set of operation sequences on the central item cluster and the subordinate item cluster, based on the operation history of items belonging to the central item cluster and items belonging to the subordinate item cluster; and
frequent pattern extraction means for extracting the frequent pattern based on the set of operation sequences extracted by the sequence extraction means.
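
The sequence extraction recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the record layout `(timestamp, user, item, operation)`, the `cluster_of` mapping, and the function name `extract_sequences` are all assumptions made for illustration.

```python
from collections import defaultdict

def extract_sequences(history, cluster_of, central, subordinates):
    """Build one time-ordered operation sequence per user.

    history      -- iterable of (timestamp, user, item, operation) records
                    (hypothetical layout)
    cluster_of   -- dict mapping each item to its cluster id
    central      -- id of the central item cluster
    subordinates -- ids of the subordinate item clusters
    """
    relevant = {central} | set(subordinates)
    per_user = defaultdict(list)
    # Sort by timestamp so each user's sequence is in time order.
    for _, user, item, operation in sorted(history, key=lambda r: r[0]):
        cluster = cluster_of[item]
        # Operations on clusters outside the workflow are ignored.
        if cluster in relevant:
            per_user[user].append((cluster, operation))
    return dict(per_user)

history = [
    (3, "bob", "b.xls", "read"),
    (1, "alice", "a.doc", "create"),
    (2, "bob", "a.doc", "read"),
    (4, "alice", "z.txt", "read"),
]
cluster_of = {"a.doc": 0, "b.xls": 1, "z.txt": 2}
sequences = extract_sequences(history, cluster_of, central=0, subordinates=[1])
```

Each sequence element here is a (cluster, operation) pair rather than a file name, so that users who touch different files of the same cluster still yield comparable sequences.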

2. The frequent pattern extraction device according to claim 1, wherein the central cluster extraction means extracts the central item cluster from the plurality of item clusters based on usage information of each item of the plurality of items.

3. The frequent pattern extraction device according to claim 2, wherein the usage information includes at least one of the number of users who use each item and the manner in which each item is used.
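
As one possible reading of claims 2 and 3, the central item cluster could be chosen as the cluster used by the largest number of distinct users. The Python sketch below assumes exactly that criterion; the function name `pick_central_cluster`, the `(user, item)` record layout, and the tie-breaking rule are hypothetical and do not come from the claims.

```python
from collections import defaultdict

def pick_central_cluster(history, cluster_of):
    """Return the id of the cluster operated on by the most distinct users.

    history    -- iterable of (user, item) operation records (hypothetical layout)
    cluster_of -- dict mapping each item to its cluster id
    """
    users_per_cluster = defaultdict(set)
    for user, item in history:
        users_per_cluster[cluster_of[item]].add(user)
    # Assumption: ties are broken in favor of the smaller cluster id.
    return min(users_per_cluster, key=lambda c: (-len(users_per_cluster[c]), c))

history = [
    ("alice", "spec.doc"),
    ("bob", "spec.doc"),
    ("carol", "spec_v2.doc"),
    ("alice", "notes.txt"),
]
cluster_of = {"spec.doc": 0, "spec_v2.doc": 0, "notes.txt": 1}
central = pick_central_cluster(history, cluster_of)
```

Counting distinct users rather than raw operations keeps a single heavy user from making a private file cluster look central.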

4. The frequent pattern extraction device according to any one of claims 1 to 3, further comprising co-occurrence probability calculation means for calculating a co-occurrence probability between the central item cluster and each of the other item clusters of the plurality of item clusters excluding the central item cluster,
wherein the subordinate cluster specifying means specifies the subordinate item cluster based on the co-occurrence probability calculated by the co-occurrence probability calculation means.
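
The claim does not fix how the co-occurrence probability is defined. The following Python sketch assumes one simple definition: the fraction of users of the central item cluster who also operated on the other cluster. The function name `subordinate_clusters` and the threshold value 0.5 are hypothetical choices for illustration.

```python
from collections import defaultdict

def subordinate_clusters(history, cluster_of, central, threshold=0.5):
    """Return {cluster id: co-occurrence probability} for clusters that
    co-occur with the central cluster at or above the threshold.

    Assumed definition: P(cluster | central) = (users who touched both)
    / (users who touched the central cluster).
    """
    clusters_per_user = defaultdict(set)
    for user, item in history:
        clusters_per_user[user].add(cluster_of[item])
    central_users = [u for u, cs in clusters_per_user.items() if central in cs]
    if not central_users:
        return {}
    counts = defaultdict(int)
    for user in central_users:
        for cluster in clusters_per_user[user] - {central}:
            counts[cluster] += 1
    probabilities = {c: n / len(central_users) for c, n in counts.items()}
    return {c: p for c, p in probabilities.items() if p >= threshold}

history = [
    ("alice", "a"), ("alice", "b"),
    ("bob", "a"), ("bob", "b"),
    ("carol", "a"), ("carol", "c"),
]
cluster_of = {"a": 0, "b": 1, "c": 2}
result = subordinate_clusters(history, cluster_of, central=0)
```

In this example cluster 1 is shared by two of the three users of the central cluster and passes the threshold, while cluster 2 is shared by only one and is left out.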

5. The frequent pattern extraction device according to any one of claims 1 to 4, wherein the sequence extraction means extracts the set of operation sequences by using a co-occurrence relationship between items belonging to the central item cluster and items belonging to the subordinate item cluster.

6. The frequent pattern extraction device according to any one of claims 1 to 5, wherein the frequent pattern extraction means extracts, as the frequent pattern, a pattern obtained by summarizing a set of frequent subsequences extracted by sequential pattern mining from the set of operation sequences extracted by the sequence extraction means.
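
To make the idea of claim 6 concrete, the Python sketch below mines frequent subsequences by brute-force enumeration and then summarizes them by keeping only the maximal ones. Both choices are assumptions: the claim names neither a specific mining algorithm nor a specific summarization, and a practical system would more likely use an established algorithm such as PrefixSpan. The names `frequent_subsequences`, `maximal_patterns`, `min_support`, and `max_len` are hypothetical.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(symbol in it for symbol in pattern)

def frequent_subsequences(sequences, min_support, max_len=4):
    """Return {pattern: support} for every subsequence of length <= max_len
    contained in at least min_support of the input sequences."""
    candidates = set()
    for seq in sequences:
        for n in range(1, min(max_len, len(seq)) + 1):
            for idx in combinations(range(len(seq)), n):
                candidates.add(tuple(seq[i] for i in idx))
    frequent = {}
    for pattern in candidates:
        support = sum(is_subsequence(pattern, seq) for seq in sequences)
        if support >= min_support:
            frequent[pattern] = support
    return frequent

def maximal_patterns(frequent):
    """Summarize by dropping patterns contained in a longer frequent pattern."""
    return {p: s for p, s in frequent.items()
            if not any(p != q and is_subsequence(p, q) for q in frequent)}

sequences = [["A", "B", "C"], ["A", "C"], ["A", "B", "C"], ["B", "A"]]
summary = maximal_patterns(frequent_subsequences(sequences, min_support=3))
```

The enumeration is exponential in sequence length, so it is only suitable for small illustrative inputs such as this one.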

7. A frequent pattern extraction method performed by a frequent pattern extraction device that extracts a time-series frequent pattern from each user's operation history for each item among a plurality of items, the method comprising:
a clustering step of clustering the plurality of items into a plurality of item clusters based on the similarity between the items of the plurality of items;
a central cluster extraction step of extracting, from the plurality of item clusters, a central item cluster that is the center of the frequent pattern;
a subordinate cluster specifying step of specifying, from the plurality of item clusters, a subordinate item cluster included in the same frequent pattern as the central item cluster;
a sequence extraction step of extracting, for each user, a set of operation sequences on the central item cluster and the subordinate item cluster, based on the operation history of items belonging to the central item cluster and items belonging to the subordinate item cluster; and
a frequent pattern extraction step of extracting the frequent pattern based on the set of operation sequences extracted in the sequence extraction step.

8. A program for causing a computer to execute a frequent pattern extraction method performed by a frequent pattern extraction device that extracts a time-series frequent pattern from each user's operation history for each item among a plurality of items, the program causing the computer to execute:
a clustering step of clustering the plurality of items into a plurality of item clusters based on the similarity between the items of the plurality of items;
a central cluster extraction step of extracting, from the plurality of item clusters, a central item cluster that is the center of the frequent pattern;
a subordinate cluster specifying step of specifying, from the plurality of item clusters, a subordinate item cluster included in the same frequent pattern as the central item cluster;
a sequence extraction step of extracting, for each user, a set of operation sequences on the central item cluster and the subordinate item cluster, based on the operation history of items belonging to the central item cluster and items belonging to the subordinate item cluster; and
a frequent pattern extraction step of extracting the frequent pattern based on the set of operation sequences extracted in the sequence extraction step.