WO2009107412A1

WO2009107412A1 - Graph structure estimation apparatus, graph structure estimation method, and program

Info

Publication number: WO2009107412A1
Application number: PCT/JP2009/050358
Authority: WO
Inventors: 遼平藤巻; 健司山西
Original assignee: 日本電気株式会社
Priority date: 2008-02-27
Filing date: 2009-01-14
Publication date: 2009-09-03

Abstract

Provided are a graph structure estimation apparatus, a graph structure estimation method, and a program that can classify nodes according to their degree of activity, that is, their degree of importance. The graph structure estimation apparatus (100) comprises: a calculation means (110) which, on receiving graph data expressed by nodes and links showing the degree of relation between the respective nodes, calculates importance information showing the degree of importance of each node from the graph data; a division means (120) which assigns each node to one of a plurality of groups according to the importance information of the node; and an output means (130) which outputs the result of grouping the nodes as graph structure information of the graph data.

Description

Graph structure estimation apparatus, graph structure estimation method, and program

The present invention relates to a graph structure estimation device, a graph structure estimation method, and a program. It relates, for example, to a graph structure estimation device, method, and program that characterize the structure of a graph composed of points (nodes) and edges (links) based on the importance of each node.

Graph data is data represented by a set of nodes and links expressing the relationships between those nodes.

Examples of graph data include data representing Web pages and hyperlinks, data representing SNS (social networking site) users and their friendships, data representing network devices and traffic, and data representing the bases in a protein structure and the interactions between them.

In these examples, the Web pages, users, network devices, and bases each correspond to a “node set”, and the hyperlinks, friendships, traffic, and interactions between bases each correspond to “links”.

Recent research has confirmed that the structure of many graphs (networks) formed from graph data has a property called scale-freeness (see Non-Patent Document 1).

One property of a scale-free graph is that the great majority of nodes have low importance, while a small number of highly important nodes that function as “hubs” in the graph exist (with low probability). In other words, the probability distribution of node importance has a long tail: the tail of the distribution corresponds to nodes of high importance, and because the tail is long, highly important nodes can exist with low probability.

This property has been observed for various measures of node importance, such as the number of links a node has and PageRank (Non-Patent Document 7). For example, in a scale-free graph, the number of links per node follows a power-law distribution (see FIG. 1), and PageRank follows a power-law or log-normal distribution.

Several methods have been proposed in recent years for characterizing graph data by dividing it into subsets of nodes (clusters). Characterization of graph data can be applied in a wide range of fields, for example the discovery of Web communities or the identification of subsystems of a network.

One such approach regards a community as a set of nodes that interact with one another, and divides the graph into densely linked parts and sparsely linked parts.

For example, Non-Patent Document 2 describes a technique that estimates community structure by dividing a graph, or a sequence of graphs obtained over time, using the description length required to encode it as the criterion.

In this technique (GraphScope), the graph is divided so as to minimize the sum of the description length of each subgraph and the description length of the division itself.

There is also a technique that focuses on the self-similarity of general networks, including scale-free networks, represents the network structure as a direct (Kronecker) product of matrices, and estimates that structure (see Non-Patent Document 3).

The following techniques are also known.

Non-Patent Document 4 describes a histogram-based approximation method that follows the minimum description length principle. Non-Patent Document 5 describes a histogram-based approximation method that follows the Akaike information criterion. Non-Patent Document 6 describes a technique relating to a framework for dynamic model selection.

Non-Patent Document 1: A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286:509-512, 1999.
Non-Patent Document 2: J. Sun, P. S. Yu, S. Papadimitriou, and C. Faloutsos. Graphscope: Parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 2007.
Non-Patent Document 3: J. Leskovec and C. Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML, 2007.
Non-Patent Document 4: J. Rissanen, T. P. Speed, and B. Yu. Density estimation by stochastic complexity. IEEE Transactions on Information Theory, 38(2):315-323, 1992.
Non-Patent Document 5: C. C. Taylor. Akaike's information criterion and the histogram. Biometrika, 74(3):636-639, 1987.
Non-Patent Document 6: K. Yamanishi and Y. Maruyama. Dynamic model selection with its applications to novelty detection. IEEE Transactions on Information Theory, 53(6):2180-2189, 2007.
Non-Patent Document 7: L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford Digital Library Technologies Project, 1998.
Non-Patent Document 8: J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46:604-632, 2003.

The graph structure estimation techniques described above have the problem that they cannot identify the structure of a graph according to the importance of its nodes.

This is because these techniques do not embody the idea of characterizing graph structure by node importance, and node importance information is not taken into account when the division of the graph is calculated.

Dividing a graph according to node importance can also be regarded as classifying nodes according to the activity of each node in that graph (network).

For this reason, it has been difficult with the techniques described above to classify nodes based on, for example, node activity, that is, node importance.

An object of the present invention is to provide a graph structure estimation device, a graph structure estimation method, and a program capable of solving the problem described above.

To achieve the above object, the graph structure estimation device of the present invention includes: calculation means that, on receiving graph data represented by a plurality of nodes and links indicating the degree of the relationship between the nodes, calculates, for each node and based on the graph data, importance information indicating the degree of importance of that node; division means that assigns each of the nodes to one of a plurality of groups based on the importance information of the node; and output means that outputs the result of grouping the nodes as graph structure information of the graph data.

The graph structure estimation method of the present invention is a method performed by a graph structure estimation device and includes: a calculation step of, on receiving graph data represented by a plurality of nodes and links indicating the degree of the relationship between the nodes, calculating, for each node and based on the graph data, importance information indicating the degree of importance of that node; a division step of assigning each of the nodes to one of a plurality of groups based on the importance information of the node; and an output step of outputting the result of grouping the nodes as graph structure information of the graph data.

The program of the present invention causes a computer to function as: calculation means that, on receiving graph data represented by a plurality of nodes and links indicating the degree of the relationship between the nodes, calculates, for each node and based on the graph data, importance information indicating the degree of importance of that node; division means that assigns each of the nodes to one of a plurality of groups based on the importance information of the node; and output means that outputs the result of grouping the nodes as graph structure information of the graph data.

According to the present invention, nodes can be classified based on their activity, that is, their importance.

A diagram showing an example of the power-law distribution of node frequency against number of links for input data.
A block diagram showing the configuration of the graph structure estimation device according to the first embodiment of the present invention.
A diagram showing an example of undirected graph data.
A diagram showing an example of directed graph data.
A block diagram showing an example of the graph structure estimation device shown in FIG. 2.
A diagram showing an example of dividing the power-law distribution of node importance into regions.
A diagram showing an example of how the graph is divided in accordance with the division of the power-law distribution when that distribution is divided according to a partitioning rule.
A flowchart showing an example of the processing of the graph structure estimation device according to the first embodiment of the present invention.
A block diagram showing the configuration of the graph structure estimation device according to the second embodiment of the present invention.
A diagram showing an example of dividing the node importance distribution by histogram approximation.
A block diagram showing the configuration of the graph structure estimation device according to the third embodiment of the present invention.
A block diagram showing the configuration of the node importance partition optimization device according to the third embodiment of the present invention.
A diagram showing an example of graph division and the assignment of a probability distribution to each subgraph.
A block diagram showing the configuration of the graph structure estimation device according to the fourth embodiment of the present invention.
A block diagram showing the configuration of the node importance partition optimization device according to the fourth embodiment of the present invention.
A diagram showing an example of division according to the node importance distribution and division of each subgraph.
A block diagram showing the configuration of the graph structure estimation device according to the fifth embodiment of the present invention.
A block diagram showing the configuration of the dynamic node importance partition optimization device according to the fifth embodiment of the present invention.
A block diagram showing the configuration of the graph structure estimation device according to the sixth embodiment of the present invention.

Explanation of symbols

100, 200, 300, 400, 500, 600  Graph structure estimation device
110, 510  Node importance calculation device
110a  Importance calculation unit
110b  Importance calculation data storage unit
120  Graph partition calculation device
120a  Graph partition calculation unit
120b  Partitioning data storage unit
130  Estimated structure output device
210, 310, 410  Node importance partition optimization device
210a  Partition optimization data storage unit
210b  Node importance partition optimization unit
311, 411  Code length calculation unit storage device
312  Graph code length calculation unit
313  Node importance partition code length calculation unit
314, 413, 513  Optimal parameter calculation device
412  Subgraph partition code length calculation unit
511  Graph sequence code length calculation unit storage device
512  Model sequence code length calculation unit storage device
520, 610  Dynamic node importance partition optimization device
620  Model parameter storage device

Next, embodiments of the present invention will be described in detail with reference to the drawings.

[First Embodiment]
FIG. 2 is a block diagram showing the graph structure estimation device 100 according to the first embodiment of the present invention.

Referring to FIG. 2, the graph structure estimation device 100 includes a node importance calculation device 110, a graph partition calculation device 120, and an estimated structure output device 130.

The graph structure estimation device 100 is, for example, a computer including a CPU, a memory, and an input/output device. The graph structure estimation device 100 operates according to a program recorded on a hard disk or in a memory. The hard disk or memory can generally be referred to as a computer-readable recording medium.

By reading the program from the recording medium and executing it, the graph structure estimation device 100 functions as the node importance calculation device 110, the graph partition calculation device 120, and the estimated structure output device 130.

The graph structure estimation device 100 receives graph data 140, estimates the structure of the graph formed by the graph data 140, and outputs the estimation result (graph structure estimation result) 150.

The graph data 140 is represented by a plurality of nodes and links indicating the degree of the relationship between the nodes. In the following, each link expresses the degree of the relationship between nodes as a numerical value.

The graph data 140 may be input as a single data set. Alternatively, graph data that changes over time may be input sequentially as a time series.

The node importance calculation device 110 can generally be referred to as a calculation means.

On receiving the graph data 140, the node importance calculation device 110 calculates, for each node indicated in the graph data 140 and based on the graph data 140, a node importance indicating the degree of the relationship between that node and all other nodes. The node importance is an example of importance information.

The node importance calculation device 110 may store in advance a rule for calculating the node importance of each node, and calculate the node importance of each node according to that rule.

Any index for calculating the degree of importance or activity of a node in the graph can be used as the node importance.

For example, when the link frequency of each node is used as the node importance, the node importance calculation device 110 calculates, for each node, the sum of the numerical values of the links associated with that node. The node importance calculation device 110 uses the sum calculated for each node as that node's node importance.
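The per-node summation described above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the function name `link_sum_importance` and the dictionary-of-pairs input format are assumptions made for the example, and links are treated as undirected.

```python
from collections import defaultdict

def link_sum_importance(links):
    """Sum, for each node, the numeric values of all links touching it.

    `links` maps an (i, j) node pair to the link's numeric value. Treating
    each pair as undirected is an assumption of this sketch.
    """
    importance = defaultdict(float)
    for (i, j), value in links.items():
        importance[i] += value
        if j != i:  # count a self-loop only once
            importance[j] += value
    return dict(importance)

# Four nodes; every link has value 1, so the sum is the number of links.
links = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (3, 4): 1}
scores = link_sum_importance(links)
```

With 0/1 link values the result is simply the number of links per node; with real-valued link strengths it becomes a weighted sum.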

As another example, when the PageRank of each node is used as the node importance, the node importance calculation device 110 calculates the PageRank of each node using the PageRank algorithm described in Non-Patent Document 7. The node importance calculation device 110 uses the PageRank calculated for each node as that node's node importance.
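As a rough illustration of how such a score might be obtained, the sketch below uses a common power-iteration formulation of PageRank. The damping factor of 0.85, the uniform redistribution of rank from nodes without outgoing links, the fixed iteration count, and the `out_links` input format are all assumptions of this example and are not taken from the source or guaranteed to match Non-Patent Document 7 exactly.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}.

    Every node, including link targets, must appear as a key.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical four-page web: "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

The resulting `ranks` dictionary can then be used directly as the per-node importance values.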

As a further example, when the hub index and authority index of each node (Non-Patent Document 8) are used as the node importance, the node importance calculation device 110 calculates the hub index or authority index of each node using the HITS (Hyperlink-Induced Topic Search) algorithm described in Non-Patent Document 8. The node importance calculation device 110 uses the hub index or authority index calculated for each node as that node's node importance.
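The mutual reinforcement between hub and authority indices can be sketched as below. This is a simplified version of the usual HITS iteration; the Euclidean normalization, the fixed iteration count, and the `out_links` input format are assumptions chosen for the example.

```python
import math

def hits(out_links, iterations=50):
    """Return (hub, authority) scores for a dict {node: [nodes it links to]}.

    Every node, including link targets, must appear as a key.
    """
    nodes = sorted(out_links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the nodes linking to it.
        auth = {v: 0.0 for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                auth[w] += hub[v]
        # Hub: sum of the authority scores of the nodes it links to.
        hub = {v: sum(auth[w] for w in out_links[v]) for v in nodes}
        # Normalize both score vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth

# Hypothetical graph: "a" and "b" both point at "c" and "d".
web = {"a": ["c", "d"], "b": ["c", "d"], "c": [], "d": []}
hub, auth = hits(web)
```

Either returned dictionary can serve as the node importance, depending on whether hubs or authorities are of interest.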

The node importance calculation device 110 provides importance information indicating the node importance of each node, together with the graph data 140, to the graph partition calculation device 120.

The graph partition calculation device 120 can generally be referred to as a division means.

The graph partition calculation device 120 assigns each node to one of a plurality of groups based on the importance information (the node importance of each node). Grouping the nodes amounts to dividing the graph formed by the graph data 140 into those groups.

For example, the graph partition calculation device 120 divides the range of the node importance (from the minimum to the maximum value the node importance can take) using one or more partitioning node importance values, and thereby determines the plurality of groups (in this case, the partition regions of the node importance) and the nodes contained in each partition region. Determining the nodes contained in each partition region amounts to dividing the nodes into a plurality of groups.

The graph partition calculation device 120 may store a partitioning rule in advance and, according to that rule, divide the nodes into a plurality of groups based on the node importance and the graph data 140.

The graph partition calculation device 120 provides information on the graph partitioning result (for example, the number of partitions, the partitioning method, and information indicating the partition region to which each node belongs), together with the importance information, to the estimated structure output device 130.

The estimated structure output device 130 can generally be referred to as an output means.

The estimated structure output device 130 outputs, for example, the result of grouping the nodes as the graph structure estimation result (graph structure information) 150 of the graph data. The estimated structure output device 130 may output both the node grouping result and the importance information as the graph structure information of the graph data.

The graph data 140 will now be described.

Graph data 140 generally takes the form of either undirected graph data or directed graph data.

FIG. 3 is a diagram showing an example of undirected graph data.

In FIG. 3, the undirected graph data includes a plurality of nodes 2a and links 2b.

The value “1” in the links 2b indicates that there is a link between the corresponding nodes in the table shown in FIG. 3. For example, the value between node 1 and node 2 is “1”, which indicates that there is a link between node 1 and node 2.

The value “0” in the links 2b indicates that there is no link between the corresponding nodes in the table shown in FIG. 3. For example, the value between node 1 and node n is “0”, which indicates that there is no link between node 1 and node n.

In an undirected graph, there is no distinction between the link from node 1 to node 2 and the link from node 2 to node 1. Consequently, the value expressed by a link in the table of undirected graph data (FIG. 3) stays the same when the node indices (node numbers) are swapped, and the number of rows (n) equals the number of columns (n).
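The symmetry of such a table can be illustrated with a short sketch. The helper names `undirected_matrix` and `is_symmetric`, the nested-list representation, and the sample edges are assumptions made for this example and do not reproduce the contents of FIG. 3.

```python
def undirected_matrix(n, edges):
    """Build an n x n 0/1 link table from undirected (i, j) pairs (1-indexed)."""
    g = [[0] * n for _ in range(n)]
    for i, j in edges:
        g[i - 1][j - 1] = 1
        g[j - 1][i - 1] = 1  # the same link seen from the other endpoint
    return g

def is_symmetric(g):
    """True if the table is square and g[i][j] == g[j][i] for all i, j."""
    n = len(g)
    return all(len(row) == n for row in g) and all(
        g[i][j] == g[j][i] for i in range(n) for j in range(i + 1, n)
    )

# Hypothetical 4-node chain 1-2-3-4.
g = undirected_matrix(4, [(1, 2), (2, 3), (3, 4)])
```

Here `is_symmetric(g)` holds by construction, whereas a table with a link recorded in only one direction would fail the check.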

In this example a link takes one of the two values “1” or “0”, but a real value representing the strength of the link may also be used as the value of a link.

FIG. 4 is a diagram showing an example of directed graph data.

In a directed graph, the rows and columns need not represent the same nodes. FIG. 4 shows, for example, that the strength of the link from row node 1 to column node 2 is “0.5”. Accordingly, the number of rows (n_r) and the number of columns (n_c) need not be equal either.

As concrete examples of graph data, when analyzing the hyperlinks of Web pages, each node can be a Web page and each link can indicate the presence or absence of a hyperlink between Web pages.

When analyzing an SNS network, each node can be a user and each link can indicate whether the users are registered as friends.

When analyzing a LAN, each node can be a network device and each link can be, for example, the amount of traffic between devices.

Hereinafter, the graph data is denoted by G, and the link from (row) node i to (column) node j is denoted by g_ij.

|G| is a quantity representing how dense or sparse the graph is, and can be defined, for example, by Equation (1).

In Equation (1), when g_ij takes the value “1” or “0”, |G| is the total number of links in the graph.
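One form of Equation (1) consistent with this description is the sum of all link values over the n_r rows and n_c columns. It is offered here as an assumption rather than as the patent's own formula.

```latex
|G| = \sum_{i=1}^{n_r} \sum_{j=1}^{n_c} g_{ij}
```

With 0/1 link values in a directed table, this sum counts one per link. In a symmetric undirected table each link appears twice, so a factor of 1/2 would be needed to obtain the link count.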

　図５は、グラフ構造推定装置１００の一例を示したブロック図である。図５において、図２に示したものと同一のものには同一符号を付してある。

FIG. 5 is a block diagram illustrating an example of the graph structure estimation apparatus 100. 5, the same components as those shown in FIG. 2 are denoted by the same reference numerals.

　図５において、ノード重要度計算装置１１０は、重要度計算用データ記憶部１１０ａと、重要度計算部１１０ｂとを含む。 In FIG. 5, the node importance calculation device 110 includes an importance calculation data storage unit 110a and an importance calculation unit 110b.

　重要度計算用データ記憶部１１０ａは、一般的に重要度計算用データ記憶手段と呼ぶことができる。重要度計算用データ記憶部１１０ａは、グラフデータ１４０を用いて各ノードのノード重要度を計算するためのルール（例えば、計算式）を記憶している。 The importance calculation data storage unit 110a can be generally referred to as importance calculation data storage means. The importance calculation data storage unit 110 a stores rules (for example, calculation formulas) for calculating the node importance of each node using the graph data 140.

　重要度計算部１１０ｂは、一般的に重要度計算手段と呼ぶことができる。重要度計算部１１０ｂは、重要度計算用データ記憶部１１０ａ内のルールと、グラフデータ１４０と、を用いて、各ノードのノード重要度を計算する。 The importance level calculation unit 110b can be generally called importance level calculation means. The importance calculation unit 110b calculates the node importance of each node using the rules in the importance calculation data storage unit 110a and the graph data 140.

　ノード重要度の計算方法として、任意の方法を利用する事が可能である。 Any method can be used as the node importance calculation method.

　例えば、ノード重要度として各ノードの持つリンクの頻度を利用する場合を説明すると、重要度計算部１１０ｂは、行ノードｉに対するノード重要度ｘ_ｒ，ｉを、（２）式にしたがって求められる絶対リンク頻度、または、（３）式にしたがって求められる相対リンク頻度として計算する事が可能である。 For example, when describing the case of using the frequency of the link with the respective node as a node importance, the importance calculating unit 110b, the node importance x _r for row node _i, a _i, absolute obtained according (2) It is possible to calculate the link frequency or the relative link frequency obtained according to the equation (3).

　なお、重要度計算用データ記憶部１１０ａは、例えば、（１）式と（２）式と（３）式を記憶している。 Note that the importance calculation data storage unit 110a stores, for example, Expressions (1), (2), and (3).

　また例えば、ノード重要度としては、非特許文献７に示されるページランクや非特許文献８に示されるハブ指標およびオーソリティ指標などを利用することが可能である。その場合には、重要度計算用データ記憶部１１０ａは、ページランクの計算式やハブ指標およびオーソリティ指標の計算式を記憶している。 For example, as the node importance, the page rank shown in Non-Patent Document 7, the hub index and authority index shown in Non-Patent Document 8, or the like can be used. In that case, the importance calculation data storage unit 110a stores a page rank calculation formula, a hub index, and an authority index calculation formula.

The node importance of column node j, calculated by the same procedure, is written x_{c,j}. In an undirected graph there is no distinction between rows and columns, so the link frequency of node i is written simply as x_i.
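Expressions (2) and (3) are not reproduced in this text, so the following is only a minimal sketch of link-frequency importance under stated assumptions: the absolute link frequency of row node i is taken to be the row sum of a 0/1 adjacency matrix, and the relative link frequency is that row sum divided by the total number of links. The function name `link_frequencies` and the normalization are illustrative choices, not taken from the specification.

```python
def link_frequencies(adj):
    """Return (absolute, relative) link frequencies for each row node.

    adj is a square 0/1 adjacency matrix given as a list of lists.
    Assumption: absolute = row sum, relative = row sum / total links.
    """
    absolute = [sum(row) for row in adj]
    total = sum(absolute)
    # Guard against an empty graph, where every relative frequency is 0.
    relative = [a / total if total else 0.0 for a in absolute]
    return absolute, relative


# A small star graph: node 0 is linked to every other node.
adj = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
absolute, relative = link_frequencies(adj)
```

For a directed graph, applying the same function to the transposed matrix would give the column-node values x_{c,j}.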

The importance calculation unit 110b provides the graph partition calculation device 120 with the graph data 140 and importance information indicating the node importance of each node.

　グラフ分割計算装置１２０は、分割用データ記憶部１２０ａと、グラフ分割計算部１２０ｂとを含む。 The graph division calculation device 120 includes a division data storage unit 120a and a graph division calculation unit 120b.

The division data storage unit 120a can be generally referred to as division data storage means. The division data storage unit 120a stores rules (for example, node classification conditions) for dividing the graph formed by the graph data 140 using the node importance calculated by the node importance calculation device 110.

　グラフ分割計算部１２０ｂは、分割用データ記憶部１２０ａ内のルールと、ノード重要度と、グラフデータ１４０とを用いて、グラフデータ１４０にて形成されるグラフを分割する。 The graph division calculation unit 120b divides the graph formed by the graph data 140 using the rules in the division data storage unit 120a, the node importance, and the graph data 140.

　グラフ分割計算部１２０ｂは、分割用データ記憶部１２０ａ内のルールにしたがって、１つまたは複数の分割用ノード重要度を設定し、ノード重要度を、分割用ノード重要度を用いて分割する。 The graph division calculation unit 120b sets one or more division node importance levels according to the rules in the division data storage unit 120a, and divides the node importance levels using the division node importance levels.

One possible rule for dividing the graph, when the node importance represents the number of links a node has, assigns each node to a classification destination (region; group) by absolute link counts, for example "100 or more links → region 1; 50 or more and fewer than 100 links → region 2; fewer than 50 links → region 3". In this case, link count = 100 and link count = 50 are the division node importances.
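The absolute rule above can be sketched as follows. This is a minimal illustration using the thresholds 100 and 50 from the example; the function name `classify_by_thresholds` is hypothetical.

```python
def classify_by_thresholds(link_count, thresholds):
    """Return the 1-based region for a node with the given link count.

    thresholds must be in descending order, e.g. [100, 50]:
    >= 100 -> region 1, 50..99 -> region 2, < 50 -> region 3.
    """
    for region, threshold in enumerate(thresholds, start=1):
        if link_count >= threshold:
            return region
    # Below every threshold: the last region.
    return len(thresholds) + 1


regions = [classify_by_thresholds(x, [100, 50]) for x in (120, 100, 99, 50, 49)]
```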

Another possible rule, again when the node importance represents the number of links a node has, assigns each node to a classification destination (region; group) by relative link counts, for example "link count in the top 30% → region 1; link count below the top 50% → region 2". In this case, the link count corresponding to the top 30% and the link count corresponding to the top 50% are the division node importances.
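For the relative rule, the division node importance has to be derived from the data. The sketch below shows one way to turn a "top p%" condition into a concrete link count; the function name `top_percent_threshold` and the rounding convention (round the node count up, at least one node) are assumptions, since the text does not specify how ties or fractional counts are handled.

```python
def top_percent_threshold(link_counts, percent):
    """Return the smallest link count still inside the top `percent` of nodes.

    percent is an integer from 1 to 100. The number of nodes in the top
    group is rounded up, and is never less than one.
    """
    ordered = sorted(link_counts, reverse=True)
    count = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[count - 1]


link_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
t30 = top_percent_threshold(link_counts, 30)  # link count at the top-30% cut
t50 = top_percent_threshold(link_counts, 50)  # link count at the top-50% cut
```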

FIG. 6 schematically shows an example in which the node importance is divided into three regions using two division node importances. The vertical axis represents the number of nodes having the node importance shown on the horizontal axis.

In this case, nodes belonging to region 1 can be interpreted as ordinary nodes with only a few links, nodes belonging to region 3 as hub-like nodes with many links, and nodes belonging to region 2 as intermediate between the two.

　また、図７は、有向グラフの行方向と列方向のそれぞれを、分割用ノード重要度を用いて分割した場合に、グラフがどのように分割されるのかを表した図である。 FIG. 7 is a diagram showing how the graph is divided when the row direction and the column direction of the directed graph are divided using the node importance for division.

　図７では、便宜的に、各行および列がノード重要度に従って整列されている事に注意する。 Note that in FIG. 7, for convenience, each row and column is aligned according to node importance.

　グラフ分割計算部１２０ｂは、重要度情報と、グラフの分割結果に関する情報（以下「グラフ構造情報」と称する。）を、推定構造出力装置１３０に提供する。 The graph division calculation unit 120b provides the estimated structure output device 130 with the importance level information and information about the graph division result (hereinafter referred to as “graph structure information”).

The estimated structure output device 130 outputs, for example, either or both of the graph structure information representing the structure of the graph estimated by the graph partition calculation device 120 and the importance information calculated by the node importance calculation device 110.

　出力先は、グラフ構造推定装置１００に接続されたディスプレイなどの出力装置であってもよいし、ネットワークを介して接続された出力装置または端末装置であってもよい。 The output destination may be an output device such as a display connected to the graph structure estimation device 100, or may be an output device or a terminal device connected via a network.

Here, the estimated graph structure refers to information such as which region each node belongs to, the number of row and column divisions, and the division widths.

　図８は、グラフ構造推定装置１００の動作を説明するためのフローチャートである。 FIG. 8 is a flowchart for explaining the operation of the graph structure estimation apparatus 100.

　図８を参照すると、ノード重要度計算装置１１０は、無向グラフデータまたは有向グラフデータを入力する（Ｓ１００）。 Referring to FIG. 8, the node importance calculation device 110 inputs undirected graph data or directed graph data (S100).

　続いて、ノード重要度計算装置１１０は、入力されたグラフデータが示す各ノードのノード重要度を計算する（Ｓ１０１）。 Subsequently, the node importance calculation device 110 calculates the node importance of each node indicated by the input graph data (S101).

　次に、グラフ分割計算装置１２０は、計算されたノード重要度を利用して、グラフのノードの分割（ノードのグループ分け）を計算する（Ｓ１０２）。 Next, the graph partition calculation device 120 calculates the node division (node grouping) of the graph using the calculated node importance (S102).

　次に、推定構造出力装置１３０は、グラフ分割計算装置１２０で推定されたグラフの構造（グループ分け結果）を表すグラフ構造情報と、ノード重要度計算装置１１０で計算された重要度情報とのいずれかまたは両方を出力する（Ｓ１０３）。 Next, the estimated structure output device 130 selects either the graph structure information representing the structure of the graph (grouping result) estimated by the graph partitioning calculation device 120 or the importance information calculated by the node importance calculation device 110. Or both are output (S103).

　グラフが時間的に順次得られる場合には、グラフ構造推定装置１００は、この処理を繰り返す事によって、グラフ構造を推定する事が可能である。 When the graphs are obtained sequentially in time, the graph structure estimation apparatus 100 can estimate the graph structure by repeating this process.

　本実施形態によれば、ノード重要度計算装置１１０は、ノードごとに重要度情報を計算する。グラフ分割計算装置１２０は、ノードのそれぞれを、そのノードの重要度情報に基づいて、複数のグループのいずれかに分ける。推定構造出力装置１３０は、ノードのグループ分けの結果を、グラフデータのグラフ構造情報として出力する。 According to this embodiment, the node importance calculation device 110 calculates importance information for each node. The graph partitioning calculation device 120 divides each node into one of a plurality of groups based on the importance level information of the node. The estimated structure output device 130 outputs the result of node grouping as graph structure information of graph data.

This makes it possible to group the nodes according to their importance. For example, a network with a hub structure is known to be vulnerable to attacks on its hubs; classifying the nodes by importance according to this embodiment makes it possible to grade, for each node, how strongly it needs to be defended against attack.

　本実施形態では、グラフ分割計算装置１２０は、ノード重要度情報の取りうる最小値から最大値までを表すノード重要度の値域を、１つまたは複数の分割用ノード重要度を用いて分割して得られた複数の分割領域を、複数のグループとする。 In the present embodiment, the graph partitioning calculation device 120 divides the node importance value range representing the minimum value to the maximum value that can be taken by the node importance information by using one or more dividing node importance values. The obtained plurality of divided regions are set as a plurality of groups.

[Second Embodiment]
FIG. 9 is a block diagram showing a graph structure estimation apparatus 200 according to the second embodiment of the present invention.

　グラフ構造推定装置２００は、例えば、ＣＰＵ、メモリおよび入出力装置を含むコンピュータである。グラフ構造推定装置２００は、ハードディスクまたはメモリに記録されたプログラムに従って動作する。 The graph structure estimation apparatus 200 is a computer including a CPU, a memory, and an input / output device, for example. The graph structure estimation apparatus 200 operates according to a program recorded on a hard disk or a memory.

　グラフ構造推定装置２００は、プログラムを記録媒体から読み取り実行することによって、ノード重要度計算装置１１０、ノード重要度分割最適化装置２１０、および、推定構造出力装置１３０として機能する。 The graph structure estimation apparatus 200 functions as a node importance calculation apparatus 110, a node importance division optimization apparatus 210, and an estimation structure output apparatus 130 by reading and executing a program from a recording medium.

Referring to FIG. 9, the graph structure estimation apparatus 200 differs from the graph structure estimation apparatus 100 according to the first embodiment shown in FIG. 2 in that it has a node importance division optimization device 210 in place of the graph partition calculation device 120.

　以下、グラフ構造推定装置２００について、グラフ構造推定装置１００との相違点を中心に説明する。 Hereinafter, the graph structure estimation apparatus 200 will be described focusing on differences from the graph structure estimation apparatus 100.

　ノード重要度分割最適化装置２１０は、一般的に分割手段、分割符号長計算手段および分割制御手段と呼ぶことができる。 The node importance division optimization device 210 can be generally called a division unit, a division code length calculation unit, and a division control unit.

　ノード重要度分割最適化装置２１０は、分割最適化用データ記憶部２１０ａと、ノード重要度分割最適化部２１０ｂとを含む。 The node importance division optimization apparatus 210 includes a division optimization data storage unit 210a and a node importance division optimization unit 210b.

　分割最適化用データ記憶部２１０ａは、一般的に分割最適化用データ記憶手段と呼ぶことができる。分割最適化用データ記憶部２１０ａは、グラフを分割するための単純なルールではなく、入力データ（グラフデータ）に対してグラフの分割を最適化するための計算手順を記憶している。 The division optimization data storage unit 210a can be generally referred to as division optimization data storage means. The division optimization data storage unit 210a stores not a simple rule for dividing a graph but a calculation procedure for optimizing the division of the graph with respect to the input data (graph data).

　ノード重要度分割最適化装置２１０（具体的には、ノード重要度分割最適化部２１０ｂ）は、分割最適化用データ記憶部２１０ａ内の計算手順に従って、グラフの最適な分割モデルを計算する。 The node importance division optimization device 210 (specifically, the node importance division optimization unit 210b) calculates the optimal division model of the graph according to the calculation procedure in the division optimization data storage unit 210a.

　最適化の対象となるパラメータとしては、例えば、分割数、および、各領域の大きさなどがある。このパラメータによって、分割用ノード重要度が決定される。 Optimized parameters include, for example, the number of divisions and the size of each area. Based on this parameter, the node importance for division is determined.

The node importance division optimization device 210 can calculate, based on the graph data, a division model of the node importance value range as illustrated in FIG. 6 (that is, a setting of the division node importances) using any optimization method (optimization criterion), for example, the minimum description length principle, the Akaike information criterion, or the Bayesian information criterion.

One way to determine the division model is, as shown in FIG. 10, to approximate the distribution of node importance over its value range (the node importance distribution) by a histogram and to associate each bin of the histogram with a divided region (group) of the graph.
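Once the histogram bin boundaries are fixed, mapping nodes to groups is a simple lookup. The sketch below assumes right-closed bins of the form [0, a_1], (a_1, a_2], ..., (a_{m-1}, R], and uses 0-based group indices; the function names `assign_groups` and `bin_counts` are illustrative.

```python
from bisect import bisect_left


def assign_groups(importances, cuts):
    """Map each node importance to a 0-based bin index.

    cuts is the ascending list of interior boundaries (a_1, ..., a_{m-1}).
    bisect_left places a value equal to a boundary in the lower bin,
    which matches right-closed intervals.
    """
    return [bisect_left(cuts, x) for x in importances]


def bin_counts(groups, m):
    """Count how many nodes fall into each of the m bins."""
    counts = [0] * m
    for g in groups:
        counts[g] += 1
    return counts


groups = assign_groups([0, 2, 3, 5, 9], cuts=[2, 5])
counts = bin_counts(groups, m=3)
```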

To approximate the node importance distribution by a histogram, it is possible to use, for example, a method based on the minimum description length principle (for example, Non-Patent Document 3) or a method based on the Akaike information criterion (for example, Non-Patent Document 4).

For example, the node importance division optimization device 210 treats the number of division node importances and the value of each division node importance (that is, the result of dividing the node importance value range) as a division model and, for each of mutually different division models, calculates the division description length required to encode the node importance and the division model. Note that the division description length includes the description length of the division model itself.

The node importance division optimization device 210 identifies, from among the mutually different division models, the optimized division model that minimizes the division description length, and assigns each node, based on its importance information, to one of the plurality of groups specified by the optimized division model.

　入力されたグラフデータ１４０が有向グラフデータの場合には、ノード重要度分割最適化装置２１０は、行方向と列方向のノード重要度の分布のそれぞれをヒストグラムによって近似して分割を最適化する事が可能である。 When the input graph data 140 is directed graph data, the node importance division optimization device 210 may optimize the division by approximating each of the node importance distributions in the row direction and the column direction by a histogram. Is possible.

The operation of the graph structure estimation apparatus 200 of this embodiment is the same as that of the graph structure estimation apparatus 100 according to the first embodiment shown in FIG. 2, except that, when the graph division is calculated in step S102 of FIG. 8, the division is not made according to pre-stored rules; instead, the division is optimized for the importance information calculated in step S101.

According to this embodiment, the node importance division optimization device 210 calculates, for each of mutually different division models, the division description length required to encode the node importance and the division model. It then identifies, from among those division models, the optimized division model that minimizes the division description length, and assigns each node, based on its importance information, to one of the plurality of groups specified by the optimized division model.

　この場合、分割モデルの最適化が可能になる。 In this case, the division model can be optimized.

　なお、ノード重要度分割最適化装置２１０は、動的計画法を用いて、最適化分割モデルを特定することが望ましい。 Note that it is desirable that the node importance division optimization apparatus 210 specifies an optimized division model using dynamic programming.

　また、ノード重要度分割最適化装置２１０は、分割記述長を小さくする方向へ、分割結果を更新することを繰り返すことによって、最適化分割モデルを特定してもよい。 Further, the node importance division optimization device 210 may identify the optimized division model by repeatedly updating the division result in the direction of reducing the division description length.

　また、ノード重要度分割最適化装置２１０は、動的計画法を用いて計算された分割結果を初期値とし、初期値から分割記述長を小さくする方向へ分割結果を更新することを繰り返すことによって、最適化分割モデルを特定してもよい。 Further, the node importance degree division optimization apparatus 210 sets the division result calculated by using dynamic programming as an initial value, and repeatedly updates the division result from the initial value in a direction of reducing the division description length. The optimized division model may be specified.

These identification methods can be explained by reading "the sum of the graph code length and the code length of the node importance division" in the description of the third embodiment below as "the code length of the node importance division (division description length)".

[Third Embodiment]
FIG. 11 is a block diagram showing a graph structure estimation apparatus 300 according to the third embodiment of the present invention.

　グラフ構造推定装置３００は、例えば、ＣＰＵ、メモリおよび入出力装置を含むコンピュータである。グラフ構造推定装置３００は、ハードディスクまたはメモリに記録されたプログラムに従って動作する。 The graph structure estimation apparatus 300 is a computer including a CPU, a memory, and an input / output device, for example. The graph structure estimation apparatus 300 operates according to a program recorded in a hard disk or memory.

　グラフ構造推定装置３００は、プログラムを記録媒体から読み取り実行することによって、ノード重要度計算装置１１０、ノード重要度分割最適化装置３１０、および、推定構造出力装置１３０として機能する。 The graph structure estimation apparatus 300 functions as a node importance calculation apparatus 110, a node importance division optimization apparatus 310, and an estimation structure output apparatus 130 by reading and executing a program from a recording medium.

Referring to FIG. 11, the graph structure estimation apparatus 300 differs from the graph structure estimation apparatus 200 according to the second embodiment shown in FIG. 9 in that it has a node importance division optimization device 310 in place of the node importance division optimization device 210.

　以下、グラフ構造推定装置３００について、グラフ構造推定装置２００との相違点を中心に説明する。 Hereinafter, the graph structure estimation apparatus 300 will be described focusing on differences from the graph structure estimation apparatus 200.

　ノード重要度分割最適化装置３１０は、一般的に分割手段と呼ぶことができる。 The node importance division optimization device 310 can be generally called a division means.

　ノード重要度分割最適化装置３１０は、図１２に示されるように、符号長計算部記憶装置３１１と、最適パラメータ計算装置３１４とを備えている。符号長計算部記憶装置３１１は、グラフ符号長計算部３１２およびノード重要度分割符号長計算部３１３を記憶している。 The node importance division optimization device 310 includes a code length calculation unit storage device 311 and an optimum parameter calculation device 314 as shown in FIG. The code length calculation unit storage device 311 stores a graph code length calculation unit 312 and a node importance degree division code length calculation unit 313.

　ノード重要度分割最適化装置３１０は、グラフデータ１４０と、ノード重要度計算装置１１０で計算されたノード重要度情報３１５とを入力とし、グラフ構造推定結果１５０を出力する。 The node importance division optimization device 310 receives the graph data 140 and the node importance information 315 calculated by the node importance calculation device 110, and outputs a graph structure estimation result 150.

　符号長計算部記憶装置３１１は、一般的に符号長計算手段と呼ぶことができる。 The code length calculation unit storage device 311 can be generally called code length calculation means.

　ノード重要度分割符号長計算部３１３は、一般的に分割符号長計算手段と呼ぶことができる。また、グラフ符号長計算部３１２は、一般的にグラフ符号長計算手段と呼ぶことができる。 The node importance division code length calculation unit 313 can be generally called division code length calculation means. Graph code length calculation unit 312 can be generally called graph code length calculation means.

　なお、ノード重要度分割符号長計算部３１３とグラフ符号長計算部３１２は、例えば、コンピュータにて実行されたときに所定の計算を実行するプログラムである。 Note that the node importance division code length calculation unit 313 and the graph code length calculation unit 312 are programs that execute predetermined calculations when executed by a computer, for example.

The node importance division code length calculation unit 313 and the graph code length calculation unit 312 specify the procedures for calculating, for a given set of parameters, the description length required to encode the node importance information and the division model of the node importance, and the description length required to encode the graph under that division.

For example, the node importance division code length calculation unit 313 treats the result of dividing the node importance value range by the division node importances as a division model and, for each of mutually different division models, calculates the division description length required to encode the node importance information and the division model. Note that the division description length includes the description length of the division model itself.

　また、グラフ符号長計算部３１２は、ノードのそれぞれを、ノード重要度情報に基づいて、分割モデルにて特定される複数のグループのいずれかに分けた際に、ノードのグループ分けによって分割されたグラフを符号化するためのグラフ記述長を、分割モデルごとに算出する。 Also, the graph code length calculation unit 312 divides each of the nodes by grouping the nodes when dividing each of the nodes into any of a plurality of groups specified by the division model based on the node importance information. The graph description length for encoding the graph is calculated for each division model.

　ノード重要度の分割結果を符号化するために必要な記述長は、ノード重要度分割最適化装置２１０が行う方法と同様の方法で計算可能である。 The description length required for encoding the node importance division result can be calculated by the same method as that performed by the node importance division optimization apparatus 210.

Given a division model, the graph G is divided into several subgraphs, as shown in FIG. 13 (G_{1,1} through G_{2,3} in FIG. 13).

At this time, the node importance division optimization device 310 assigns to each subgraph a probability distribution of link occurrence (p_{1,1} through p_{2,3} in FIG. 13).

When each link takes one of the two values "1" or "0", the probability distribution of link occurrence can be a Bernoulli distribution.

When each link takes a value between "0" and "1", the probability distribution of link occurrence can be a beta distribution.

When each link takes a value of "0" or more, the probability distribution of link occurrence can be an exponential distribution or a gamma distribution.

　このように、ノード重要度分割最適化装置３１０は、リンクの定義によって、それぞれ適切な確率分布を割り当てる事ができる。 Thus, the node importance division optimization device 310 can assign an appropriate probability distribution according to the definition of the link.

The graph code length is the code length of the links of each subgraph under the assumption that they are generated by the assigned probability distribution.

Because nodes in different divided regions have different properties, it is natural to represent the interactions within each region and between regions with different models.
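For binary links, the code length of one subgraph under a Bernoulli model has a standard closed form. The sketch below uses the maximum-likelihood estimate θ = (number of links) / (number of cells) and measures length in bits. This is an assumption for illustration: Expressions (10) and (11) are not reproduced in this text and may include additional terms, such as a cost for encoding θ itself. The function names are hypothetical.

```python
import math


def bernoulli_code_length(links, cells):
    """Bits needed to encode `cells` binary entries containing `links` ones.

    Uses the plug-in estimate theta = links / cells. A block that is
    entirely 0 or entirely 1 costs nothing under this estimate.
    """
    if cells == 0 or links == 0 or links == cells:
        return 0.0
    theta = links / cells
    return -(links * math.log2(theta) + (cells - links) * math.log2(1 - theta))


def graph_code_length(blocks):
    """Sum the code length over subgraphs given as (links, cells) pairs."""
    return sum(bernoulli_code_length(links, cells) for links, cells in blocks)


total_bits = graph_code_length([(2, 4), (0, 10)])
```

A division that produces dense and sparse blocks yields a shorter total than one that mixes them, which is what ties the grouping of nodes to the link structure.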

　ノード重要度情報だけでなく、グラフの符号化を考える事によって、それらのインタラクションを考慮した領域の分割を行なう事が可能となる。  By considering not only the node importance information but also the encoding of the graph, it becomes possible to divide the area in consideration of those interactions.

　最適パラメータ計算装置３１４は、一般的に分割制御手段と呼ぶことができる。 The optimum parameter calculation device 314 can be generally called a division control means.

When the optimum parameter calculation device 314 reads the graph data 140 and the node importance information 315, it loads the calculation units 312 and 313 stored in the code length calculation unit storage device 311 and uses them to calculate the parameters (division model) that minimize the sum of the graph code length and the code length of the node importance division.

In the following, an example of a specific calculation procedure is described for the undirected graph case and the directed graph case in turn.
[Example of undirected graph]
This example describes, for an undirected graph, the case where each link takes one of the two values "1" or "0" and the number of divisions and the size of each region (the division model) are optimized.

Assume that the nodes are sorted by node importance, that is, x_1 ≤ … ≤ x_n, where n is the number of nodes.

Let x^n = x_1, …, x_n, and let the value range of x_i be [0, R].

When the distribution of node importance is approximated by a histogram (this also applies to the second embodiment), the horizontal axis (node importance) must be discretized. Let d be the minimum step size. The boundaries between regions are then written a = kd, where a = (a_1, …, a_{m-1}) and k = (k_1, …, k_{m-1}).

The regions are then specified as [0, a_1], (a_1, a_2], …, (a_{m-1}, R]. Define a_0 = 0 and a_m = R, and let R_i = a_i − a_{i-1}. Let n_i be the number of data points belonging to region i. Let dκ be the minimum region size, and define r = R/d.

The description length required for the division of the node importance can be defined by Expressions (4) to (9). Here, log* d denotes log d + log log d + …, the sum of the positive terms among the iterated logarithms log log … log d, which is known to give the minimum description length when the distribution of d is unknown.
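The iterated-logarithm sum log* d described above can be computed directly. The sketch below assumes base-2 logarithms (the text does not state the base) and omits the additive normalizing constant that some formulations of this code length include; the function name `log_star` is illustrative.

```python
import math


def log_star(d):
    """Sum log d + log log d + ... over the positive terms only (base 2)."""
    if d <= 0:
        raise ValueError("d must be positive")
    total = 0.0
    term = math.log2(d)
    while term > 0:
        total += term
        term = math.log2(term)
    return total


bits_for_16 = log_star(16)  # 4 + 2 + 1
```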

Expression (5) represents the description length for encoding x^n, and Expressions (8) and (9) represent the description length required to describe the model. The model here refers to the number of divisions (m) and the division method (k, r, d, κ).

Given m, r, d, and κ, the graph code length can be defined by Expressions (10) and (11). Here, |G_{i,j}| is the total number of links contained in subgraph G_{i,j}, and θ_{i,j} is the probability of a 1 under the Bernoulli distribution p_{i,j} assigned to subgraph G_{i,j}.

To encode the graph, n_i and n_j must also be encoded; however, because the code lengths of n_i and n_j are already included in Expression (5), they are not included in L_G.

The optimum parameter calculation device 314 determines the optimum division parameters (division model) by solving the optimization problem of Expression (12).

One way to solve Expression (12) is to specify the parameter ranges m_min ≤ m ≤ m_max, d_min ≤ d ≤ d_max, and κ_min ≤ κ ≤ κ_max, evaluate Expression (12) exhaustively for every combination of parameters, and take the combination that gives the minimum.
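The exhaustive approach can be sketched as a plain grid search. Because Expression (12) is not reproduced in this text, the objective is passed in as a function: `cost(m, d, kappa)` is a hypothetical stand-in that is assumed to return the total code length for those parameters (including whatever inner optimization over the boundaries k is needed).

```python
from itertools import product


def grid_search(cost, m_values, d_values, kappa_values):
    """Evaluate cost on every (m, d, kappa) combination and return the best.

    Returns a (best_cost, (m, d, kappa)) pair.
    """
    best = None
    for m, d, kappa in product(m_values, d_values, kappa_values):
        value = cost(m, d, kappa)
        if best is None or value < best[0]:
            best = (value, (m, d, kappa))
    return best


# Toy objective used only to exercise the search; not a real code length.
def toy_cost(m, d, kappa):
    return (m - 3) ** 2 + (d - 2) ** 2 + kappa


best_cost, best_params = grid_search(toy_cost, range(2, 6), range(1, 4), range(1, 3))
```

The number of evaluations is the product of the three range sizes, which is why the text goes on to consider cheaper local and dynamic-programming searches.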

Alternatively, to search efficiently for a local minimum, the following gradient-method procedure can be used, which searches for a solution by moving the parameters in the direction that reduces L_h + L_G.

First, the division node importance k is initialized as k = k^0.

Next, letting k^p denote the division node importance at the p-th iteration, k^p is calculated as k^p = min{k^{p-1}_{q+}, k^{p-1}_{q-}}. Here, k^p_{q+} = (k^p_1, k^p_2, …, k^p_q + 1, …, k^p_{m-1}) and k^p_{q-} = (k^p_1, k^p_2, …, k^p_q − 1, …, k^p_{m-1}), which are the division node importances obtained by shifting the q-th division node importance at the p-th iteration by one step in the positive or negative direction. When the search is performed in this way, the code length of k calculated from Expression (8) satisfies L(k^p) ≤ L(k^{p-1}).

By repeating the above until L(k^p) = L(k^{p-1}) is satisfied, a solution that locally minimizes L_h + L_G can be calculated efficiently.

The above search is performed for each combination of parameters in the ranges specified by m_min ≤ m ≤ m_max, d_min ≤ d ≤ d_max, and κ_min ≤ κ ≤ κ_max, which gives an approximate solution of Expression (12).
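The procedure above moves one boundary index at a time by ±1 and stops when the code length no longer decreases. The following is a minimal sketch of that idea as a discrete coordinate-wise local search, not a reproduction of the patent's exact update: the objective `cost(k)` is a hypothetical stand-in for L_h + L_G, and the constraint that boundaries stay strictly increasing inside 1..r−1 is an assumption (the minimum region size κ is ignored here).

```python
def local_search(k, cost, r):
    """Shift boundary indices by +/-1 while the cost strictly decreases.

    k: strictly increasing list of boundary indices, each in 1..r-1.
    cost: function taking a list of boundary indices and returning a number.
    Returns (k, cost(k)) at a local minimum.
    """
    k = list(k)
    best = cost(k)
    improved = True
    while improved:
        improved = False
        for q in range(len(k)):
            for step in (1, -1):
                cand = k.copy()
                cand[q] += step
                lo = cand[q - 1] if q > 0 else 0
                hi = cand[q + 1] if q + 1 < len(cand) else r
                if not (lo < cand[q] < hi):
                    continue  # keep boundaries ordered and inside the range
                value = cost(cand)
                if value < best:
                    k, best, improved = cand, value, True
    return k, best


# Toy objective with a known minimum at boundaries [3, 7].
def toy_cost(k):
    return (k[0] - 3) ** 2 + (k[1] - 7) ** 2


k_opt, cost_opt = local_search([1, 2], toy_cost, r=10)
```

Because each accepted move strictly lowers the cost and the set of valid boundary vectors is finite, the loop always terminates, though only at a local minimum.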

　この場合、最適パラメータ計算装置３１４は、分割記述長とグラフ記述長の和を小さくする方向へ分割結果を更新することを繰り返すことによって、最適化分割モデルを特定することになる。 In this case, the optimum parameter calculation device 314 identifies the optimized division model by repeatedly updating the division result in a direction to reduce the sum of the division description length and the graph description length.

　また例えば、以下の手順に従って動的計画問題として再帰的に最適パラメータを近似計算する方法が考えられる。 Also, for example, a method of approximating optimal parameters recursively as a dynamic programming problem according to the following procedure is conceivable.

　まず、ａ’＝（ａ,τ）とすると、（５）式および（１０）式は、（１３）式および（１４）式のように分解できる。ただし、ｎ（Ｒ）はノード重要度が［０，Ｒ］区間に含まれるノードの数とする。 First, if a ′ = (a, τ), the expressions (5) and (10) can be decomposed as the expressions (13) and (14). However, n (R) is the number of nodes included in the interval [0, R] with node importance.

　この時、Ｌ_ｍ（Ｒ）を（１５）式によって定義すると、（１３）式と（１４）式の和は（１６）式の動的計画問題を解く事によって最小化可能である。この動的計画問題は、G_m+1,jがa’に依存するため、（１３）式と（１４）式の和を厳密に最小化する事はできないが、近似解を得る事が可能である。

At this time, if L _m (R) is defined by equation (15), the sum of equations (13) and (14) can be minimized by solving the dynamic programming problem of equation (16). In this dynamic programming problem, since G _{m + 1, j} depends on a ′, the sum of Equations (13) and (14) cannot be strictly minimized, but an approximate solution can be obtained. It is.
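The recursion has the familiar shape "best cost of m bins ending at R = min over τ of (best cost of m−1 bins ending at τ) + (cost of the last bin (τ, R])". The sketch below implements that generic recursion over a discretized grid. It assumes the total cost is a sum of independent per-bin costs, which is exactly the simplification the text says makes the result approximate for the graph term. The per-bin cost here is a squared-deviation stand-in, not Expression (15), and all names are illustrative.

```python
def dp_partition(r, m, bin_cost):
    """Split grid cells 0..r-1 into m contiguous bins with minimum total cost.

    bin_cost(lo, hi) is the cost of one bin covering cells lo..hi-1.
    Returns (total_cost, cuts), where cuts are the m-1 interior boundaries.
    """
    inf = float("inf")
    best = [[inf] * (r + 1) for _ in range(m + 1)]
    back = [[0] * (r + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for bins in range(1, m + 1):
        for hi in range(bins, r + 1):
            for lo in range(bins - 1, hi):
                value = best[bins - 1][lo] + bin_cost(lo, hi)
                if value < best[bins][hi]:
                    best[bins][hi] = value
                    back[bins][hi] = lo
    # Walk the back-pointers from the last bin to recover the boundaries.
    cuts, hi = [], r
    for bins in range(m, 0, -1):
        hi = back[bins][hi]
        cuts.append(hi)
    return best[m][r], sorted(cuts)[1:]  # drop the leading boundary 0


counts = [5, 5, 0, 0, 9, 9]  # toy per-cell node counts


def squared_deviation(lo, hi):
    segment = counts[lo:hi]
    mean = sum(segment) / len(segment)
    return sum((c - mean) ** 2 for c in segment)


total, cuts = dp_partition(r=len(counts), m=3, bin_cost=squared_deviation)
```

The triple loop costs on the order of m·r² evaluations of the per-bin cost, compared with the combinatorial number of boundary placements an exhaustive search would visit.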

　（１２）式の最小化の対象は、（１３）式と（１４）式の和に、Ｌ（ｋ）とＬ（ｍ，ｒ，ｄ，κ）を加えたものなので、動的計画法によって（１２）式の最小化問題の探索空間を狭める事が可能である。

The object of minimization of equation (12) is the sum of equations (13) and (14) plus L (k) and L (m, r, d, κ). It is possible to narrow down the search space for the minimization problem of equation (12).
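
Equations (13) to (16) are not reproduced in this text, so the following Python sketch only illustrates the general shape of such a recursion: the best cost of covering [0, R) with m regions is built from the best cost of covering a shorter prefix with m - 1 regions. The function `best_partition` and the additive `segment_cost` are hypothetical stand-ins; in particular, an additive per-segment cost ignores the dependence of G_{m+1,j} on a' noted above.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def best_partition(
    r_max: int, m: int, segment_cost: Callable[[int, int], float]
) -> Tuple[float, List[int]]:
    """Split [0, r_max) into m contiguous non-empty segments of minimum total cost."""

    @lru_cache(maxsize=None)
    def solve(parts: int, right: int) -> Tuple[float, Tuple[int, ...]]:
        # Best cost and split points for covering [0, right) with `parts` segments.
        if parts == 1:
            return segment_cost(0, right), ()
        best: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
        # The prefix needs at least parts - 1 positions; the last segment needs one.
        for cut in range(parts - 1, right):
            cost, cuts = solve(parts - 1, cut)
            total = cost + segment_cost(cut, right)
            if total < best[0]:
                best = (total, cuts + (cut,))
        return best

    cost, cuts = solve(m, r_max)
    return cost, list(cuts)
```

For example, with the values `[1, 1, 1, 9, 9, 9]` and a segment cost equal to the sum of squared deviations from the segment mean, splitting into two segments places the single split point at index 3.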

Furthermore, the gradient-method search and the dynamic-programming search can be combined by using the solution of the dynamic programming problem described above as the initial value of the gradient-method search. This gives a better initial value than choosing one at random, and therefore a better local solution can be obtained.

In this case, the optimum parameter calculation device 314 takes the division result calculated by dynamic programming as the initial value and identifies the optimized division model by repeatedly updating the division result, starting from that initial value, in the direction that reduces the sum of the division description length and the graph description length.

[Example of directed graph]
In the present embodiment, a case is described for a directed graph in which each link takes a binary value of "1" or "0" and the number of divisions and the size of each region are optimized.

This example uses the same notation as the undirected-graph example, with the subscripts r and c indicating variables for rows and columns, respectively.

In the case of a directed graph, the total code length of the node importance, the graph, and the model is given by equation (17).

The first and second terms on the right side of equation (17) are the code lengths of the row and column node frequency distributions, calculated in the same manner as equation (5); the third term is the code length of the graph given by equation (18); and the fourth to seventh terms are the code lengths of the model, calculated in the same manner as equations (8) and (9).

The optimum parameter calculation device 314 can calculate the optimal graph division by calculating the parameters m_r, r_r, d_r, κ_r, m_c, r_c, d_c, κ_c that minimize L in equation (17).

One possible optimization method is to evaluate equation (17) exhaustively over the parameter combinations and select the combination that minimizes L.
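
A minimal sketch of such an exhaustive search is shown below. Equation (17) itself is not reproduced here, so `total_code_length` is a hypothetical callable standing in for L, and the parameter names passed in `ranges` are chosen by the caller.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional, Tuple


def grid_search(
    ranges: Dict[str, Iterable[int]],
    total_code_length: Callable[..., float],
) -> Tuple[Optional[Dict[str, int]], float]:
    """Evaluate every parameter combination and keep the one with the shortest code."""
    names = list(ranges)
    best_params: Optional[Dict[str, int]] = None
    best_len = float("inf")
    for values in product(*(ranges[name] for name in names)):
        params = dict(zip(names, values))
        length = total_code_length(**params)
        if length < best_len:
            best_params, best_len = params, length
    return best_params, best_len
```

The number of evaluations is the product of the range sizes, which is why the text goes on to mention local search and dynamic programming as cheaper alternatives.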

As another example, as with the undirected graph, a local solution can be calculated by repeating the gradient-method search over the divisions in the column direction and the row direction.

As another example, as with the undirected graph, the problem of minimizing equation (17) can be reduced to a dynamic programming problem, and the optimum parameter calculation device 314 can calculate the optimal parameters using dynamic programming.

Based on the minimum description length principle, the optimum parameter calculation device 314 optimizes the division parameters (such as the number of divisions and the size of each region) so as to minimize the sum of the description length needed to encode the division of the node importance range and the description length needed to encode the graph under that division.

The operation of the graph structure estimation apparatus 300 of the present embodiment is the same as that of the graph structure estimation apparatus 100 according to the first embodiment shown in FIG. 2, except that, when the division of the graph is calculated in step S102 of FIG. 8, the division is not made according to a rule stored in advance; instead, the division is optimized with respect to the node importance information calculated in step S101.

The present embodiment has been described using the minimum description length principle as the optimization criterion, but other similar criteria, such as the Akaike information criterion or the Bayesian information criterion, can also be used.

According to the present embodiment, the node importance division code length calculation unit 313 treats each division result of the node importance range as a division model and calculates, for each of the mutually different division models, the division description length needed to encode the node importance information and the division model.

The graph code length calculation unit 312 calculates, for each division model, the graph description length for encoding the graph as divided by the grouping of nodes that results when each node is assigned, based on the node importance information, to one of the plurality of groups specified by the division model.

The optimum parameter calculation device 314 identifies, from among the mutually different division models, the optimized division model that minimizes the sum of the division description length and the graph description length, and assigns each node, based on the node importance information, to one of the plurality of groups specified by the optimized division model.
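
Once a division model has been chosen, assigning nodes to groups amounts to looking up which region of the importance range each node falls into. The Python sketch below shows one way to do this; the function `group_nodes` and the convention that a value equal to a split point goes to the upper group are assumptions for illustration.

```python
from bisect import bisect_right
from typing import Dict, Hashable, List, Sequence


def group_nodes(
    importance: Dict[Hashable, float], split_points: Sequence[float]
) -> Dict[int, List[Hashable]]:
    """Assign each node to the region of the importance range that contains its value."""
    cuts = sorted(split_points)
    groups: Dict[int, List[Hashable]] = {g: [] for g in range(len(cuts) + 1)}
    for node, value in importance.items():
        # bisect_right counts the split points less than or equal to the value.
        groups[bisect_right(cuts, value)].append(node)
    return groups
```

For instance, with split points `[3, 10]`, nodes with importance 1, 5, and 12 fall into groups 0, 1, and 2 respectively.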

This makes it possible to identify the optimal division model while taking the graph code length into account.

[Fourth Embodiment]
FIG. 14 is a block diagram showing a graph structure estimation apparatus 400 according to the fourth embodiment of the present invention.

The graph structure estimation apparatus 400 is, for example, a computer including a CPU, a memory, and an input/output device. The graph structure estimation apparatus 400 operates according to a program recorded on a hard disk or in a memory.

By reading a program from a recording medium and executing it, the graph structure estimation apparatus 400 functions as a node importance calculation device 110, a node importance division optimization device 410, and an estimated structure output device 130.

Referring to FIG. 14, the graph structure estimation apparatus 400 differs from the graph structure estimation apparatus 300 according to the third embodiment shown in FIG. 11 in that it has a node importance division optimization device 410 in place of the node importance division optimization device 310.

The graph structure estimation apparatus 400 is described below, focusing on its differences from the graph structure estimation apparatus 300.

The node importance division optimization device 410 can generally be called division means.

As shown in FIG. 15, the node importance division optimization device 410 includes a code length calculation unit storage device 411 and an optimum parameter calculation device 413. The code length calculation unit storage device 411 stores a node importance division code length calculation unit 313 and a subgraph division code length calculation unit 412.

The node importance division optimization device 410 receives as input the graph data 140 and the node importance information 315 calculated by the node importance calculation device 110, and outputs a graph structure estimation result 150.

The functional difference from the graph structure estimation apparatus 300 according to the third embodiment is that, in the present embodiment, the subgraphs obtained by dividing the graph according to node importance are further divided into subgraphs.

The node importance division optimization device 410 can calculate the division model of the node importance range (the setting of the split node importance values) from the graph data using any optimization method (optimization criterion), for example the minimum description length principle, the Akaike information criterion, or the Bayesian information criterion.

FIG. 16 shows an example of graph division according to the present embodiment. The left part of FIG. 16 shows an example in which the input graph is divided according to node importance, and the right part shows that the subgraph G_{2,2} is further divided internally.

The code length calculation unit storage device 411 can generally be called code length calculation unit storage means.

The subgraph division code length calculation unit 412 can generally be called subgraph division code length calculation means.

The node importance division code length calculation unit 313 and the subgraph division code length calculation unit 412 may be, for example, programs that perform predetermined calculations when executed by a computer.

The node importance division code length calculation unit 313 and the subgraph division code length calculation unit 412 specify the procedures for calculating, for a given set of parameters, the description length needed to encode the division of the node importance range and the description length needed to encode the further division of each subgraph under that division.

For example, the subgraph division code length calculation unit 412 calculates, for each division model, the subdivision code length needed to encode the state of the subdivision in each group. Here, the groups are those that arise when each node is assigned, based on its importance information, to one of the plurality of groups specified by the division model, and each group is subdivided based on the relationships between the nodes within it.

The description length needed to encode the division of the node importance range can be calculated in the same way as in the node importance division optimization device 210.

For the division of the subgraphs and its encoding, it is possible to use, for example, the division method and description length calculation proposed in Non-Patent Document 1, or a known division method using a tree structure together with its description length calculation.

The optimum parameter calculation device 413 can generally be called division control means.

Upon reading the graph data 140 and the node importance information 315, the optimum parameter calculation device 413 reads the calculation units 313 and 412 stored in the code length calculation unit storage device 411 and uses them to calculate the parameters that minimize the sum of the code length of the subgraph division and the code length of the division of the node importance range.

For example, the optimum parameter calculation device 413 identifies, from among the mutually different division models, the optimized division model that minimizes the sum of the division description length and the subdivision code length, and assigns each node, based on its importance information, to one of the plurality of groups specified by the optimized division model.

According to the present embodiment, the node importance division code length calculation unit 313 treats each division result of the node importance range obtained with the split node importance values as a division model and calculates, for each of the mutually different division models, the division description length needed to encode the node importance and the division model.

The subgraph division code length calculation unit 412 calculates, for each division model, the subdivision code length needed to encode the state of the subdivision in each group, where the groups arise when each node is assigned, based on its node importance information, to one of the plurality of groups specified by the division model, and each group is subdivided based on the relationships between the nodes within it.

The optimum parameter calculation device 413 identifies, from among the mutually different division models, the optimized division model that minimizes the sum of the division description length and the subdivision code length, and assigns each node, based on the node importance information, to one of the plurality of groups specified by the optimized division model.

This makes it possible to identify the optimal division model while taking the subdivision of the subgraphs into account.

[Fifth Embodiment]
FIG. 17 is a block diagram showing a graph structure estimation apparatus 500 according to the fifth embodiment of the present invention.

The graph structure estimation apparatus 500 is, for example, a computer including a CPU, a memory, and an input/output device. The graph structure estimation apparatus 500 operates according to a program recorded on a hard disk or in a memory.

By reading a program from a recording medium and executing it, the graph structure estimation apparatus 500 functions as a node importance calculation device 510, a dynamic node importance division optimization device 520, and an estimated structure output device 130.

Referring to FIG. 17, the graph structure estimation apparatus 500 differs from the graph structure estimation apparatus 200 according to the second embodiment shown in FIG. 9 in that it has a node importance calculation device 510 in place of the node importance calculation device 110, has a dynamic node importance division optimization device 520 in place of the node importance division optimization device 210, receives graph data 530 in place of the graph data 140, and outputs a graph structure estimation result 540 in place of the graph structure estimation result 150.

The graph data 530 is a sequence of graphs obtained in time series, denoted G^t = G_1, G_2, …, G_t. The graph structure estimation result 540 is the sequence of graph structures at each time corresponding to the graph data 530.

The node importance calculation device 510 can generally be called calculation means.

For the input graph sequence G^t, the node importance calculation device 510 calculates the sequence of node importance values at each time, x^{t,n} = x_1^n, x_2^n, …, x_t^n. For example, the node importance calculation device 510 receives graph data in time series and, each time graph data is received, calculates the node importance of each node represented in that graph data.

Like the node importance calculation device 110, the node importance calculation device 510 can use any index as the node importance (for example, the link frequency of each node, PageRank, the hub index, or the authority index).
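
Two of the indices named above can be sketched in plain Python as follows. This is an illustrative sketch under assumptions, not the calculation performed by the device: the graph is assumed to be given as an adjacency matrix `adj` with `adj[i][j]` the weight of the link from node i to node j, and the function names, the damping factor of 0.85, and the fixed iteration count are choices made for the example.

```python
from typing import List, Sequence


def link_frequency(adj: Sequence[Sequence[float]]) -> List[float]:
    """Node importance as the number (or total weight) of outgoing links per node."""
    return [sum(row) for row in adj]


def pagerank(
    adj: Sequence[Sequence[float]], damping: float = 0.85, iters: int = 100
) -> List[float]:
    """Node importance by PageRank, computed with a fixed number of power iterations."""
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i, row in enumerate(adj):
            out = sum(row)
            for j in range(n):
                # A node without outgoing links spreads its rank uniformly.
                share = row[j] / out if out else 1.0 / n
                new[j] += damping * rank[i] * share
        rank = new
    return rank
```

For a directed graph, `link_frequency` as written counts outgoing links (row sums); incoming links would be the column sums of the same matrix.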

The dynamic node importance division optimization device 520 can generally be called division means.

Each time graph data is received, the dynamic node importance division optimization device 520 optimizes the division of the node importance range, based on the node importance of each graph data, according to any optimization criterion (for example, the minimum description length principle, the Akaike information criterion, or the Bayesian information criterion).

As shown in FIG. 18, the dynamic node importance division optimization device 520 includes a graph sequence code length calculation unit storage device 511, a model sequence code length calculation unit storage device 512, and an optimum parameter calculation device 513.

The dynamic node importance division optimization device 520 receives as input the graph data 530 and the node importance information 514 calculated by the node importance calculation device 510, and outputs a graph structure estimation result 540.

The optimum parameter calculation device 513 can generally be called division control means.

For the input graph data 530 and node importance information 514, the optimum parameter calculation device 513 selects the model (division model) that minimizes the sum of the code lengths of the graph sequence and the model sequence, calculated using the calculation means read from the graph sequence code length calculation unit storage device 511 and the model sequence code length calculation unit storage device 512.

The optimum parameter calculation device 513 can be realized, for example, by using the dynamic model selection framework proposed in Non-Patent Document 6.

Now, let M_t denote the model at time t, and let the model sequence be M^t = M_1, M_2, …, M_t.

In dynamic model selection, the optimum parameter calculation device 513 selects the model that minimizes equation (19), which gives the code length for encoding the data sequence (in the present embodiment, the graph sequence G^t and the node importance sequence x^{t,n}) and the model sequence M^t.

The code length of the graph sequence calculated by the graph sequence code length calculation unit 511 is the first term on the right side of equation (19). It can be, for example, the code length calculated by a calculation unit stored in the code length calculation unit storage device 311 or the code length calculation unit storage device 411.

For example, when the code length calculation unit storage device 311 is used, the graph sequence code length calculation unit 511 can take equations (5) and (10) as the code length of the graph sequence.

As shown in Non-Patent Document 6, the graph sequence code length calculation unit 511 can also define and calculate the first term on the right side of equation (19) using predictive stochastic complexity.

In that case, letting θ_{M_t} be the parameter of the data distribution under model M_t at time t, and θ_{M_t}^{t-1} = θ_{M_1}, θ_{M_2}, …, θ_{M_{t-1}}, the first term on the right side of equation (19) is given by equation (20).

The code length of the model sequence calculated by the model sequence code length calculation unit storage device 512 is the second term on the right side of equation (19). It can be, for example, the code length calculated by a calculation unit stored in the code length calculation unit storage device 311 or the code length calculation unit storage device 411.

For example, when the code length calculation unit storage device 311 is used, equations (8) and (9) correspond to the description length of the model.

As another example, as shown in Non-Patent Document 6, the second term on the right side of equation (19) can also be defined and calculated using predictive stochastic complexity for the model, taking the time transition model of the models into account.

In that case, letting α_t be the parameter of the model transition at time t and α^t = α_1, α_2, …, α_t, the second term on the right side of equation (19) is given by equation (21).

As a method for calculating the model sequence and parameters that minimize equation (19), the optimum parameter calculation device 513 can set candidate combinations of model sequences and parameters in advance, calculate equation (19) for each, and select the model sequence and parameters that give the minimum.

As shown in Non-Patent Document 6, the model sequence that minimizes equation (19) can also be calculated using dynamic programming.
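
Equation (19) is not reproduced in this text, so the following Python sketch rests on a simplifying assumption: the total code length splits into a per-time data term `data_cost[t][m]` (the code length of the data at time t under candidate model m) and a first-order `transition_cost(prev, cur)` between consecutive models. Under that assumption, the minimizing model sequence can be found with a Viterbi-style dynamic program. All names here are hypothetical and the sketch is not the method of Non-Patent Document 6.

```python
from typing import Callable, List, Sequence, Tuple


def select_model_sequence(
    data_cost: Sequence[Sequence[float]],
    transition_cost: Callable[[int, int], float],
) -> Tuple[float, List[int]]:
    """Return the minimum total code length and the model index chosen at each time."""
    n_models = len(data_cost[0])
    cost = list(data_cost[0])  # best total cost of sequences ending in each model
    back: List[List[int]] = []  # back-pointers to the best previous model
    for row in data_cost[1:]:
        new_cost, pointers = [], []
        for cur in range(n_models):
            prev = min(range(n_models), key=lambda p: cost[p] + transition_cost(p, cur))
            new_cost.append(cost[prev] + transition_cost(prev, cur) + row[cur])
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    last = min(range(n_models), key=cost.__getitem__)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return cost[last], path
```

The transition cost is what keeps the selected sequence from switching models at every time step: a switch is made only when the saving in data code length outweighs the cost of the change.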

According to the present embodiment, the dynamic node importance division optimization device 520 optimizes the division result of the node importance range, based on the node importance of each graph data, according to any optimization criterion.

Therefore, when the node structure that follows the node importance distribution changes over time, that structure can be identified.

In the present embodiment, the transition of the number of groups (clusters) is modeled explicitly, so when the node structure changes over time, that structure can be estimated with high accuracy.

[Sixth Embodiment]
FIG. 19 is a block diagram showing a graph structure estimation apparatus 600 according to the sixth embodiment of the present invention.

The graph structure estimation apparatus 600 is, for example, a computer including a CPU, a memory, and an input/output device. The graph structure estimation apparatus 600 operates according to a program recorded on a hard disk or in a memory.

By reading a program from a recording medium and executing it, the graph structure estimation apparatus 600 functions as a node importance calculation device 110, a dynamic node importance division optimization device 610, a model parameter storage device 620, and an estimated structure output device 130.

Referring to FIG. 19, the graph structure estimation apparatus 600 differs from the graph structure estimation apparatus 200 according to the second embodiment shown in FIG. 9 in that it has a dynamic node importance division optimization device 610 in place of the node importance division optimization device 210, has a model parameter storage device 620, receives graph data 630 in place of the graph data 140, and outputs a graph structure estimation result 640 in place of the graph structure estimation result 150.

The graph data 630 is a graph obtained in time series; the graph data input at time t is denoted G_t.

The graph structure estimation result 640 is the graph structure at each time corresponding to the graph data 630.

The model parameter storage device 620 can generally be called storage means.

The model parameter storage device 620 stores the parameters of the time transition model calculated up to the previous time (these parameters indicate the past optimized division results of the node distribution, that is, of the node importance range). The parameters of the time transition model referred to here correspond to, for example, M^t, θ_{M_t}^t, and α^t of the fifth embodiment.

The dynamic node importance division optimization device 610 can generally be called division control means.

The dynamic node importance division optimization device 610 reads the graph data 630, the node importance information calculated by the node importance calculation device 110, and the parameters stored in the model parameter storage device 620, and calculates the division of the graph at that time according to the node importance information. For this calculation, the sequential dynamic selection algorithm proposed in Non-Patent Document 3, for example, can be applied.

For example, each time graph data is received, the dynamic node importance division optimization device 610 optimizes the division model of the node importance range according to any optimization criterion (for example, the minimum description length principle, the Akaike information criterion, or the Bayesian information criterion), based on the past optimized division results of the node importance range stored in the model parameter storage device 620 and on the node importance information of each node.

In this example, each time graph data and node importance information are input, the dynamic node importance division optimization device 610 calculates the predictive stochastic complexity expressed by equation (20) for all candidate models. The dynamic node importance division optimization device 610 then selects and outputs the model corresponding to the minimum value as the optimal model for that time.
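
The sequential procedure can be sketched in Python as follows. Equation (20) is not reproduced in this text, so the sketch assumes that each candidate model supplies a hypothetical callable `cost(history, observation)` returning the code length of the new observation given the past ones, and that the selector accumulates these values. The class name `SequentialSelector` and its interface are illustrative assumptions only.

```python
from typing import Callable, Dict, Hashable, List, Sequence


class SequentialSelector:
    """Accumulate a predictive code length per candidate model and pick the smallest."""

    def __init__(
        self, step_cost: Dict[Hashable, Callable[[Sequence[object], object], float]]
    ) -> None:
        self.step_cost = step_cost
        self.history: List[object] = []  # observations seen so far
        self.total = {name: 0.0 for name in step_cost}  # accumulated code lengths

    def update(self, observation: object) -> Hashable:
        """Add the new observation and return the currently best model."""
        for name, cost in self.step_cost.items():
            # Each candidate is charged for encoding the new data given the past.
            self.total[name] += cost(self.history, observation)
        self.history.append(observation)
        return min(self.total, key=self.total.get)
```

Because only running totals and the history are kept, the selection can be updated each time new graph data arrives without recomputing over all earlier times.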

　本実施形態によれば、動的ノード重要度分割最適化装置６１０は、グラフデータが受け付けられるたびに、モデルパラメータ記憶装置６２０に格納された過去の最適化されたノード分布（ノード重要度の値域）の分割結果と、ノードごとのノード重要度情報と、に基づいて、ノード重要度の値域の分割を、任意の最適化基準にしたがって最適化する。このため、逐次的に、分割モデルを最適化できる。 According to the present embodiment, the dynamic node importance division optimization device 610 performs the past optimized node distribution (range of node importance values stored in the model parameter storage device 620) each time graph data is received. ) And the node importance value information for each node, the division of the node importance value range is optimized according to an arbitrary optimization criterion. For this reason, a division | segmentation model can be optimized sequentially.

　なお、上記各実施形態は、ブログやウェブページにおけるオピニオンリーダーやネットワーク構造の分析に適用可能である。 Note that each of the above embodiments is applicable to analysis of opinion leaders and network structures in blogs and web pages.

The upper regions of the division of the node importance distribution (the node importance value range) correspond to nodes that have many links. By analyzing the nodes that fall into these regions, it is possible to discover opinion leaders and to analyze, for example, the relationships among opinion leaders. In the case of a directed graph in particular, nodes that send many links and nodes that receive many links can be analyzed separately.
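As a rough illustration of the directed-graph case, the following Python sketch sums link weights per node separately for outgoing and incoming links and lists the nodes that fall into the upper region. The function names `degree_importance` and `top_region`, the example edges, and the hand-picked threshold of 2.0 are assumptions made for this example; in the embodiments the dividing threshold is obtained by optimization rather than fixed by hand.

```python
from collections import defaultdict

def degree_importance(edges):
    """edges: iterable of (source, target, weight) tuples.

    Returns two dicts: total outgoing weight and total incoming weight per node.
    """
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    for src, dst, w in edges:
        out_strength[src] += w
        in_strength[dst] += w
        # Make sure every node appears in both maps, even with weight 0.
        out_strength[dst] += 0.0
        in_strength[src] += 0.0
    return dict(out_strength), dict(in_strength)

def top_region(importance, threshold):
    """Nodes whose importance lies at or above the dividing threshold."""
    return sorted(n for n, v in importance.items() if v >= threshold)

# Hypothetical directed graph: "a" sends to many nodes, "d" receives from many.
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("a", "d", 1.0),
         ("b", "d", 1.0), ("c", "d", 1.0)]
out_strength, in_strength = degree_importance(edges)
heavy_senders = top_region(out_strength, 2.0)
heavy_receivers = top_region(in_strength, 2.0)
```

Here `heavy_senders` contains only "a" and `heavy_receivers` contains only "d", which mirrors the distinction between nodes that send a lot and nodes that receive a lot.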

When the co-occurrence of words in papers, news articles, and blog articles is represented as a graph, words with many links are terms related to currently popular topics, and such relationships and structures can be analyzed.
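A word co-occurrence graph of this kind could be built along the following lines. This is only a sketch under simple assumptions: each document is already tokenized into a list of words, two words are linked when they appear in the same document, and the link weight is the number of such documents. The function names `cooccurrence_graph` and `link_counts` and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(documents):
    """Map each unordered word pair to the number of documents containing both."""
    links = Counter()
    for words in documents:
        # sorted(set(...)) gives each pair once per document, in a fixed order.
        for u, v in combinations(sorted(set(words)), 2):
            links[(u, v)] += 1
    return links

def link_counts(links):
    """Number of distinct links attached to each word."""
    counts = Counter()
    for u, v in links:
        counts[u] += 1
        counts[v] += 1
    return counts

# Hypothetical tokenized documents.
documents = [
    ["graph", "node", "link"],
    ["graph", "mdl"],
    ["graph", "node"],
    ["news", "blog"],
]
links = cooccurrence_graph(documents)
counts = link_counts(links)
most_linked = counts.most_common(1)[0]
```

In this toy corpus the word "graph" has the most links (three), so it would fall into the upper region of the node importance value range.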

The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

This application claims priority based on Japanese Patent Application No. 2008-46097, filed on February 27, 2008, the entire disclosure of which is incorporated herein.

Claims

1. A graph structure estimation apparatus comprising:
calculation means for, upon receiving graph data represented by a plurality of nodes and links indicating the degree of the relationship between nodes of the plurality of nodes, calculating, for each node and based on the graph data, importance information indicating the degree of importance of that node;
dividing means for assigning each of the nodes to one of a plurality of groups based on the importance information of that node; and
output means for outputting the result of grouping the nodes as graph structure information of the graph data.
2. The graph structure estimation apparatus according to claim 1, wherein
the links indicate the degree of the relationship between the nodes as numerical values, and
the calculation means calculates, for each node, the sum of those numerical values indicated by the links that relate to that same node, and uses the sum as the importance information of the node.
3. The graph structure estimation apparatus according to claim 1, wherein the calculation means calculates, for each node, the PageRank of the node from the graph data in accordance with the PageRank algorithm, and uses the PageRank as the importance information of the node.
4. The graph structure estimation apparatus according to claim 1, wherein the calculation means calculates, for each node, the hub index of the node from the graph data in accordance with the HITS algorithm, and uses the hub index as the importance information of the node.
5. The graph structure estimation apparatus according to claim 1, wherein the calculation means calculates, for each node, the authority index of the node from the graph data in accordance with the HITS algorithm, and uses the authority index as the importance information of the node.
6. The graph structure estimation apparatus according to any one of claims 1 to 5, wherein the dividing means further uses, as the plurality of groups, a plurality of divided regions obtained by dividing a node importance value range, which represents the range from the minimum value to the maximum value that the node importance information can take, using one or more dividing node importance values.
7. The graph structure estimation apparatus according to claim 6, wherein the dividing means optimizes the division result of the node importance value range based on the graph data in accordance with an arbitrary optimization criterion.
8. The graph structure estimation apparatus according to claim 7, wherein the dividing means uses the minimum description length principle, the Akaike information criterion, or the Bayesian information criterion as the arbitrary optimization criterion.
9. The graph structure estimation apparatus according to claim 7 or 8, wherein the dividing means includes:
division code length calculation means for treating a division result of the node importance value range as a division model and calculating, for each of mutually different division models, a division description length required to encode the node importance information and the division model; and
division control means for identifying, from among the mutually different division models, an optimized division model that minimizes the division description length, and assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model.
10. The graph structure estimation apparatus according to claim 7 or 8, wherein the dividing means includes:
division code length calculation means for treating a division result of the node importance value range as a division model and calculating, for each of mutually different division models, a division description length required to encode the node importance information and the division model;
graph code length calculation means for calculating, for each division model, a graph description length for encoding the graph as divided by the grouping of the nodes that results when each node is assigned, based on the importance information of that node, to one of the plurality of groups specified by that division model; and
division control means for identifying, from among the mutually different division models, an optimized division model that minimizes the sum of the division description length and the graph description length, and assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model.
11. The graph structure estimation apparatus according to claim 9 or 10, wherein the division control means identifies the optimized division model using dynamic programming.
12. The graph structure estimation apparatus according to claim 9, wherein the division control means identifies the optimized division model by repeatedly updating the division result in a direction that reduces the division description length.
13. The graph structure estimation apparatus according to claim 9, wherein the division control means uses the division result calculated using dynamic programming as an initial value, and identifies the optimized division model by repeatedly updating the division result from the initial value in a direction that reduces the division description length.
14. The graph structure estimation apparatus according to claim 10, wherein the division control means identifies the optimized division model by repeatedly updating the division result in a direction that reduces the sum of the division description length and the graph description length.
15. The graph structure estimation apparatus according to claim 10, wherein the division control means uses the division result calculated using dynamic programming as an initial value, and identifies the optimized division model by repeatedly updating the division result from the initial value in a direction that reduces the sum of the division description length and the graph description length.
16. The graph structure estimation apparatus according to claim 7 or 8, wherein the dividing means includes:
division code length calculation means for treating a division result of the node importance value range as a division model and calculating, for each of mutually different division models, a division description length required to encode the node importance information and the division model;
subgraph division code length calculation means for calculating, for each division model, a re-division code length required to encode the state of re-division in each group when each group, produced by assigning each node based on the importance information of that node to one of the plurality of groups specified by that division model, is re-divided based on the relationships between the nodes within that group; and
division control means for identifying, from among the mutually different division models, an optimized division model that minimizes the sum of the division description length and the re-division code length, and assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model.
17. The graph structure estimation apparatus according to claim 7 or 8, wherein
the calculation means receives the graph data in time series and, each time graph data is received, calculates the importance information for each node represented in that graph data based on that graph data, and
the dividing means, each time graph data is received, optimizes the division result of the node importance value range in accordance with the arbitrary optimization criterion, based on the node importance information of each item of graph data received in the time series.
18. The graph structure estimation apparatus according to claim 7 or 8, wherein
the calculation means receives the graph data in time series and, each time graph data is received, calculates the importance information for each node represented in that graph data based on that graph data, and
the dividing means includes:
storage means for storing past optimized division results of the node importance value range; and
division control means for optimizing, each time graph data is received, the division result of the node importance value range in accordance with the arbitrary optimization criterion, based on the past optimized division results of the node importance value range stored in the storage means and on the importance information of each node.
19. A graph structure estimation method performed by a graph structure estimation apparatus, the method comprising:
upon receiving graph data represented by a plurality of nodes and links indicating the degree of the relationship between nodes of the plurality of nodes, calculating, for each node and based on the graph data, importance information indicating the degree of importance of that node;
dividing the nodes by assigning each node to one of a plurality of groups based on the importance information of that node; and
outputting the result of grouping the nodes as graph structure information of the graph data.
20. The graph structure estimation method according to claim 19, wherein
the links indicate the degree of the relationship between the nodes as numerical values, and
in the calculating, the sum of those numerical values indicated by the links that relate to the same node is calculated for each node, and the sum is used as the importance information of the node.
21. The graph structure estimation method according to claim 19, wherein, in the calculating, the PageRank of each node is calculated from the graph data in accordance with the PageRank algorithm, and the PageRank is used as the importance information of the node.
22. The graph structure estimation method according to claim 19, wherein, in the calculating, the hub index of each node is calculated from the graph data in accordance with the HITS algorithm, and the hub index is used as the importance information of the node.
23. The graph structure estimation method according to claim 19, wherein, in the calculating, the authority index of each node is calculated from the graph data in accordance with the HITS algorithm, and the authority index is used as the importance information of the node.
24. The graph structure estimation method according to any one of claims 19 to 23, wherein, in the dividing, a plurality of divided regions obtained by dividing a node importance value range, which represents the range from the minimum value to the maximum value that the node importance information can take, using one or more dividing node importance values are further used as the plurality of groups.
25. The graph structure estimation method according to claim 24, wherein, in the dividing, the division result of the node importance value range is optimized based on the graph data in accordance with an arbitrary optimization criterion.
26. The graph structure estimation method according to claim 25, wherein, in the dividing, the minimum description length principle, the Akaike information criterion, or the Bayesian information criterion is used as the arbitrary optimization criterion.
27. The graph structure estimation method according to claim 25 or 26, wherein the dividing includes:
treating a division result of the node importance value range as a division model and calculating, for each of mutually different division models, a division description length required to encode the node importance information and the division model; and
identifying, from among the mutually different division models, an optimized division model that minimizes the division description length, and assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model.
28. The graph structure estimation method according to claim 25 or 26, wherein the dividing includes:
treating a division result of the node importance value range as a division model and calculating, for each of mutually different division models, a division description length required to encode the node importance information and the division model;
calculating, for each division model, a graph description length for encoding the graph as divided by the grouping of the nodes that results when each node is assigned, based on the importance information of that node, to one of the plurality of groups specified by that division model; and
identifying, from among the mutually different division models, an optimized division model that minimizes the sum of the division description length and the graph description length, and assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model.
29. The graph structure estimation method according to claim 27 or 28, wherein, in assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model, the optimized division model is identified using dynamic programming.
30. The graph structure estimation method according to claim 27, wherein, in assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model, the optimized division model is identified by repeatedly updating the division result in a direction that reduces the division description length.
31. The graph structure estimation method according to claim 27, wherein, in assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model, the division result calculated using dynamic programming is used as an initial value, and the optimized division model is identified by repeatedly updating the division result from the initial value in a direction that reduces the division description length.
32. The graph structure estimation method according to claim 28, wherein, in assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model, the optimized division model is identified by repeatedly updating the division result in a direction that reduces the sum of the division description length and the graph description length.
33. The graph structure estimation method according to claim 28, wherein, in assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model, the division result calculated using dynamic programming is used as an initial value, and the optimized division model is identified by repeatedly updating the division result from the initial value in a direction that reduces the sum of the division description length and the graph description length.
34. The graph structure estimation method according to claim 25 or 26, wherein the dividing includes:
treating a division result of the node importance value range as a division model and calculating, for each of mutually different division models, a division description length required to encode the node importance information and the division model;
calculating, for each division model, a re-division code length required to encode the state of re-division in each group when each group, produced by assigning each node based on the importance information of that node to one of the plurality of groups specified by that division model, is re-divided based on the relationships between the nodes within that group; and
identifying, from among the mutually different division models, an optimized division model that minimizes the sum of the division description length and the re-division code length, and assigning each node, based on the importance information of that node, to one of the plurality of groups specified by the optimized division model.
35. The graph structure estimation method according to claim 25 or 26, wherein
in the calculating, the graph data is received in time series and, each time graph data is received, the importance information is calculated for each node represented in that graph data based on that graph data, and
in the dividing, each time graph data is received, the division result of the node importance value range is optimized in accordance with the arbitrary optimization criterion, based on the node importance information of each item of graph data received in the time series.
36. The graph structure estimation method according to claim 25 or 26, wherein
in the calculating, the graph data is received in time series and, each time graph data is received, the importance information is calculated for each node represented in that graph data based on that graph data, and
the dividing includes:
storing past optimized division results of the node importance value range in storage means; and
optimizing, each time graph data is received, the division result of the node importance value range in accordance with the arbitrary optimization criterion, based on the past optimized division results of the node importance value range stored in the storage means and on the importance information of each node.
37. A program for causing a computer to function as:
calculation means for, upon receiving graph data represented by a plurality of nodes and links indicating the degree of the relationship between nodes of the plurality of nodes, calculating, for each node and based on the graph data, importance information indicating the degree of importance of that node;
dividing means for assigning each of the nodes to one of a plurality of groups based on the importance information of that node; and
output means for outputting the result of grouping the nodes as graph structure information of the graph data.
38. The program according to claim 37, wherein
the links indicate the degree of the relationship between the nodes as numerical values, and
the calculation means calculates, for each node, the sum of those numerical values indicated by the links that relate to that same node, and uses the sum as the importance information of the node.