JP6688240B2

JP6688240B2 - Distributed synchronous processing system and distributed synchronous processing method

Info

Publication number: JP6688240B2
Application number: JP2017028610A
Authority: JP
Inventors: Hiroaki Kobayashi (小林 弘明); Mitsuhiro Okamoto (岡本 光浩)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-02-20
Filing date: 2017-02-20
Publication date: 2020-04-28
Anticipated expiration: 2037-02-20
Also published as: JP2018136588A

Description

The present invention relates to a distributed synchronous processing system and a distributed synchronous processing method that execute processing by synchronizing a plurality of servers arranged in a distributed manner.

Non-Patent Document 1 discloses MapReduce as a framework for distributed processing systems in which a plurality of servers are distributed over a network. However, MapReduce must read input data from an external data store and write results back each time processing is performed, so it is not suited to iterative processing in which the result of one process is used by the next. BSP (Bulk Synchronous Parallel), disclosed in Non-Patent Document 2, is suited to this kind of processing.

BSP executes data processing in a distributed environment by repeatedly executing a processing unit called a "superstep" (SS). FIG. 1 is a diagram for explaining the BSP calculation model.

As shown in FIG. 1, one superstep consists of the following three phases (PH): "local computation (LC)" (phase PH1), "data exchange (Com: communication)" (phase PH2), and "synchronization (Sync)" (phase PH3).
Specifically, when any of a plurality of nodes (node 1 to node 4) receives data, that node (for example, node 1) executes calculation processing on the data (local computation (LC)) in phase PH1. Next, in phase PH2, the nodes exchange the data that each holds as the result of its local computation. Then, in phase PH3, synchronization is performed; more precisely, each node waits until data exchange among all nodes has finished.
When this series of superstep processing (PH1 to PH3) finishes as superstep SS1, each node retains its calculation result and proceeds to the next series of processing, superstep SS2.
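The three phases of a superstep can be sketched as follows. This is a minimal illustration only, not code from the cited documents: the ring-shaped message pattern, the integer node state, and the names `run_bsp`, `inbox`, and `outbox` are assumptions made for the example. Phase PH3 is modeled with a standard `threading.Barrier`.

```python
import threading

def run_bsp(num_nodes=4, num_supersteps=3):
    """Run a toy BSP computation and return the final state of every node."""
    # inbox[i]: values delivered to node i in the previous superstep (hypothetical seed data).
    inbox = [[i + 1] for i in range(num_nodes)]
    outbox = [[] for _ in range(num_nodes)]
    state = [0] * num_nodes
    lock = threading.Lock()

    def deliver():
        # Runs exactly once per superstep, after every node has reached the barrier.
        for i in range(num_nodes):
            inbox[i], outbox[i] = outbox[i], []

    barrier = threading.Barrier(num_nodes, action=deliver)

    def node(i):
        for _ in range(num_supersteps):
            # PH1: local computation on the data received so far.
            state[i] += sum(inbox[i])
            # PH2: data exchange (here: send the result to the next node in a ring).
            with lock:
                outbox[(i + 1) % num_nodes].append(state[i])
            # PH3: synchronization, wait until all nodes have finished exchanging data.
            barrier.wait()

    threads = [threading.Thread(target=node, args=(i,)) for i in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state

if __name__ == "__main__":
    print(run_bsp())
```

Because every node blocks at the barrier, no node starts the next superstep until the slowest node has finished the current one.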

Non-Patent Document 3 discloses Pregel as a distributed processing framework that adopts BSP. Frameworks such as Pregel express the entire process as a graph G = (V, E) and execute it by applying BSP. Here, V is the set of vertices and E is the set of edges.

An example in which BSP is applied to a traffic simulation is described with reference to FIG. 2.
In FIG. 2, each intersection (v) is mapped to a vertex (v_1 to v_4 in FIG. 2), and each road (e) connecting intersections is mapped to an edge (e_1 to e_6 in FIG. 2). Each edge is one-way, so a two-way road is mapped to two edges. Seen from a given vertex, an edge along which vehicles leave is called an "output edge (outgoing edge)", and an edge along which vehicles flow in is called an "input edge (incoming edge)". For example, in FIG. 2, seen from vertex v_2, edge e_1 is an input edge and edge e_2 is an output edge. Conversely, seen from vertex v_1, edge e_1 is an output edge and edge e_2 is an input edge.
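The relationship between output edges and input edges can be illustrated with a small data structure. The sketch below is an assumption-laden example: the class name `RoadGraph` and its methods are hypothetical, and only the two edges whose direction is stated in the text (e_1 from v_1 to v_2, and e_2 from v_2 to v_1) are modeled; the remaining edges of FIG. 2 are not reproduced.

```python
from collections import defaultdict

class RoadGraph:
    """Directed graph in which each one-way edge is an output edge of its
    source vertex and an input edge of its destination vertex."""

    def __init__(self):
        self.output_edges = defaultdict(list)  # vertex -> edges leaving it
        self.input_edges = defaultdict(list)   # vertex -> edges entering it

    def add_edge(self, name, src, dst):
        self.output_edges[src].append(name)
        self.input_edges[dst].append(name)

    def add_two_way_road(self, forward, backward, a, b):
        # A two-way road is represented by two one-way edges.
        self.add_edge(forward, a, b)
        self.add_edge(backward, b, a)

g = RoadGraph()
g.add_two_way_road("e_1", "e_2", "v_1", "v_2")

if __name__ == "__main__":
    print("v_2 input edges:", g.input_edges["v_2"])
    print("v_2 output edges:", g.output_edges["v_2"])
```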

In the superstep shown in FIG. 1, phase PH1 (local computation) simulates, for each vertex and over the elapsed time (Δt), the state of the intersection mapped to each of the vertices v_1 to v_4 (for example, the color of the traffic signal (green, yellow, red) and the movement of vehicles within the intersection), together with the state of the roads that are its output edges (vehicle movement, such as the number of vehicles and their average speed). In phase PH2 (data exchange), each vertex sends information on the movement of vehicles leaving through an output edge (such as the number of vehicles) to the vertex at the other end of that output edge, and receives information on the movement of vehicles entering through its input edges. In phase PH3 (synchronization), the simulation time t is synchronized among the vertices; that is, every vertex waits until data exchange among all vertices has completed.
In this traffic simulation, processing the intersections (vertices) in parallel in this way shortens the calculation time.

However, in a distributed synchronous processing system that adopts BSP as described above, all vertices are synchronized at every superstep, so progress is dictated by the slowest vertex. Even a single vertex that is markedly slower than the rest therefore affects the whole system: the entire computation is delayed to match the slowest vertex.

The technique described in Non-Patent Document 4 has been proposed to solve this problem.
In Non-Patent Document 4, synchronization is performed locally between adjacent vertices rather than globally. Specifically, each vertex moves to the next superstep once the following condition (hereinafter, "adjacent synchronization") is satisfied: the calculation and transmission processing (local computation (phase PH1) and data exchange (phase PH2)) has completed for the vertex itself and for every vertex connected to it by an input edge. This reduces the waiting time in synchronization (phase PH3), so the processing speed of the system as a whole can be improved.
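The adjacent synchronization condition can be expressed as a simple predicate. The following is a hedged sketch rather than the implementation of Non-Patent Document 4: the function name `can_advance` and the dictionaries `completed_step` (the last superstep whose calculation and transmission processing each vertex has finished) and `input_neighbors` (the vertices connected to each vertex by an input edge) are assumptions introduced for illustration.

```python
def can_advance(vertex, completed_step, input_neighbors):
    """Return True if `vertex` may move to its next superstep.

    The vertex has finished superstep n = completed_step[vertex]; it may advance
    only if every vertex feeding it through an input edge has also finished
    superstep n (or a later one).
    """
    n = completed_step[vertex]
    return all(completed_step[u] >= n for u in input_neighbors.get(vertex, ()))

# Hypothetical chain a -> b -> c in which vertex a is lagging.
input_neighbors = {"b": ["a"], "c": ["b"]}
completed_step = {"a": 1, "b": 2, "c": 2}

if __name__ == "__main__":
    for v in ("a", "b", "c"):
        print(v, can_advance(v, completed_step, input_neighbors))
```

In this example only b has to wait (for a), whereas a global barrier would also hold back c.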

Non-Patent Document 1: Dean, J., et al., “MapReduce: Simplified Data Processing on Large Clusters,” OSDI '04, 2004, p.137-149.
Non-Patent Document 2: Valiant, L., et al., “A bridging model for parallel computation,” Communications of the ACM, 1990, vol.33, No.8, p.103-111.
Non-Patent Document 3: Malewicz, G., et al., “Pregel: A System for Large-Scale Graph Processing,” Proc. of ACM SIGMOD, 2010, p.136-145.
Non-Patent Document 4: Hiroaki Kobayashi, Mitsuhiro Okamoto, "A Study on Synchronous Processing of Distributed Processing Framework", The Institute of Electronics, Information and Communication Engineers, Proceedings of the 2016 IEICE Communications Society Conference, B-7-1, September 6, 2016.

The distributed synchronous processing system described in Non-Patent Document 4 was evaluated on the assumption that the local computation (phase PH1) of all vertices can be executed in parallel. In practice, however, the processing server that executes the local computation (phase PH1) of each vertex (the "worker" described later) assigns a separate thread to each vertex. Consequently, when multiple threads are assigned to one core of a CPU (Central Processing Unit) (hereinafter, "resource constraint"), unexpected idle time may occur.

The present invention has been made in view of the above problem, and its object is to provide a distributed synchronous processing system and a distributed synchronous processing method that can suppress unexpected idle time caused by resource constraints in a system that performs distributed synchronous processing.

To solve the above problem, the invention according to claim 1 is a distributed synchronous processing system having a plurality of processing servers and a plurality of distributed processing units operating on the processing servers, wherein each distributed processing unit is assigned one thread, which is an individual unit of execution, as a scheduling target of one core in the CPU of the processing server to which the distributed processing unit belongs, and each processing server comprises: an adjacent synchronization distribution management unit that detects that a distributed processing unit is in an executable state, that is, a state in which it has completed the calculation processing of a given calculation step and the data transmission and reception with the other distributed processing units to which it is connected, and can execute the next calculation step; and a priority setting unit that sets a priority for each distributed processing unit in the executable state using a predetermined priority calculation algorithm, which assigns a higher priority to a distributed processing unit the more a delay in its calculation processing would force the CPUs of other distributed processing units to wait, and that preferentially assigns the threads of distributed processing units with higher priority to the processing schedule of the CPU.

The invention according to claim 7 is a distributed synchronous processing method for a distributed synchronous processing system having a plurality of processing servers and a plurality of distributed processing units operating on the processing servers, wherein each distributed processing unit is assigned one thread, which is an individual unit of execution, as a scheduling target of one core in the CPU of the processing server to which the distributed processing unit belongs, and each processing server executes: a step of detecting that a distributed processing unit is in an executable state, that is, a state in which it has completed the calculation processing of a given calculation step and the data transmission and reception with the other distributed processing units to which it is connected, and can execute the next calculation step; and a step of setting a priority for each distributed processing unit in the executable state using a predetermined priority calculation algorithm, which assigns a higher priority to a distributed processing unit the more a delay in its calculation processing would force the CPUs of other distributed processing units to wait, and preferentially assigning the threads of distributed processing units with higher priority to the processing schedule of the CPU.

This makes it possible to suppress unexpected idle time caused by the resource constraints of each processing server in the distributed processing system. As a result, the resource utilization efficiency of each processing server improves, and the synchronous processing calculation can be accelerated (its execution time shortened).
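The idea of handing a single core to the highest-priority executable thread first can be sketched with a priority queue. This is an illustrative assumption, not the claimed implementation: the function `schedule`, the `priority` callback, and the tie-breaking by arrival order are all hypothetical choices made for the example.

```python
import heapq

def schedule(executable, priority):
    """Return the executable distributed processing units in the order in which
    one CPU core would run their threads: highest priority first, and
    first-come-first-served among equal priorities."""
    # heapq is a min-heap, so the priority is negated; the index breaks ties.
    heap = [(-priority(unit), index, unit) for index, unit in enumerate(executable)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, unit = heapq.heappop(heap)
        order.append(unit)
    return order

if __name__ == "__main__":
    priorities = {"v_1": 1, "v_2": 3, "v_3": 2}  # hypothetical priority values
    print(schedule(["v_1", "v_2", "v_3"], priorities.get))
```

Any of the priority calculation algorithms described in this document could be supplied as the `priority` callback.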

In the invention according to claim 2, each processing server comprises a storage unit that stores a graph topology representing the connections among the distributed processing units to be calculated, in which a distributed processing unit is connected by an output edge to each other distributed processing unit that is an output destination of the data it calculates, and by an input edge to each other distributed processing unit that is an input source of the data it needs in the next calculation step; the predetermined priority calculation algorithm calculates the number of output edges of each distributed processing unit in the executable state and sets a higher priority for distributed processing units with more output edges; and the priority setting unit sets the priority of each distributed processing unit using this algorithm and preferentially assigns the threads of distributed processing units with higher priority to the processing schedule of the CPU.

In this way, a processing server of the distributed processing system can give precedence to the threads of distributed processing units that have many output edges, that is, units that have many adjacent distributed processing units and therefore affect many other units. This suppresses idle time caused by resource constraints.
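Prioritization by the number of output edges might look like the following sketch. The function name `output_edge_priority`, the topology dictionary, and the vertex names are assumptions for illustration; the claim specifies only that more output edges means higher priority.

```python
def output_edge_priority(unit, output_edges):
    """Priority of a unit = number of its output edges (more edges, higher priority)."""
    return len(output_edges.get(unit, ()))

# Hypothetical graph topology: unit -> units reached through its output edges.
output_edges = {"v_1": ["v_2", "v_3", "v_4"], "v_2": ["v_3"], "v_3": []}
executable = ["v_3", "v_2", "v_1"]

# Units with more output edges are handed to the CPU first.
order = sorted(executable, key=lambda u: output_edge_priority(u, output_edges), reverse=True)

if __name__ == "__main__":
    print(order)
```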

In the invention according to claim 3, the adjacent synchronization distribution management unit monitors the step number of the calculation step that each distributed processing unit is executing; the predetermined priority calculation algorithm refers to the step number of the calculation step of each distributed processing unit in the executable state and sets a higher priority for distributed processing units whose processing lags behind that of the other distributed processing units; and the priority setting unit sets the priority of each distributed processing unit using this algorithm and preferentially assigns the threads of distributed processing units with higher priority to the processing schedule of the CPU.

In this way, a processing server of the distributed processing system can give precedence to the threads of those distributed processing units whose current calculation step lags behind the others. This suppresses idle time caused by resource constraints.
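Prioritization by lagging step number can be sketched as below. How the lag is converted into a numeric priority is not specified in this passage, so measuring it as the distance from the most advanced step is an assumption of this example, as are the names `lagging_step_priority` and `step_of`.

```python
def lagging_step_priority(unit, step_of):
    """Priority of a unit = how many steps it is behind the most advanced unit."""
    return max(step_of.values()) - step_of[unit]

# Hypothetical step numbers reported for each executable unit.
step_of = {"v_1": 5, "v_2": 3, "v_3": 4}

# The unit that is furthest behind is handed to the CPU first.
order = sorted(step_of, key=lambda u: lagging_step_priority(u, step_of), reverse=True)

if __name__ == "__main__":
    print(order)
```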

In the invention according to claim 4, each processing server comprises a storage unit that stores a graph topology representing the connections among the distributed processing units to be calculated, in which a distributed processing unit is connected by an output edge to each other distributed processing unit that is an output destination of the data it calculates, and by an input edge to each other distributed processing unit that is an input source of the data it needs in the next calculation step; each distributed processing unit, when it cannot move to the next calculation step and is in an input waiting state because no data has been sent from an input-source distributed processing unit, votes by sending input waiting information to that input-source distributed processing unit; the input-source distributed processing unit that receives the input waiting information tallies the number of pieces of input waiting information received as a priority vote value; the predetermined priority calculation algorithm sets a higher priority for input-source distributed processing units with larger priority vote values; and the priority setting unit of each processing server sets the priority of each distributed processing unit using this algorithm and preferentially assigns the threads of distributed processing units with higher priority to the processing schedule of the CPU.

In this way, a processing server of the distributed processing system can give precedence to the threads of distributed processing units that are actually keeping adjacent distributed processing units in the input waiting state. This suppresses idle time caused by resource constraints.
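The voting scheme can be sketched as a tally of input waiting information per input source. In this minimal example the messages are modeled as a dictionary rather than actual network transmissions, and the names `tally_votes` and `waiting_on` are hypothetical.

```python
from collections import Counter

def tally_votes(waiting_on):
    """Count, for each input source, how many units are waiting for its data.

    `waiting_on` maps each unit in the input waiting state to the input sources
    whose data it has not yet received; each entry stands for one piece of
    input waiting information (one vote).
    """
    votes = Counter()
    for waiter, sources in waiting_on.items():
        for source in sources:
            votes[source] += 1
    return votes

# Hypothetical situation: v_2 and v_3 both wait for v_1, and v_4 waits for v_3.
waiting_on = {"v_2": ["v_1"], "v_3": ["v_1"], "v_4": ["v_3"]}
votes = tally_votes(waiting_on)

if __name__ == "__main__":
    # The input source with the most votes is scheduled first.
    print(votes.most_common())
```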

In the invention according to claim 5, when the input-source distributed processing unit that received the input waiting information is itself in the input waiting state, it sends input waiting information carrying a priority vote value, obtained by adding its own delay to the tallied priority vote value, to the further distributed processing unit that is its own input source and the cause of the delay.

In this way, a processing server of the distributed processing system can determine which distributed processing unit to prioritize by taking into account not only the unit that is keeping an adjacent unit in the input waiting state, but also the unit that is in turn keeping that unit in the input waiting state. This further suppresses idle time caused by resource constraints.
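Forwarding votes along a chain of waiting units can be sketched recursively. The example below rests on several assumptions that are not stated in the text: each unit's own delay counts as one vote, the waiting relation contains no cycles, and the names `propagated_votes` and `waiting_on` are hypothetical.

```python
def propagated_votes(waiting_on):
    """Return the priority vote value accumulated at each input source.

    `waiting_on` maps each waiting unit to the input sources it is waiting for.
    A waiting unit forwards its tallied votes plus its own delay (assumed to be 1)
    to the sources it is waiting for. Assumes the waiting relation is acyclic.
    """
    waiters_of = {}
    for waiter, sources in waiting_on.items():
        for source in sources:
            waiters_of.setdefault(source, []).append(waiter)

    memo = {}

    def votes(unit):
        if unit not in memo:
            # Each waiter contributes its own delay (1) plus the votes it collected.
            memo[unit] = sum(1 + votes(w) for w in waiters_of.get(unit, ()))
        return memo[unit]

    return {unit: votes(unit) for unit in waiters_of}

# Hypothetical chain: v_3 and v_4 wait for v_2, which itself waits for v_1.
waiting_on = {"v_2": ["v_1"], "v_3": ["v_2"], "v_4": ["v_2"]}
result = propagated_votes(waiting_on)

if __name__ == "__main__":
    print(result)
```

Here v_2 receives two votes, then forwards those two plus its own delay to v_1, so v_1, the root cause of the stall, ends up with the highest value.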

The invention according to claim 6 is a distributed synchronous processing system having a plurality of processing servers and a plurality of distributed processing units operating on the processing servers, wherein each distributed processing unit is assigned one thread, which is an individual unit of execution, as a scheduling target of one core in the CPU of the processing server to which the distributed processing unit belongs, and each processing server comprises: an adjacent synchronization distribution management unit that detects that a distributed processing unit is in an executable state, that is, a state in which it has completed the calculation processing of a given calculation step and the data transmission and reception with the other distributed processing units to which it is connected, and can execute the next calculation step; a priority setting unit that assigns a priority to each distributed processing unit in the executable state and preferentially assigns the threads of distributed processing units with higher priority to the processing schedule of the CPU; and a storage unit that stores a graph topology representing the connections among the distributed processing units to be calculated, in which a distributed processing unit is connected by an output edge to each other distributed processing unit that is an output destination of the data it calculates, and by an input edge to each other distributed processing unit that is an input source of the data it needs in the next calculation step. The priority setting unit calculates the number of output edges of each distributed processing unit in the executable state and obtains a first priority, which is an index for preferentially assigning the threads of distributed processing units with more output edges to the processing schedule of the CPU. The adjacent synchronization distribution management unit monitors the step number of the calculation step that each distributed processing unit is executing, and the priority setting unit refers to the step number of each distributed processing unit in the executable state and obtains a second priority, which is an index for preferentially assigning the threads of distributed processing units whose processing lags behind that of the other distributed processing units to the processing schedule of the CPU. The priority setting unit then calculates the priority of each distributed processing unit using both the first priority and the second priority, and preferentially assigns the threads of distributed processing units with a high calculated priority to the processing schedule of the CPU.

In this way, a processing server of the distributed processing system can calculate a priority that takes into account both the number of output edges of each distributed processing unit and the calculation step it is executing, and can give precedence to the threads of distributed processing units with a high calculated priority. This suppresses idle time caused by resource constraints.
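One conceivable way of combining the two priorities is a weighted sum, sketched below. This is purely an assumption for illustration: the passage states only that both priorities are used, and the weighted sum, the weights `w_edges` and `w_delay`, and the function name `composite_priority` are hypothetical and should not be read as the formula of the invention.

```python
def composite_priority(unit, output_edges, step_of, w_edges=1.0, w_delay=1.0):
    """Combine the output-edge count and the step lag into one priority value.

    The weighted sum is an illustrative assumption, not the patent's formula.
    """
    first = len(output_edges.get(unit, ()))         # first priority: number of output edges
    second = max(step_of.values()) - step_of[unit]  # second priority: steps behind the leader
    return w_edges * first + w_delay * second

# Hypothetical topology and step numbers.
output_edges = {"v_1": ["v_2", "v_3"], "v_2": ["v_3"], "v_3": []}
step_of = {"v_1": 4, "v_2": 2, "v_3": 4}

order = sorted(step_of, key=lambda u: composite_priority(u, output_edges, step_of), reverse=True)

if __name__ == "__main__":
    print(order)
```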

According to the present invention, it is possible to provide a distributed synchronous processing system and a distributed synchronous processing method that can suppress unexpected idle time caused by resource constraints in a system that performs distributed synchronous processing.
In addition, this effect improves the resource utilization efficiency of the processing servers that make up the distributed synchronous processing system and accelerates the synchronous processing calculation (shortens its execution time).

A diagram for explaining the BSP calculation model.
A diagram for explaining an example in which the BSP calculation model is applied to a traffic simulation.
A diagram for explaining the configuration of the distributed synchronous processing system according to the comparative example.
A diagram illustrating the graph topology to be calculated in the BSP calculation model.
A diagram for explaining the flow of processing in the distributed synchronous processing system of the comparative example.
A diagram for explaining the flow of processing in the distributed synchronous processing system of Non-Patent Document 4.
A diagram for explaining the problem addressed by the present invention.
A diagram for explaining the processing outline of the distributed synchronous processing system according to the present embodiment.
A diagram showing the configuration of the distributed synchronous processing system according to the present embodiment.
A diagram for explaining priority setting processing by the output-edge-count priority method (LEF).
A diagram for explaining priority setting processing by the delayed-step priority method (LSF).
A diagram showing the priority calculation formula of the composite priority method (HL2).
A diagram for explaining priority setting processing by the delay-vote-count priority method (MVF).
A flowchart showing the flow of processing of the distributed synchronous processing system according to the present embodiment.

＜Detailed description of the distributed synchronous processing methods of the comparative examples and their problems＞
First, to explain the characteristic configuration of the distributed synchronous processing system 1 and the distributed synchronous processing method according to the present embodiment, the conventional distributed synchronous processing systems 1a and 1b and their distributed synchronous processing methods are described in detail as comparative examples.
The conventional BSP-based distributed synchronous processing system 1a (see FIGS. 3 to 5) is described first, followed by the distributed synchronous processing system 1b described in Non-Patent Document 4 (see FIG. 6).

As shown in FIG. 3, in the distributed synchronous processing system 1a of the comparative example, each of a plurality of workers (processing servers) 10a includes a plurality of vertices 20a. When BSP is applied to this configuration, each worker 10a manages the state of the current superstep (hereinafter sometimes simply called a "step") for all the vertices 20a it holds. When the processing (phases PH1 and PH2) of all its vertices 20a completes, each worker 10a reports the completion to the other workers 10a, and each worker 10a moves on to the next superstep once it has received reports from all workers 10a.
Alternatively, one of the workers 10a may serve as a representative worker 10a that acquires completion information from each worker 10a and, upon receiving reports from all workers 10a, sends each worker 10a an instruction to move to the next superstep. A single master (management server) may instead be provided to perform the same role.

Focusing on the vertices, each vertex executes the following processing.
In phase PH1 of BSP, a vertex performs a calculation using as parameters its current state, the states of its output edges, and the information obtained from the input messages of the previous superstep (the states of its input edges), and updates its own state and the states of its output edges. Then, in phase PH2, the vertex transmits the updated state of each output edge as an output message to the vertex adjacent on that output edge. The vertex also receives, as input messages, the updated states of its input edges from the vertices adjacent on those input edges.
The calculation in phase PH1 and the data exchange in phase PH2 executed by each vertex are hereinafter referred to as "calculation/transmission processing". The calculation/transmission processing in superstep n (n is a positive integer) is written as "calculation/transmission processing f_n".
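The calculation/transmission processing of a single vertex can be illustrated with a minimal Python sketch. The class name `Vertex`, the method `compute_and_send`, and the update rule (adding up the input edge states) are hypothetical placeholders chosen for illustration; the source does not specify the calculation performed in phase PH1.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    state: float
    # Output edges: adjacent vertex name -> state of that output edge.
    out_edges: dict = field(default_factory=dict)

    def compute_and_send(self, inbox):
        """One calculation/transmission processing f_n.

        inbox: source vertex name -> input edge state received in the
        previous superstep.
        """
        # Phase PH1: update the vertex state and the output edge states.
        # The update rule below is a placeholder, not taken from the source.
        self.state = self.state + sum(inbox.values())
        for dst in self.out_edges:
            self.out_edges[dst] = self.state
        # Phase PH2: one output message per output edge.
        return [(dst, self.name, edge_state)
                for dst, edge_state in self.out_edges.items()]
```

Each returned tuple is (destination, source, output edge state); the receiving vertex would treat it as the state of its input edge in the next superstep.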

Next, the flow of processing executed by the distributed synchronous processing system 1a of the comparative example is described with reference to FIG. 5. The calculation target is assumed to be the graph topology (graph G) shown in FIG. 4. As shown in FIG. 5, the distributed synchronous processing system 1a consists of two workers (worker1 and worker2); of the vertices v_1 to v_6, worker1 is in charge of vertices v_1 to v_3 and worker2 is in charge of vertices v_4 to v_6. The overall flow of processing is as follows.

First, each worker (worker1, worker2) executes the superstep for the vertices it is in charge of (step S101). Specifically, it executes the local computation of phase PH1 and thereby starts the superstep processing.

Next, each worker monitors the progress of the vertices it is in charge of and determines whether each vertex has completed processing through the data exchange of phase PH2. When a worker confirms that all of its vertices have completed processing through phase PH2, it transmits completion information to the other workers (step S102).

Each worker then checks whether it has received completion information from all workers, including itself (here, worker1 and worker2). When it has, the worker increments the superstep by 1 (step S103) and instructs each of its vertices to execute the next superstep (step S104).
Each worker then repeats steps S101 to S104.
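The flow of steps S101 to S104 can be sketched as a single-process Python simulation. This is only an illustration under simplifying assumptions: the function name `run_global_sync` is hypothetical, vertex processing is reduced to recording which vertex ran in which superstep, and network transfer of completion information is replaced by a shared dictionary.

```python
def run_global_sync(assignment, num_supersteps):
    """Simulate steps S101 to S104 for workers that synchronize globally.

    assignment: worker name -> list of vertex names it is in charge of.
    Returns the list of (superstep, vertex) executions in order.
    """
    executed = []
    step = {w: 1 for w in assignment}  # superstep held by each worker
    while all(s <= num_supersteps for s in step.values()):
        reports = {w: set() for w in assignment}
        for worker, vertices in assignment.items():
            # S101: run phases PH1 and PH2 for every vertex in charge.
            for v in vertices:
                executed.append((step[worker], v))
            # S102: send completion information to all workers (self included).
            for receiver in assignment:
                reports[receiver].add(worker)
        for worker in assignment:
            # S103: advance only after hearing from every worker.
            if reports[worker] == set(assignment):
                step[worker] += 1
            # S104 (instructing the vertices) is the next loop iteration.
    return executed
```

With two workers in charge of three vertices each, as in FIG. 5, no vertex enters superstep 2 until all six vertices have been processed in superstep 1.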

In the distributed synchronous processing system 1a of the comparative example, all vertices subject to calculation are synchronized at every superstep. Specifically, because synchronization takes place at the global synchronization points shown in FIG. 5, every vertex has to wait for the slowest vertex. For example, in superstep SS1 of FIG. 5, all of the vertices v_1 to v_6 wait for the slowest vertex, v_2. In superstep SS2, they wait for the slowest vertex, v_6. Consequently, if one vertex is markedly slow, the processing of all vertices is markedly delayed in order to stay aligned with it.
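The cost of global synchronization can be made concrete with a small Python sketch. It assumes that all vertices run fully in parallel and that each superstep ends when the slowest vertex finishes; the function name `global_sync_points` and the durations used in the usage note are hypothetical and are not read from FIG. 5.

```python
def global_sync_points(durations):
    """Return the time of each global synchronization point.

    durations: vertex name -> list of times needed for the
    calculation/transmission processing of supersteps 1, 2, ...
    """
    num_steps = len(next(iter(durations.values())))
    sync_points = []
    now = 0
    for k in range(num_steps):
        # All vertices start together and wait for the slowest one.
        now += max(d[k] for d in durations.values())
        sync_points.append(now)
    return sync_points
```

For example, with durations `{"v1": [1, 2], "v2": [4, 1], "v6": [2, 5]}`, the first superstep is bounded by v2 and the second by v6, so every vertex pays for both slow cases.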

To solve this problem, the distributed synchronous processing system 1b of Non-Patent Document 4 does not synchronize all vertices as in the conventional approach; instead, it decides the transition to the next superstep for each vertex individually. Specifically, in the distributed synchronous processing system 1b of Non-Patent Document 4, each vertex moves to the next superstep when the condition "the calculation/transmission processing f_n (local computation (phase PH1) and data exchange (phase PH2)) of the vertex itself and of all vertices adjacent to it on input edges has been completed" (adjacent synchronization) is satisfied.

FIG. 6 shows the processing executed by the distributed synchronous processing system 1a of the comparative example shown in FIG. 5 (see FIG. 6(a)) and the processing executed by the distributed synchronous processing system 1b of Non-Patent Document 4 (see FIG. 6(b)).
As described above, the distributed synchronous processing system 1b of Non-Patent Document 4 moves a vertex to the next superstep once the adjacent synchronization condition is satisfied, that is, once the calculation/transmission processing f_n (local computation (phase PH1) and data exchange (phase PH2)) of the vertex itself and of all vertices adjacent to it on input edges has been completed.

For example, consider vertex v_2 in FIG. 6(b). For vertex v_2, the adjacent synchronization point is the time at which the calculation/transmission processing f_n of the vertices v_1, v_3, and v_4, which are adjacent to it on input edges, and its own calculation/transmission processing f_n have all finished. In superstep SS1, vertex v_2's own calculation/transmission processing f_1 finishes later than that of the vertices v_1, v_3, and v_4, so the time at which it finishes is the adjacent synchronization point.
Next, consider vertex v_3. For vertex v_3, the adjacent synchronization point is the time at which the calculation/transmission processing f_n of the vertices v_2 and v_4, which are adjacent to it on input edges, and its own calculation/transmission processing f_n have all finished. In superstep SS1, when vertex v_3's own calculation/transmission processing f_1 finishes, the calculation/transmission processing f_1 of vertex v_4 has already finished but that of vertex v_2 has not. Vertex v_3 therefore enters a waiting state (reference sign α in FIG. 6(b)), and the time at which the calculation/transmission processing f_1 of vertex v_2 finishes becomes its adjacent synchronization point.
Finally, consider vertex v_1. Vertex v_1 has no vertices adjacent to it on input edges. In superstep SS1, therefore, the time at which its own calculation/transmission processing f_1 finishes is its adjacent synchronization point.
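The adjacent synchronization points described for v_1, v_2, and v_3 can be computed with a short Python sketch. It assumes full parallelism (every vertex has its own processor) and uses only the input-edge relationships stated in the text; the function name `adjacent_sync_points`, the durations, and the choice to give v_4 no input edges are assumptions made for illustration.

```python
def adjacent_sync_points(in_neighbors, durations, num_steps):
    """Return sync[v][k]: time at which vertex v may leave superstep k+1.

    in_neighbors[v]: vertices adjacent to v on its input edges.
    durations[v][k]: time v needs for f_(k+1).
    """
    finish = {v: [] for v in durations}
    sync = {v: [] for v in durations}
    for k in range(num_steps):
        for v in durations:
            # A vertex starts superstep k+1 at its previous sync point.
            start = sync[v][k - 1] if k > 0 else 0
            finish[v].append(start + durations[v][k])
        for v in durations:
            # Wait for own f_n and for f_n of every input-edge neighbor.
            waits = [finish[v][k]] + [finish[u][k] for u in in_neighbors[v]]
            sync[v].append(max(waits))
    return sync
```

With `in_neighbors = {"v1": [], "v2": ["v1", "v3", "v4"], "v3": ["v2", "v4"], "v4": []}` and v_2 as the slowest vertex in the first superstep, v_3 waits for v_2 while v_1 proceeds as soon as its own processing finishes.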

As shown in FIG. 6(b), at any given point in the overall processing, different vertices may be in different supersteps. For this reason, messages exchanged between vertices are not overwritten but are managed per superstep. That is, each worker stores the superstep information (step number) together with each message.
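One way to manage messages per superstep is to key the stored messages by step number, as in the following Python sketch. The class `StepMailbox` and its methods are hypothetical names; the source states only that messages are not overwritten and that the step number is stored with them.

```python
from collections import defaultdict


class StepMailbox:
    """Stores input messages per superstep instead of overwriting them."""

    def __init__(self):
        # (destination vertex, step number) -> {source vertex: edge state}
        self._box = defaultdict(dict)

    def deliver(self, dst, src, step, edge_state):
        self._box[(dst, step)][src] = edge_state

    def inputs_for(self, dst, step):
        # Return a copy so callers cannot modify the stored messages.
        return dict(self._box.get((dst, step), {}))
```

If v_4 has already advanced and sends its superstep 2 message while v_3 is still waiting in superstep 1, both messages remain available under their own step numbers.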

Compared with the conventional distributed synchronous processing system 1a shown in FIG. 6(a), this greatly reduces the time spent waiting for synchronization in phase PH3, making it possible to improve the processing speed of the system as a whole and the utilization efficiency of each worker.

The distributed synchronous processing system 1b described in Non-Patent Document 4 and shown in FIG. 6(b) was evaluated on the premise that the local computation (phase PH1) of all vertices can be executed in parallel. In practice, however, as described above, a worker that executes the local computation (phase PH1) of each vertex allocates a separate thread to each vertex, so unexpected idle time may occur when multiple threads are assigned to a single CPU core. This is explained concretely below.

FIG. 7 is a diagram for explaining the problem addressed by the present invention.
Here, as shown in FIG. 7(a), worker1 is in charge of the vertices v_1, v_2, and v_3, and worker2 is in charge of the vertices v_4, v_5, and v_6. The graph topology (graph G) is assumed to be as indicated by the arrows between the vertices.
It is also assumed that a separate thread is allocated to each vertex. Here, a "thread" refers to any processing unit, including a process, that is assigned an individual execution unit as a scheduling target of a single CPU core.

In FIG. 7(a), one thread is allocated to each vertex. In FIG. 7 and the subsequent figures, the threads of each worker (three threads here) are distinguished by dark dots, light dots, and no dots. The number in each rectangle of a thread is the superstep number; for example, vertex v_1 executes supersteps SS1 to SS3. Each rectangle represents one CPU time allocation unit (time slice) on the CPU core of that worker. For example, vertex v_1 requires 3 time slices of CPU time in superstep SS1, 5 time slices in superstep SS2, and 3 time slices in superstep SS3.

FIG. 7(b) shows the CPU consumption under thread scheduling when the CPU of each worker has a single core.
Thread scheduling is normally decided per time slice among the runnable threads, for example by round robin. Therefore, when worker1 and worker2 execute the processing of FIG. 7(a) using the adjacent synchronization described in Non-Patent Document 4, the vertices v_4, v_5, and v_6, which are adjacent to vertex v_3 on input edges, cannot proceed to the next superstep SS2 until superstep SS1 of vertex v_3 finishes, as indicated by reference sign α in FIG. 7(b). Idle time (shown as hatched rectangles) therefore occurs.
Likewise, as indicated by reference sign β in FIG. 7(b), the vertices v_4, v_5, and v_6, which are adjacent to vertex v_3 on input edges, cannot proceed to the next superstep SS3 until superstep SS2 of vertex v_3 finishes. Idle time therefore occurs again.

For ease of explanation, FIG. 7(b) uses an example in which each worker has a single core, but the same problem occurs with multiple cores whenever multiple threads are assigned to one core. In other words, unexpected idle time may occur because of the resource constraints of each worker.
In the distributed synchronous processing system 1 according to the present embodiment, as described later, an appropriate priority is given to the thread of each vertex, and CPU time is allocated preferentially to threads with higher priority, which makes it possible to reduce idle time.

<< Distributed synchronous processing system according to the present embodiment >>
Next, the distributed synchronous processing system 1 according to the present embodiment is described.
First, an overview of the processing of the distributed synchronous processing system 1 is given with reference to FIG. 8.

FIG. 8 is a diagram for explaining an overview of the processing of the distributed synchronous processing system 1 according to the present embodiment. FIGS. 8(a) and 8(b) are the same as FIGS. 7(a) and 7(b) and show the idle time that occurs when adjacent synchronization is performed for each vertex in the distributed synchronous processing system 1b of Non-Patent Document 4.
FIG. 8(c) shows an example in which a worker of the distributed synchronous processing system 1 according to the present embodiment (the "processing server 10" described later) gives a priority to the thread of each vertex and allocates CPU time starting from the thread with the highest priority, thereby reducing idle time.

In the example of FIG. 8(c), the priorities of the vertices v_1, v_2, and v_3 handled by worker1 are, in descending order, v_3, v_2, v_1 (hereinafter written as "v_3 > v_2 > v_1"), and the priorities of the vertices v_4, v_5, and v_6 handled by worker2 are v_5 > v_4 > v_6. The predetermined algorithm by which each worker (processing server 10) derives these priorities is described in detail later.

In scheduling the threads on the CPU of each worker (processing server 10), allocating CPU time starting from the thread with the highest priority reduces CPU idle time. In FIG. 8(c), the idle time spent waiting for the processing of vertex v_3 is eliminated, and vertex v_4 can execute its processing (reference sign α in FIG. 8(c)). Compared with the processing of FIG. 8(b), which has no priority-based control, this improves the resource utilization efficiency of the worker (processing server 10) and further speeds up the system as a whole (shortens the execution time).
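Priority-based allocation of time slices on a single core can be sketched in Python as follows. This is a simplified illustration, not the scheduler of the embodiment: the function names, the numeric encoding of priorities (a larger number means a higher priority), and the time slice counts in the usage note are assumptions, and priorities are assumed to be distinct.

```python
def pick_next_thread(runnable, priority):
    """Return the runnable vertex thread with the highest priority, or None."""
    if not runnable:
        return None  # nothing runnable: the core idles for this time slice
    return max(runnable, key=lambda v: priority[v])


def schedule(remaining, priority):
    """Allocate one time slice at a time to the highest-priority thread.

    remaining: vertex name -> number of time slices it still needs.
    Returns the order in which the time slices were allocated.
    """
    remaining = dict(remaining)
    order = []
    while any(remaining.values()):
        runnable = [v for v, r in remaining.items() if r > 0]
        v = pick_next_thread(runnable, priority)
        remaining[v] -= 1
        order.append(v)
    return order
```

With the worker1 priorities v_3 > v_2 > v_1 encoded as `{"v1": 1, "v2": 2, "v3": 3}`, the thread of v_3 receives CPU time first, so the vertices waiting on v_3 can be released earlier than under round robin.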

<Configuration of the distributed synchronous processing system>
Next, the configuration of the distributed synchronous processing system 1 according to the present embodiment is described.
FIG. 9 is a diagram showing the configuration of the distributed synchronous processing system 1 according to the present embodiment.
As shown in FIG. 9, the distributed synchronous processing system 1 includes a plurality of processing servers 10 (workers) and a plurality of distributed processing units 20 (vertices) that operate on each processing server 10.

The processing server 10 has the hardware of a general computer, such as a CPU, RAM (Random Access Memory), ROM (Read Only Memory), and an HDD (Hard Disk Drive). The HDD stores an OS (Operating System), application programs, various data, and the like. The OS and the application programs are loaded into the RAM and executed by the CPU. In FIG. 9, the interiors of the processing server 10 and the distributed processing unit 20 show, as blocks, the functions (characteristic configuration) realized by the application programs and the like loaded into the RAM.
On each processing server 10 (worker), a plurality of distributed processing units 20, each corresponding to an individual calculation process (local computation), operate as vertices in BSP (Bulk Synchronous Parallel).
Each device constituting the distributed synchronous processing system 1 is described in detail below.

<Distributed processing unit (vertex)>
The distributed processing unit 20 (vertex) executes calculation processing divided into predetermined units in BSP, and includes a numerical calculation unit 21, a message transmission/reception unit 22, and a state notification unit 23. As described later, it may also include a processing waiting voting unit 24, depending on the priority calculation algorithm set in the processing server 10 (worker).

The numerical calculation unit 21 executes the processing of phase PH1 (local computation) in BSP. It moves to the next superstep in accordance with an instruction to move to the next superstep (hereinafter, the "next-step transition instruction") received from the processing server 10 (worker) via the message transmission/reception unit 22. After its own calculation/transmission processing f_n is complete, the numerical calculation unit 21 waits until it receives the next-step transition instruction.

The message transmission/reception unit 22 exchanges information with other distributed processing units 20 and with the processing server 10 (worker). Specifically, in phase PH2 (data exchange) of BSP, the message transmission/reception unit 22 transmits the state of each of its own output edges as an output message to the distributed processing unit 20 (vertex) connected by that output edge. The output message carries the step number of the superstep at that time, associated with the state of the output edge. The message transmission/reception unit 22 also receives the state of each input edge as an input message from the distributed processing unit 20 (vertex) connected by that input edge. The input message likewise carries the step number of the superstep at that time, associated with the state of the input edge. The message transmission/reception unit 22 sends and receives these output messages and input messages via the message processing unit 12 of the processing server 10 (worker).
The message transmission/reception unit 22 also receives the next-step transition instruction from the processing server 10 (worker) to which it belongs and outputs it to the numerical calculation unit 21.

The state notification unit 23 monitors the state of its own distributed processing unit 20 (vertex) in BSP and notifies the processing server 10 (worker) of it as state information.
Specifically, the state notification unit 23 detects the state information about itself that is required by the predetermined priority setting algorithm set in the priority setting unit 14 of the processing server 10 (described later), and transmits it to the processing server 10.
For example, suppose that during the calculation/transmission processing (the calculation in phase PH1 and the data exchange in phase PH2), at the time its own calculation and the transmission of its output messages (output edge states) have finished, input messages from other distributed processing units 20 (vertices) adjacent on input edges are missing, that is, input messages have not been received from all vertices adjacent on input edges (hereinafter, the "input waiting state"). In that case, the state notification unit 23 notifies the processing server 10 (worker) that it is in the input waiting state.
When notifying the processing server 10 that it is in the input waiting state, the state notification unit 23 attaches the step number of its own current superstep n to the notification.

When the "delayed vertex voting method" is set as the priority calculation algorithm in the priority setting unit 14 of the processing server 10 (described later), the distributed processing unit 20 (vertex) is provided with the processing waiting voting unit 24.

When the calculation in phase PH1 and the transmission of output messages in phase PH2 have finished and the distributed processing unit 20 is in the input waiting state because input messages from other distributed processing units 20 (vertices) adjacent on input edges are missing, the processing waiting voting unit 24 transmits "input waiting information" to those other distributed processing units 20 (vertices) adjacent on the input edges. The processing waiting voting unit 24 of each distributed processing unit 20 (vertex) holds, as state information, a priority voting value (initial value = 0) that serves as an index for determining the priority order of thread processing. Each time it receives one piece of input waiting information from another distributed processing unit 20 (vertex), it increments its own priority voting value by 1. The processing waiting voting unit 24 then transmits its updated priority voting value to the processing server 10.
The above processing of the processing waiting voting unit 24 (described in detail later with reference to FIG. 13(a)) applies when the "delayed vertex voting method (pattern 1)" is set as the priority calculation algorithm in the priority setting unit 14 of the processing server 10.
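The vote counting of pattern 1 can be sketched in Python as a simple tally. The function name `tally_votes` and the representation of waiting vertices as a dictionary are hypothetical; in the system described here, each vertex holds its own priority voting value and reports it to the processing server 10, whereas this sketch collects all values in one place for brevity.

```python
def tally_votes(waiting_on):
    """Count input waiting information received by each delayed vertex.

    waiting_on: waiting vertex -> list of input-edge neighbors whose
    input messages are still missing.
    Returns vertex name -> priority voting value (starting from 0).
    """
    votes = {}
    for waiter, missing in waiting_on.items():
        for src in missing:
            # Each piece of input waiting information adds 1 to the
            # priority voting value of the vertex being waited for.
            votes[src] = votes.get(src, 0) + 1
    return votes
```

If v_4, v_5, and v_6 are all waiting for input from v_3, as in FIG. 7(b), v_3 receives three votes and becomes the natural candidate for a high priority.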

When the processing waiting voting unit 24 receives input waiting information from another distributed processing unit 20 (vertex) adjacent on an input edge, it determines whether it is itself in the "input waiting state" or in the "executable state" (a state in which the local computation of phase PH1 and the data exchange of phase PH2 are complete and the local computation of the next superstep n+1 can be executed).
If it is in the input waiting state, the processing waiting voting unit 24 adds the priority voting value of the other distributed processing unit 20 (vertex), attached to the received input waiting information, to the priority voting value it holds. It then adds its own delay ("+1") to update its priority voting value, attaches the updated priority voting value to input waiting information, and transmits it to the other distributed processing units 20 (vertices) for which it is waiting for input.
If it is in the executable state, the processing waiting voting unit 24 adds the priority voting value of the other distributed processing unit 20 (vertex), attached to the received input waiting information, to the priority voting value it holds. It then transmits the priority voting value it holds to the processing server 10.
The above processing of the processing waiting voting unit 24 (described in detail later with reference to FIG. 13(b)) applies when the "delayed vertex voting method (pattern 2)" is set as the priority calculation algorithm in the priority setting unit 14 of the processing server 10.
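The receive-side handling of pattern 2 can be sketched in Python as follows. Only the behavior on receiving input waiting information is modeled; how the first waiting vertex in a chain chooses the value it attaches is left to the description of FIG. 13(b) and is not modeled here. The class name, the status labels, and the callback parameters `forward` and `report` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pattern2Vertex:
    name: str
    status: str      # "input_wait" or "executable" (hypothetical labels)
    vote: int = 0    # priority voting value, initial value 0

    def on_input_wait_info(self, received_vote, forward, report):
        """Handle input waiting information carrying `received_vote`."""
        # Add the priority voting value attached to the received information.
        self.vote += received_vote
        if self.status == "input_wait":
            # Add this vertex's own delay and pass the total on to the
            # vertices it is itself waiting for.
            self.vote += 1
            forward(self.name, self.vote)
        elif self.status == "executable":
            # The chain ends here: report the accumulated value to the worker.
            report(self.name, self.vote)
```

In this sketch, a waiting vertex propagates an accumulated count along the chain of waiting vertices, so an executable vertex at the end of a long chain reports a larger value than it would under pattern 1.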

When the processing of the superstep in which it received input waiting information from another distributed processing unit 20 (vertex) is complete, the processing waiting voting unit 24 resets its own priority voting value to zero.

<Processing server (worker)>
The processing server 10 (worker) (see FIG. 9) is communicatively connected to the other processing servers 10 (workers) and includes a plurality of distributed processing units 20 (vertices), which are its units of processing. The processing server 10 (worker) manages, among other things, the progress of processing of its own distributed processing units 20 (vertices) and exchanges information with the other processing servers 10 (workers). The processing server 10 (worker) includes a virtualization control unit 11, a message processing unit 12, an adjacent synchronization distribution management unit 13, a priority setting unit 14, and a storage unit 15. The storage unit 15 stores the graph topology indicating the connection relationships among the distributed processing units 20 (vertices) subject to calculation.

Based on virtualization technology, the virtualization control unit 11 builds a virtualization platform on the processing server 10 and controls the placement of the plurality of distributed processing units 20 (virtual machines).

During phase PH2 (data exchange) of the BSP, the message processing unit 12 receives, from each distributed processing unit 20 (vertex) belonging to its own processing server, an output message indicating the state of an output edge. Based on the graph topology of the graph G to be calculated, it then delivers the received output message, as an input message indicating the state of an input edge, to the distributed processing unit 20 (vertex) connected by that output edge. Hereinafter, when output messages and input messages need not be distinguished, they may simply be referred to as "messages".
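The routing performed by the message processing unit 12 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the embodiment: the `topology` dictionary, the `inboxes` structure, the function name `route_output_message`, and the three-vertex graph are all hypothetical and introduced here only for illustration.

```python
def route_output_message(topology, inboxes, sender, payload):
    """Deliver one vertex's output message along each of its output edges.

    topology: dict mapping a vertex to the vertices at the far end of its
              output edges (hypothetical stand-in for the stored graph topology).
    inboxes:  dict mapping a receiver to {sender: input message}.
    """
    for receiver in topology.get(sender, ()):
        # The sender's output message becomes the receiver's input message.
        inboxes.setdefault(receiver, {})[sender] = payload
    return inboxes


# Hypothetical graph, not taken from any figure: v1 -> v2, v1 -> v3, v2 -> v3.
topology = {"v1": ["v2", "v3"], "v2": ["v3"]}
inboxes = route_output_message(topology, {}, "v1", 0.25)
```

After the call, both `v2` and `v3` hold an input message from `v1`, keyed by the sender so that the receiver can later check which input edges have reported.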

The adjacent synchronization distribution management unit 13 monitors the distributed processing units 20 (vertices) belonging to its own processing server. At the point when a distributed processing unit 20 (vertex) has finished the calculation processing in phase PH1 and the transmission of output messages in phase PH2, the unit determines whether the input messages (states of the input edges) from all distributed processing units 20 (vertices) adjacent to it via input edges have arrived.

For each distributed processing unit 20 (vertex) belonging to its own processing server, when the input messages (states of the input edges) have all arrived, that is, when the calculation results required for the next calculation step (next superstep) have been obtained from the adjacent distributed processing units 20 (vertices), the adjacent synchronization distribution management unit 13 regards the adjacent synchronization condition of this embodiment as satisfied. That condition is: "the calculation and transmission processing fₙ (local calculation (phase PH1) and data exchange (phase PH2)) of the own vertex and of all vertices adjacent via input edges has been completed" (adjacent synchronization). In this case, the adjacent synchronization distribution management unit 13 outputs a "next step transition instruction" to that distributed processing unit 20 (vertex) so that it moves to the next superstep (increments the superstep by 1).
The adjacent synchronization distribution management unit 13 also notifies the priority setting unit 14 that the distributed processing unit 20 (vertex) has satisfied the adjacent synchronization condition, that is, that it is in the executable state.
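The adjacent synchronization condition can be illustrated with a short Python sketch. It is only one possible reading of the condition: the `Vertex` class, its fields, and the functions `satisfies_adjacent_sync` and `try_advance` are hypothetical names, and keying received messages by superstep is an assumption made so that a message from a neighbor that is already one step ahead is not confused with the current step.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical stand-in for one distributed processing unit 20."""
    name: str
    in_neighbors: frozenset          # vertices connected by input edges
    superstep: int = 0
    done_local: bool = False         # PH1 and PH2 finished for this superstep
    received: dict = field(default_factory=dict)  # superstep -> set of senders

    def deliver(self, sender, step):
        """Record that `sender` delivered its message for superstep `step`."""
        self.received.setdefault(step, set()).add(sender)


def satisfies_adjacent_sync(v):
    """True when v finished its own step and every input edge has reported."""
    arrived = v.received.get(v.superstep, set())
    return v.done_local and v.in_neighbors <= arrived


def try_advance(v):
    """Apply the 'next step transition instruction' if the condition holds."""
    if not satisfies_adjacent_sync(v):
        return False                 # v remains in the input waiting state
    v.received.pop(v.superstep, None)
    v.superstep += 1
    v.done_local = False
    return True
```

Note that the check involves only the vertex itself and its input-edge neighbors, so no global barrier across all vertices is needed.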

When a distributed processing unit 20 that is waiting for input messages from other distributed processing units 20 (vertices) adjacent via input edges receives an input message (state of an input edge) from any of those distributed processing units 20, the adjacent synchronization distribution management unit 13 uses that reception as a trigger to determine again whether all input messages (states of the input edges) have arrived, that is, whether the adjacent synchronization condition is satisfied.

The priority setting unit 14 assigns, to the thread of each distributed processing unit 20 (vertex) that has entered the executable state, a priority calculated by a predetermined priority calculation algorithm, and performs scheduling that allocates CPU time in descending order of thread priority. The CPU then executes the processing of each distributed processing unit 20 (vertex) in accordance with the CPU time allocation scheduled by the priority setting unit 14.

One of the following priority calculation algorithms, for example, is set in the priority setting unit 14.
(1) Output edge count priority method (LEF: Largest Edge First)
(2) Delay step priority method (LSF: Latest Step First)
(3) Composite priority method of output edge count and delay step (HL2: Hybrid of LEF and LSF)
(4) Delayed vote count priority method (MVF: Most Voted First)
Each method is described in detail below.

[Output edge count priority method (LEF)]
The output edge count priority method (LEF) preferentially allocates CPU time to the distributed processing unit 20 (vertex) with the largest number of output edges among the distributed processing units 20 (vertices) in the executable state (the state in which the local calculation of phase PH1 and the data exchange of phase PH2 have been completed and the local calculation of the next superstep can be executed). The method rests on the reasoning that a vertex with more output edges, that is, with more adjacent vertices, affects more vertices and should therefore be given CPU time first.

For example, in the graph topology (graph G) shown in FIG. 10, the numbers of output edges of the three vertices v₁, v₂, and v₃ of worker 1 (processing server 10) are 1:1:4, so the priorities are set as "v₃ > v₂ = v₁". Likewise, the numbers of output edges of the three vertices v₄, v₅, and v₆ of worker 2 (processing server 10) are 1:3:0, so the priorities are set as "v₅ > v₄ > v₆".
When the calculated priorities are equal, the priority setting unit 14 may give priority to one of the vertices at random, or may set the threads of the equal-priority vertices to run alternately, one CPU time allocation unit (time slice) at a time.
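The LEF ordering for the FIG. 10 example can be reproduced with a few lines of Python. The output edge counts are the ones quoted in the text; the function name `lef_priority_groups` and the grouping of tied vertices into one list are illustrative assumptions, and the tie-breaking policy (random choice or time-slice alternation) is deliberately left out.

```python
def lef_priority_groups(out_edge_counts):
    """Group vertices by output edge count, largest first; ties share a group."""
    groups = {}
    for vertex, count in out_edge_counts.items():
        groups.setdefault(count, []).append(vertex)
    return [sorted(groups[count]) for count in sorted(groups, reverse=True)]


# Output edge counts quoted in the text for FIG. 10.
worker1 = {"v1": 1, "v2": 1, "v3": 4}
worker2 = {"v4": 1, "v5": 3, "v6": 0}

print(lef_priority_groups(worker1))  # [['v3'], ['v1', 'v2']]
print(lef_priority_groups(worker2))  # [['v5'], ['v4'], ['v6']]
```

Each inner list is one priority level, so `['v1', 'v2']` corresponds to "v₂ = v₁" in the text.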

The result of the priority control of CPU consumption shown in FIG. 8(c) is an example of the output edge count priority method (LEF) in which, where vertices v₂ and v₁ have the same priority (v₂ = v₁) as shown in FIG. 10, vertex v₂ is set to take precedence over vertex v₁, giving the overall priority order "v₃ > v₂ > v₁".
By having the priority setting unit 14 set priorities appropriately in this way, the idle time of the CPU can be reduced.

[Delay step priority method (LSF)]
The delay step priority method (LSF) preferentially allocates CPU time to the distributed processing unit 20 (vertex) whose execution step (current superstep) is furthest behind among the distributed processing units 20 (vertices) in the executable state. The method rests on the reasoning that the further a vertex lags in its superstep processing, the more it affects its adjacent vertices, and so it should be given CPU time first.

For example, in the graph topology (graph G) shown in FIG. 11, the current supersteps n of the three vertices v₁, v₂, and v₃ of worker 1 (processing server 10) are 4:5:5, so the priorities are set as "v₁ > v₂ = v₃". Likewise, the supersteps n of the three vertices v₄, v₅, and v₆ of worker 2 (processing server 10) are 5:5:4, so the priorities are set as "v₆ > v₄ = v₅".
When the calculated priorities are equal, the priority setting unit 14 may give priority to one of the vertices at random, or may set the threads of the equal-priority vertices to run alternately, one CPU time allocation unit (time slice) at a time.

Each time the execution step (current superstep) of a distributed processing unit 20 (vertex) in the executable state changes, the priority setting unit 14 dynamically recalculates and updates its priority.
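The LSF ordering and its dynamic recalculation can be sketched in Python as follows. The superstep values are those quoted for FIG. 11; the function name `lsf_priority_groups` is hypothetical, and advancing v₁ by two supersteps is an invented scenario used only to show that the order changes when a step number changes.

```python
def lsf_priority_groups(supersteps):
    """Group vertices by current superstep, furthest behind first."""
    groups = {}
    for vertex, step in supersteps.items():
        groups.setdefault(step, []).append(vertex)
    return [sorted(groups[step]) for step in sorted(groups)]


# Current supersteps quoted in the text for FIG. 11.
worker1 = {"v1": 4, "v2": 5, "v3": 5}
worker2 = {"v4": 5, "v5": 5, "v6": 4}

print(lsf_priority_groups(worker1))  # [['v1'], ['v2', 'v3']]
print(lsf_priority_groups(worker2))  # [['v6'], ['v4', 'v5']]

# Hypothetical follow-up: v1 completes two supersteps, so it is no longer behind.
worker1["v1"] += 2
print(lsf_priority_groups(worker1))  # [['v2', 'v3'], ['v1']]
```

Recomputing the grouping from the current step numbers on every change is the simplest way to keep the order up to date; an incremental structure would be an optimization beyond what the text specifies.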

[Composite priority method of output edge count and delay step (HL2)]
The composite priority method of output edge count and delay step (HL2) (hereinafter abbreviated as the "composite priority method (HL2)") calculates the priority of each distributed processing unit 20 (vertex) in the executable state from both its number of output edges and its execution step, and preferentially allocates CPU time to the vertex with the highest priority. The method rests on the reasoning that the decision on which vertex should receive CPU time first ought to take into account both the number of output edges and the execution step, rather than only one of them.

The priority setting unit 14 weights the priority based on the number of output edges (first priority) and the priority based on the execution step (second priority), and calculates their sum. FIG. 12 shows the priority calculation formula (Formula 1) of the composite priority method (HL2). Using (Formula 1), the priority setting unit 14 calculates the priority of each vertex and preferentially allocates CPU time to vertices with higher calculated values.
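Because Formula 1 appears only in FIG. 12 and is not reproduced in the text, the following Python sketch should be read as one plausible weighted sum, not as the formula of the embodiment. The normalization of each term to the range 0 to 1, the default weights of 0.5, the function name `hl2_priorities`, and the vertex names `a`, `b`, `c` are all assumptions made for illustration.

```python
def hl2_priorities(vertices, w_edge=0.5, w_step=0.5):
    """Weighted sum of an edge-based and a lag-based priority (assumed form).

    vertices: dict mapping name -> (output_edge_count, current_superstep).
    """
    max_edges = max(edges for edges, _ in vertices.values()) or 1
    newest = max(step for _, step in vertices.values())
    max_lag = max(newest - step for _, step in vertices.values()) or 1
    return {
        name: w_edge * (edges / max_edges) + w_step * ((newest - step) / max_lag)
        for name, (edges, step) in vertices.items()
    }


# Hypothetical runnable vertices: (output edges, current superstep).
runnable = {"a": (4, 5), "b": (1, 4), "c": (1, 5)}

balanced = hl2_priorities(runnable)
edges_only = hl2_priorities(runnable, w_edge=1.0, w_step=0.0)
```

With equal weights, the lagging vertex `b` scores highest even though `a` has more output edges; setting `w_step` to zero reduces the result to an LEF-like ordering, which shows how the weights trade the two criteria against each other.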

[Delayed vote count priority method (MVF)]
In the delayed vote count priority method, vertices that are in a waiting state (input waiting state) for an input message from a delayed vertex vote for that delayed vertex, and CPU time is preferentially allocated to the vertex with the most votes. The method rests on the reasoning that CPU time should be allocated first to the vertices that are actually keeping adjacent vertices in the input waiting state.

In the delayed vote count priority method (MVF), the priority setting unit 14 receives the above-described priority voting value from the processing waiting voting unit 24 of each distributed processing unit 20 (vertex), and preferentially allocates CPU time to vertices with higher values.
Specifically, the priority setting unit 14 executes the following processing in cooperation with the processing waiting voting unit 24 of each distributed processing unit 20 (vertex).

(Pattern 1: vote only for own delay)
When a distributed processing unit 20 (vertex) enters the input waiting state, its processing waiting voting unit 24 transmits input waiting information to the vertex causing the delay. In other words, the processing waiting voting unit 24 casts one vote for the vertex causing the delay. The processing waiting voting unit 24 of the distributed processing unit 20 (vertex) that receives the input waiting information then increments its own priority voting value (Pv) by 1.
In the example shown in FIG. 13(a), when vertex v₂ enters the waiting state (input waiting state) for an input message from vertex v₁, it transmits input waiting information to (casts one vote for) vertex v₁, the cause of the delay (reference sign α₁ in FIG. 13(a)). The processing waiting voting unit 24 of vertex v₁ then increments its priority voting value (Pv₁) by 1 (reference sign α₂ in FIG. 13(a)).

When its priority voting value is updated, the processing waiting voting unit 24 of each distributed processing unit 20 (vertex) transmits the updated priority voting value to the priority setting unit 14 of the processing server 10. The priority setting unit 14 then sets priorities so that the threads of distributed processing units 20 (vertices) with higher received priority voting values take precedence, and allocates CPU time accordingly.
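Pattern 1 amounts to counting, for each vertex, how many neighbors are currently waiting on it. A minimal Python sketch follows; the function name `tally_direct_votes` and the representation of waits as `(waiter, blocker)` pairs are assumptions, and only the single wait from FIG. 13(a) (v₂ waiting on v₁) is taken from the text.

```python
from collections import Counter


def tally_direct_votes(waits):
    """Count one vote per (waiter, blocker) pair for the blocking vertex."""
    votes = Counter()
    for _waiter, blocker in waits:
        votes[blocker] += 1      # the blocker's priority voting value Pv goes up by 1
    return votes


# FIG. 13(a) as described in the text: v2 is waiting for a message from v1.
votes = tally_direct_votes([("v2", "v1")])
print(votes["v1"])  # 1
```

The priority setting unit would then favor the vertices with the largest counts; `Counter.most_common()` returns them in that order.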

(Pattern 2: vote along the chain of delays)
When a distributed processing unit 20 (vertex) is itself a cause of delay and has received input waiting information and updated its own priority voting value, its processing waiting voting unit 24 adds its own delay (+1) to the updated priority voting value and transmits the result as input waiting information to (votes that priority voting value for) the vertex causing its delay (reference sign β₁ in FIG. 13).

In the example shown in FIG. 13(b), vertex v₆ is the root cause of the delay: vertex v₄ is delayed waiting for an input message from vertex v₆, and vertices v₂, v₃, and v₅ are in turn delayed waiting for input messages from vertex v₄ (reference sign β₂ in FIG. 13(b)).
Vertex v₂ therefore transmits input waiting information (1 vote) to vertex v₄. Vertex v₂ also transmits input waiting information (1 vote) to vertex v₃. Vertex v₃ updates its priority voting value (+1) and transmits input waiting information (2 votes), which includes its own delay (+1), to vertex v₄. Vertex v₅ transmits input waiting information (1 vote) to vertex v₄. Vertex v₄ then adds its own delay (+1) to the total of the input waiting information it has received (1 + 2 + 1 = 4 votes) and transmits the resulting input waiting information (5 votes) to vertex v₆.
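The vote totals in the FIG. 13(b) walk-through can be checked with a small Python sketch. It assumes the waiting relationships form no cycle and computes the steady-state values in one pass rather than simulating individual messages; the function name `chained_votes` and the `waits_on` dictionary are hypothetical, while the waiting relationships themselves are the ones listed in the text.

```python
from functools import lru_cache


def chained_votes(waits_on):
    """Return each vertex's priority voting value under pattern 2.

    waits_on: dict mapping a waiting vertex to the vertices it is waiting on.
    Each waiter sends (its own voting value + 1) to every vertex it waits on.
    Assumes the waiting relationships contain no cycle.
    """
    waiters_of = {}
    for waiter, blockers in waits_on.items():
        waiters_of.setdefault(waiter, [])
        for blocker in blockers:
            waiters_of.setdefault(blocker, []).append(waiter)

    @lru_cache(maxsize=None)
    def pv(vertex):
        return sum(pv(waiter) + 1 for waiter in waiters_of[vertex])

    return {vertex: pv(vertex) for vertex in waiters_of}


# Waiting relationships described for FIG. 13(b).
waits_on = {"v2": ["v4", "v3"], "v3": ["v4"], "v5": ["v4"], "v4": ["v6"]}
votes = chained_votes(waits_on)
print(votes["v4"], votes["v6"])  # 4 5
```

The result matches the text: v₄ accumulates 1 + 2 + 1 = 4 votes and forwards 5 to v₆, so the root cause of the chain ends up with the highest value.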

When the processing waiting voting unit 24 of a distributed processing unit 20 (vertex) that is in the executable state receives input waiting information from another distributed processing unit 20 (vertex), it updates its own priority voting value using the priority voting value attached to the received input waiting information. It then transmits the updated priority voting value to the priority setting unit 14 of the processing server 10.
In FIG. 13(b), when vertex v₅, whose priority voting value has the initial value 0, receives input waiting information (priority voting value = 5 votes) from vertex v₄, the processing waiting voting unit 24 of vertex v₅ updates its own priority voting value (0 + 5 = 5 votes) and transmits the updated priority voting value (5 votes) to the priority setting unit 14 of the processing server 10. The priority setting unit 14 then sets priorities so that the threads of distributed processing units 20 (vertices) with higher received priority voting values take precedence, and allocates CPU time accordingly.

In this way, vertices that are in the waiting state (input waiting state) for an input message from a delayed vertex vote for that vertex, and CPU time is preferentially allocated to the vertex with the most votes. As a result, a vertex whose delay affects a wider range (has a higher degree of influence) is given a higher priority.

<<Operation of the distributed synchronous processing system>>
Next, the operation of the distributed synchronous processing system 1 according to this embodiment will be described.
FIG. 14 is a flowchart showing the flow of processing of the distributed synchronous processing system 1 according to this embodiment.
The description here assumes that the individual calculation processes (vertices) required for the target calculation have already been set up and allocated to the processing servers 10 (workers). This setup and allocation may be carried out in advance, for example, by providing these functions in a management server for the entire system or in a representative server among the processing servers 10.

First, by monitoring the distributed processing units 20 (vertices) belonging to its own processing server, the adjacent synchronization distribution management unit 13 of the processing server 10 detects that a given distributed processing unit 20 (vertex) has completed the calculation processing in phase PH1 and the transmission of output messages in phase PH2 (step S1).

Next, the adjacent synchronization distribution management unit 13 determines whether, for that distributed processing unit 20 (vertex), the input messages (states of the input edges) from all distributed processing units 20 (vertices) adjacent via input edges have arrived (step S2). That is, the adjacent synchronization distribution management unit 13 determines whether the adjacent synchronization condition is satisfied.

If the adjacent synchronization distribution management unit 13 determines that the input messages (states of the input edges) have not all arrived (step S2 → No), it recognizes that the distributed processing unit 20 (vertex) is in the input waiting state (step S3). The adjacent synchronization distribution management unit 13 may instead recognize that a distributed processing unit 20 (vertex) is in the input waiting state by receiving a notification to that effect from the distributed processing unit 20 (vertex).

Next, the adjacent synchronization distribution management unit 13 determines whether the distributed processing unit 20 (vertex) waiting in the input waiting state has received an input message (state of an input edge) from any of the distributed processing units 20 (vertices) adjacent to it via input edges (step S4).
If the distributed processing unit 20 (vertex) waiting in the input waiting state has not received an input message (step S4 → No), the adjacent synchronization distribution management unit 13 waits until one is received. If it has received an input message (step S4 → Yes), the adjacent synchronization distribution management unit 13 takes that as a trigger and returns to step S2.

If, in step S2, the adjacent synchronization distribution management unit 13 determines that the input messages (states of the input edges) have all arrived (the adjacent synchronization condition is satisfied) (step S2 → Yes), the processing proceeds to step S5.

In step S5, the priority setting unit 14 calculates a priority for the thread of the distributed processing unit 20 (vertex) that has entered the executable state, using the predetermined priority calculation algorithm, and performs scheduling that allocates CPU time in descending order of the calculated thread priority (executes the priority setting processing).

The priority setting processing of step S5 is executed with one of the above-described methods set in advance in the priority setting unit 14: (1) the output edge count priority method (LEF), (2) the delay step priority method (LSF), (3) the composite priority method of output edge count and delay step (HL2), or (4) the delayed vote count priority method (MVF).

The adjacent synchronization distribution management unit 13 also outputs a "next step transition instruction" to the distributed processing unit 20 (vertex) that satisfied the adjacent synchronization condition in step S2, so that it moves to the next superstep (step S6). The numerical calculation unit 21 of the distributed processing unit 20 (vertex) that receives the "next step transition instruction" then executes the calculation and transmission processing fₙ₊₁ of the next superstep (n + 1).
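The flow of steps S1 to S6 can be summarized as a small event-driven sketch in Python. This is an illustration under assumptions, not the flowchart of FIG. 14 itself: the function names, the `inbox_complete` and `priority_of` callbacks, and the use of a `heapq`-based ready queue to represent the CPU time schedule are all hypothetical choices made here.

```python
import heapq


def on_step_finished(vertex, inbox_complete, priority_of, ready_queue, waiting):
    """Called when S1 is detected for `vertex`; performs S2, S3, and S5."""
    if not inbox_complete(vertex):            # S2 -> No
        waiting.add(vertex)                   # S3: input waiting state
        return "waiting"
    waiting.discard(vertex)                   # S2 -> Yes
    # S5: queue the runnable vertex; heapq is a min-heap, so negate the priority.
    heapq.heappush(ready_queue, (-priority_of(vertex), vertex))
    return "ready"


def on_message_received(vertex, inbox_complete, priority_of, ready_queue, waiting):
    """S4 -> Yes: an arriving input message triggers a re-check of S2."""
    if vertex not in waiting:
        return "ignored"
    return on_step_finished(vertex, inbox_complete, priority_of, ready_queue, waiting)


def dispatch_next(ready_queue):
    """S6: pick the highest-priority runnable vertex for its next superstep."""
    _neg_priority, vertex = heapq.heappop(ready_queue)
    return vertex
```

A vertex that fails the check at S2 simply stays in the `waiting` set until a later message arrival re-runs the check, which mirrors the loop between steps S4 and S2 described above.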

As described above, the distributed synchronous processing system 1 and the distributed synchronous processing method according to this embodiment can suppress unexpected idle time caused by the resource constraints of the processing servers 10 (workers).
This improves the resource utilization efficiency of each processing server 10 (worker) and speeds up the synchronous processing calculation (shortens the execution time).

In this embodiment, each processing server 10 (worker) determines, in an autonomous and distributed manner, whether the adjacent synchronization condition is satisfied. However, the distributed synchronous processing system 1 may instead be provided with a management server (master) that is communicatively connected to each processing server 10 (worker), and the management server (master) may determine, for each distributed processing unit 20 (vertex), whether the adjacent synchronization condition is satisfied.

1 Distributed synchronous processing system
10 Processing server (worker)
11 Virtualization control unit
12 Message processing unit
13 Adjacent synchronization distribution management unit
14 Priority setting unit
15 Storage unit
20 Distributed processing unit (vertex)
21 Numerical calculation unit
22 Message transmitting/receiving unit
23 Status notification unit
24 Processing waiting voting unit

Claims

A distributed synchronous processing system comprising a plurality of processing servers and a plurality of distributed processing units operating on the processing servers, wherein
each of the distributed processing units is assigned one thread, which is an individual unit of execution, as a scheduling target of one core of the CPU of the processing server to which that distributed processing unit belongs, and
each of the processing servers comprises:
an adjacent synchronization distribution management unit that detects that a distributed processing unit has completed the calculation processing in a predetermined calculation step and the data transmission and reception with the other distributed processing units to which it is connected, and is thus in an executable state, meaning a state in which it can execute the next calculation step; and
a priority setting unit that sets a priority for each of the distributed processing units in the executable state using a predetermined priority calculation algorithm, the algorithm setting a higher priority for a distributed processing unit the greater the effect that a delay in its calculation processing has in making the CPUs of other distributed processing units wait for processing, and that preferentially allocates the threads of distributed processing units with higher priority to the processing schedule of the CPU.

The distributed synchronous processing system according to claim 1, wherein
each of the processing servers comprises a storage unit that stores, as the connection relationships among the distributed processing units to be calculated, a graph topology indicating that each distributed processing unit is connected by an output edge to another distributed processing unit that is the output destination of the data it has calculated, and is connected by an input edge to another distributed processing unit that is the input source of the data it requires in the next calculation step,
the predetermined priority calculation algorithm calculates the number of output edges of each of the distributed processing units in the executable state and sets a higher priority for a distributed processing unit with a larger calculated number of output edges, and
the priority setting unit sets the priority of each of the distributed processing units using the predetermined priority calculation algorithm, and preferentially allocates the threads of the distributed processing units for which higher priorities have been set to the processing schedule of the CPU.

The distributed synchronous processing system according to claim 1, wherein
the adjacent synchronization distribution management unit monitors the step number of the calculation step being executed by each of the distributed processing units,
the predetermined priority calculation algorithm refers to the step number of the calculation step of each of the distributed processing units in the executable state and sets a higher priority for a distributed processing unit whose processing is delayed relative to the other distributed processing units, and
the priority setting unit sets the priority of each of the distributed processing units using the predetermined priority calculation algorithm, and preferentially allocates the threads of the distributed processing units for which higher priorities have been set to the processing schedule of the CPU.

The distributed synchronous processing system according to claim 1, wherein
each of the processing servers comprises a storage unit that stores a graph topology indicating, as the connection relation among the distributed processing units to be calculated, that each distributed processing unit is connected by an output edge to another distributed processing unit that is the output destination of the data it has calculated, and by an input edge to another distributed processing unit that is the input source of data required in the next calculation step,
each of the distributed processing units, when it cannot proceed to the next calculation step and is in an input waiting state because no data has been transmitted from the other distributed processing unit serving as the input source, votes by transmitting input waiting information to that input-source distributed processing unit,
the input-source distributed processing unit that has received the input waiting information tallies the number of pieces of input waiting information received as a priority vote value,
the predetermined priority calculation algorithm sets a higher priority for an input-source distributed processing unit having a larger priority vote value, and
the priority setting unit of each of the processing servers sets the priority of each of the distributed processing units by using the predetermined priority calculation algorithm and preferentially assigns the thread of a distributed processing unit for which a higher priority has been set to the processing schedule of the CPU.

The distributed synchronous processing system according to claim 4, wherein the input-source distributed processing unit that has received the input waiting information, when it is itself in the input waiting state, transmits input waiting information carrying a priority vote value, obtained by adding its own delay to the tallied priority vote value, to the other distributed processing unit that is its own input source and is the cause of the delay.
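
The voting scheme of claims 4 and 5 can be sketched as follows. This is only an illustration under stated assumptions: each waiting unit contributes a delay of 1, the wait relationships contain no cycle, and the names `tally_votes` and `waiting_on` are hypothetical. A unit that is itself waiting forwards its tallied votes plus its own delay to the sources it is blocked on.

```python
from typing import Dict, List


def tally_votes(waiting_on: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the priority vote value of every unit.

    waiting_on[u] lists the input sources that unit u is blocked on.
    Assumes the wait relationships are acyclic.
    """
    # Invert the relation: for each source, which units are waiting on it.
    waiters: Dict[str, List[str]] = {}
    for unit, sources in waiting_on.items():
        for src in sources:
            waiters.setdefault(src, []).append(unit)

    cache: Dict[str, int] = {}

    def votes(u: str) -> int:
        # Each waiter w sends its own tally plus 1 (its own delay).
        if u not in cache:
            cache[u] = sum(votes(w) + 1 for w in waiters.get(u, []))
        return cache[u]

    units = set(waiting_on) | set(waiters)
    return {u: votes(u) for u in units}


# a and b wait on c; c itself waits on d, so d is the root cause.
result = tally_votes({"a": ["c"], "b": ["c"], "c": ["d"]})
print(sorted(result.items()))
```

In this example `c` receives two votes and forwards them, plus its own delay, to `d`, so `d` ends up with the highest vote value and would be scheduled first.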

A distributed synchronous processing system comprising a plurality of processing servers and a plurality of distributed processing units operating on the processing servers, wherein
each of the distributed processing units is assigned one thread, which is an individual execution unit, as a scheduling target of one core in the CPU of the processing server to which the distributed processing unit belongs,
each of the processing servers comprises:
an adjacent synchronization distribution management unit that detects that a distributed processing unit has completed the calculation processing in a predetermined calculation step and the data transmission and reception with the other distributed processing units to which it is connected, and is therefore in an executable state in which the next calculation step can be executed;
a priority setting unit that assigns a priority to each distributed processing unit in the executable state and preferentially assigns the thread of a distributed processing unit having a high priority to the processing schedule of the CPU; and
a storage unit that stores a graph topology indicating, as the connection relation among the distributed processing units to be calculated, that each distributed processing unit is connected by an output edge to another distributed processing unit that is the output destination of the data it has calculated, and by an input edge to another distributed processing unit that is the input source of data required in the next calculation step,
the priority setting unit calculates the number of output edges of each distributed processing unit in the executable state and obtains a first priority, which is an index for preferentially assigning to the processing schedule of the CPU the thread of a distributed processing unit having a larger calculated number of output edges,
the adjacent synchronization distribution management unit monitors the step number of the calculation step being executed by each of the distributed processing units,
the priority setting unit refers to the step number of the calculation step of each distributed processing unit in the executable state and obtains a second priority, which is an index for preferentially assigning to the processing schedule of the CPU the thread of a distributed processing unit whose processing is delayed relative to the other distributed processing units, and
the priority setting unit calculates a priority of each of the distributed processing units using both the first priority and the second priority, and preferentially assigns the thread of a distributed processing unit having a higher calculated priority to the processing schedule of the CPU.
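
The claim requires that both the first and the second priority be used but does not state how they are combined. The sketch below assumes a simple weighted sum; the weights `w_out` and `w_lag`, the function name `combined_priority`, and the data layout are hypothetical choices made for illustration only.

```python
from typing import Dict, List


def combined_priority(out_edges: Dict[str, List[str]],
                      step_of: Dict[str, int],
                      runnable: List[str],
                      w_out: float = 1.0,
                      w_lag: float = 1.0) -> List[str]:
    """Order runnable units by a weighted sum of two indices.

    First index: number of output edges of the unit.
    Second index: how many steps the unit lags behind the most advanced unit.
    """
    newest = max(step_of.values())
    score = {
        v: w_out * len(out_edges.get(v, [])) + w_lag * (newest - step_of[v])
        for v in runnable
    }
    return sorted(runnable, key=lambda v: score[v], reverse=True)


edges = {"v1": ["v2", "v3"], "v2": ["v3"], "v3": []}
steps_now = {"v1": 4, "v2": 2, "v3": 4}
both = combined_priority(edges, steps_now, ["v1", "v2", "v3"])
print(both)
```

With equal weights, `v2` scores 1 + 2 = 3 and outranks `v1`, which scores 2 + 0 = 2. Other combinations, such as a lexicographic ordering, would be equally consistent with the claim wording.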

A distributed synchronous processing method for a distributed synchronous processing system comprising a plurality of processing servers and a plurality of distributed processing units operating on the processing servers, wherein
each of the distributed processing units is assigned one thread, which is an individual execution unit, as a scheduling target of one core in the CPU of the processing server to which the distributed processing unit belongs, and
each of the processing servers executes:
a procedure of detecting that a distributed processing unit has completed the calculation processing in a predetermined calculation step and the data transmission and reception with the other distributed processing units to which it is connected, and is therefore in an executable state in which the next calculation step can be executed; and
a procedure of setting a priority for each distributed processing unit in the executable state by using a predetermined priority calculation algorithm that sets a higher priority for a distributed processing unit the greater the effect that a delay in its calculation processing has in making the CPUs of the other distributed processing units wait for processing, and preferentially assigning the thread of a distributed processing unit having a high priority to the processing schedule of the CPU.
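
The second procedure of the method, assigning the threads of high-priority units to the CPU first, might look like the following sketch. It assumes the priorities have already been computed by some priority calculation algorithm and that a fixed number of cores is free; the function name `pick_for_cores` and the tie-breaking by unit name are assumptions, not part of the claim.

```python
import heapq
from typing import Dict, List


def pick_for_cores(priority_of: Dict[str, int], cores: int) -> List[str]:
    """Select up to `cores` runnable units, highest priority first."""
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    heap = [(-prio, name) for name, prio in priority_of.items()]
    heapq.heapify(heap)
    picked: List[str] = []
    while heap and len(picked) < cores:
        _, name = heapq.heappop(heap)
        picked.append(name)
    return picked


# Three runnable units compete for two free cores.
chosen = pick_for_cores({"v1": 1, "v2": 5, "v3": 3}, cores=2)
print(chosen)
```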