JP5672521B2

JP5672521B2 - Computer system and checkpoint restart method thereof

Info

Publication number: JP5672521B2
Application number: JP2010049182A
Authority: JP
Inventors: 葵川原
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2010-03-05
Filing date: 2010-03-05
Publication date: 2015-02-18
Anticipated expiration: 2030-03-05
Also published as: JP2011186606A

Description

The present invention relates to a computer system and a checkpoint/restart method for it.

In computer systems, the checkpoint/restart function is a known technique for failure recovery and process migration. It saves the state of a group of processes at a given point in time and later restarts the processes from that saved state.

When implemented at the kernel level, the checkpoint function saves user-level information, such as the data and program text used by a process, together with kernel-level information, such as process management and job management data, as a single restart file. The restart function then restores the system to the process state at the time the restart file was saved. An example of how such a checkpoint/restart function operates is described in Patent Document 1 below.
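
The sketch below is a minimal, hypothetical model of what such a restart file bundles: user-level state and kernel-level state written out and read back as one unit. The class name `RestartFile`, its fields, and the JSON encoding are illustrative assumptions; a kernel-level implementation would write a binary image rather than JSON.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RestartFile:
    """Hypothetical restart-file layout: user-level plus kernel-level state."""
    # User-level information (e.g. process data and program text).
    user_data: dict
    program_text: str
    # Kernel-level information (e.g. process and job management data).
    process_mgmt: dict
    job_mgmt: dict


def save(rf: RestartFile) -> str:
    """Checkpoint: serialize the whole bundle as one piece."""
    return json.dumps(asdict(rf))


def restore(blob: str) -> RestartFile:
    """Restart: rebuild the saved state from the serialized bundle."""
    return RestartFile(**json.loads(blob))
```

Keeping both levels in one record means a restart never sees user-level state paired with kernel-level state from a different moment.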

However, the checkpoint/restart function described in Patent Document 1 assumes use in an SMP (Symmetric Multi-Processing) system and does not address restoring the node configuration in a NUMA (Non-Uniform Memory Access) system.

A NUMA system is an architecture with one or more nodes, each pairing one or more CPUs with memory. Communication between a CPU and the memory in its own node is fast, whereas communication with memory on other nodes is considerably slower. To achieve high execution efficiency, a NUMA system must therefore perform node configuration control when running a job, assigning nodes so that related processes run within a single node as far as possible.

Consequently, if the checkpoint/restart function of Patent Document 1 is used in a NUMA system, for example for failure handling, the node configuration control described above may restart the job with a node configuration different from the one used when the job was created. The identification number of the node on which a process actually runs then differs from the node identification number the process holds. The process then accesses nodes other than its own, which lowers communication speed and significantly degrades system performance. In a multi-node system that executes processing across several NUMA systems, the processing itself may even fail with an error. These problems can be avoided by restoring the job with exactly the same node configuration as at job creation, but that sacrifices flexibility in node management and scheduling when a failure occurs.

A checkpoint/restart method suited to NUMA systems is therefore desirable.

Patent Document 1: JP-A-6-230981
Patent Document 2: JP 2002-288149 A

The checkpoint/restart function of SGI IRIX is a known function designed for NUMA systems. However, although it can restore a process to the node used when the restart file was saved, or to an arbitrary node, it cannot restore processes by designating two or more nodes and assigning processes to them.

Patent Document 2 also discloses a checkpoint/restart technique for NUMA systems. It uses a node coordinate conversion table before and after each system call to convert between the node coordinates assigned when the job was first started (created) and the node coordinates reassigned at restart. However, the system holds only one node coordinate conversion table, so when one job locks the table under exclusive control to operate on it, other jobs must wait, which lowers job execution efficiency. Moreover, every system call, checkpoint, and restart must check whether the table contains an entry, which can waste the CPU's speculative execution.

Accordingly, an object of the present invention is to solve the above problems by providing a new computer system and checkpoint/restart method for NUMA systems that, at restart, can restore processes onto any two or more designated nodes and that can run without being affected by other jobs.

A computer system according to the present invention includes a plurality of nodes, each having at least one processor and a memory shared by those processors. The system comprises: job management information storage means for storing job management information that includes a node number conversion table indicating the correspondence between logical node numbers, which can be assigned to each process of a job, and physical node numbers, which are unique to the nodes that execute the job's processes; process management information storage means for storing process management information that includes the logical node number assigned to each process of the job; restart file creation means for, upon receiving a checkpoint request, acquiring the job management information for the running job and the process management information for each of its processes from the job management information storage means and the process management information storage means, respectively, and creating from the acquired job management information and process management information a restart file for restarting the job; restoring means for, upon receiving a restart request, restoring the job management information and the process management information from the restart file; updating means for, upon receiving a physical node number update request at the time of the restart request, updating the node number conversion table included in the restored job management information; determining means for determining the physical node number corresponding to the logical node number assigned to each process of the job in the restored process management information, by referring to the node number conversion table updated by the updating means if a physical node number update request was received, or to the node number conversion table included in the job management information restored by the restoring means if no such request was received; and process restoring means for restoring each process of the job on the node having the determined physical node number.

A checkpoint/restart method according to the present invention is a method for a computer system that includes a plurality of nodes, each having at least one processor and a memory shared by those processors. The method comprises the steps of: storing, in job management information storage means, job management information that includes a node number conversion table indicating the correspondence between logical node numbers, which can be assigned to each process of a job, and physical node numbers unique to the nodes that execute the job's processes; storing, in process management information storage means, process management information that includes the logical node number assigned to each process of the job; upon receiving a checkpoint request, acquiring the job management information for the running job and the process management information for each of its processes from the job management information storage means and the process management information storage means, respectively, and creating from the acquired job management information and process management information a restart file for restarting the job; upon receiving a restart request, restoring the job management information and the process management information from the restart file by restoring means; upon receiving a physical node number update request at the time of the restart request, updating the node number conversion table included in the restored job management information by updating means; determining the physical node number corresponding to the logical node number assigned to each process of the job in the restored process management information, by referring to the node number conversion table updated by the updating means if a physical node number update request was received, or to the node number conversion table included in the job management information restored by the restoring means if no such request was received; and restoring each process of the job on the node having the determined physical node number.

With the present invention configured as described above, updating the node number conversion table as needed during restart allows a job's processes to be restored onto arbitrary nodes.

FIG. 1 is a diagram illustrating the hardware configuration of the NUMA system of this embodiment.
FIG. 2 is a diagram illustrating the schematic configuration of a node having the checkpoint/restart function of this embodiment.
FIG. 3 is a diagram for explaining transitions of the node number conversion table of this embodiment.
FIG. 4 is a flowchart showing the checkpoint processing in the checkpoint/restart method of this embodiment.
FIG. 5 is a flowchart showing the restart processing in the checkpoint/restart method of this embodiment.
FIG. 6 is a diagram for explaining transitions of the node number conversion table in a modification of this embodiment.

Preferred embodiments for carrying out the invention are described below with reference to the drawings. The following embodiments take as an example a NUMA computer system in which one or more nodes are connected to one another by an interconnect and share a secondary storage device.

FIG. 1 is a diagram illustrating the schematic configuration of a NUMA system (hereinafter "system") 1 according to an embodiment of the present invention, and FIG. 2 is a diagram illustrating the schematic configuration of a node having the checkpoint/restart function of this embodiment.

As shown in FIG. 1, in system 1 a plurality of nodes 100, 110, ..., 150 are connected to one another by an interconnect and share a secondary storage device 200.

Node 100 consists of one or more CPUs and a main memory (for example, ROM or RAM). Nodes 110, ..., 150 are configured in the same way as node 100. The secondary storage device 200 is a database shared by nodes 100, ..., 150 and stores the restart files described later; a storage device such as an HDD can be used for it. In principle, the hardware configuration of node 100 itself can be the same as that of nodes used in conventional NUMA systems, so a detailed description is omitted. The number of nodes is not limited to six and can be changed as appropriate.

Node 100 stores the execution images of a kernel unit 101 and a job unit 102 and provides the checkpoint/restart function of this embodiment.

The job unit 102 consists of one or more process groups and collectively manages the many processes distributed across and executed on one or more nodes.

As shown in FIG. 2, the kernel unit 101 includes job management information storage means 10, process management information storage means 11, restart file creation means 12, restart file restoring means 13, updating means 14, physical node number determination means 15, process restoring means 16, and restart file input/output means 17. This embodiment describes the case where the execution image of the kernel unit 101 is stored on node 100, but which node stores the kernel unit depends on the OS implementation. Each of these means can be realized, for example, mainly by a CPU executing a program stored in main memory and controlling the hardware.

Job management information storage means 10 holds the information needed to execute the job to which a process group belongs, and stores job management information that includes a node number conversion table representing the correspondence between logical node numbers and physical node numbers. The node number conversion table is created when a job is generated and is stored in job management information storage means 10. A separate node number conversion table can be created for each of a plurality of jobs and stored in job management information storage means 10.
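
To make the per-job ownership of the table concrete, here is a minimal sketch in which each job's management record carries its own logical-to-physical mapping, built when the job is created. The class names, the dictionary representation, and the rule that logical numbers follow the order of the given physical nodes are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class JobManagementInfo:
    """Hypothetical job management record that owns this job's table."""
    job_id: str
    # Node number conversion table: logical node number -> physical node number.
    node_table: dict[int, int] = field(default_factory=dict)


class JobManagementStore:
    """Hypothetical stand-in for job management information storage means 10."""

    def __init__(self) -> None:
        self._jobs: dict[str, JobManagementInfo] = {}

    def create_job(self, job_id: str, physical_nodes: list[int]) -> JobManagementInfo:
        # The table is built at job generation; logical numbers 0, 1, 2, ...
        # follow the order of the physical nodes passed in (an assumption).
        table = {logical: phys for logical, phys in enumerate(physical_nodes)}
        info = JobManagementInfo(job_id, table)
        self._jobs[job_id] = info
        return info

    def get(self, job_id: str) -> JobManagementInfo:
        return self._jobs[job_id]
```

Two jobs created this way hold independent tables, so both can use logical node 0 while mapping it to different physical nodes.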

In this embodiment, a logical node number is an identification number for a node available in system 1 and is assigned per process of a job. Logical node numbers start at 0 and are provided for as many nodes as are used; for example, with six nodes, the numbers 0 to 5 are provided. A physical node number, in contrast, is a node identification number assigned uniquely and permanently to each of nodes 100, ..., 150 within system 1. The physical node number of a node that is unavailable, for example because of a failure, is simply skipped.
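
The contrast between the two numbering schemes can be sketched as follows: physical numbers are fixed and leave gaps where nodes have failed, while logical numbers always run densely from 0. The function names and the first-fit assignment order are hypothetical; the text does not specify how logical numbers are matched to physical ones.

```python
def usable_physical_nodes(all_nodes: list[int], failed: set[int]) -> list[int]:
    """Physical numbers are fixed per node; a failed node's number is just absent."""
    return [n for n in all_nodes if n not in failed]


def build_node_table(n_logical: int, usable: list[int]) -> dict[int, int]:
    """Map logical node numbers 0..n_logical-1 onto usable physical nodes.

    The first-fit order used here is an illustrative assumption.
    """
    if n_logical > len(usable):
        raise ValueError("not enough usable physical nodes")
    return {logical: usable[logical] for logical in range(n_logical)}
```

For example, with physical nodes 1 to 6 and node 2 failed, a three-node job still sees logical nodes 0, 1, and 2, backed by physical nodes 1, 3, and 4.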

Process management information storage means 11 holds the information needed to execute a process and stores process management information that includes the logical node number and the physical node number. When a process belonging to a job is generated, the node number conversion table in job management information storage means 10 is consulted to obtain the logical and physical node numbers of the node to be used, and this information is stored in process management information storage means 11.
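
The lookup performed at process generation can be sketched in a few lines: the process record keeps both numbers, the physical one taken from the job's table. `ProcessManagementInfo` and `create_process` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessManagementInfo:
    """Hypothetical per-process record (process management information)."""
    pid: int
    logical_node: int
    physical_node: int


def create_process(pid: int, logical_node: int,
                   node_table: dict[int, int]) -> ProcessManagementInfo:
    # At process generation, consult the job's node number conversion
    # table to find the physical node behind the assigned logical node.
    return ProcessManagementInfo(pid, logical_node, node_table[logical_node])
```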

Upon receiving a checkpoint request, restart file creation means 12 acquires the job management information for the running job and the process management information for each of its processes from job management information storage means 10 and process management information storage means 11, respectively, and creates from them a restart file for restarting the job. In other words, restart file creation means 12 provides the checkpoint function (taking a snapshot of the state of a running job) and stores the created restart file in the secondary storage device 200 via restart file input/output means 17. It can also create a restart file each time a checkpoint request is received at a predetermined timing (for example, periodically). The user can decide the number and timing of checkpoint requests as appropriate.

In this embodiment, restart file restoring means 13, updating means 14, physical node number determination means 15, and process restoring means 16 together serve as the restart function. The restart function restores each piece of management information and the execution images from the restart file and resumes execution from the job state at the time the checkpoint was taken.

Upon receiving a restart request, restart file restoring means (restoring means) 13 acquires the restart file from the secondary storage device 200 through restart file input/output means 17 and restores the job management information and process management information from it.

When, at the time of a restart request, updating means 14 receives a request to change the nodes that execute the job's processes, it updates the node number conversion table included in the restored job management information. As an example, FIG. 3(a) shows the restored node number conversion table before the update together with the resulting placement of each process group on the nodes, and FIG. 3(b) shows the updated table and the corresponding placement.

Physical node number determination means (determining means) 15 determines, by referring to the node number conversion table, the physical node number corresponding to the logical node number assigned to each process of the job in the restored process management information. If updating means 14 has updated the table, the updated table is used; otherwise, the table included in the job management information restored by restart file restoring means 13 is used.
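
The selection rule reduces to choosing which table to read. In this minimal sketch (the function name is hypothetical), passing no updated table models the case where no update request was received; the sample values follow FIG. 3 as described later in this document, with logical node 2 assumed to stay on physical node 5.

```python
from typing import Optional


def determine_physical_node(logical_node: int,
                            restored_table: dict[int, int],
                            updated_table: Optional[dict[int, int]] = None) -> int:
    """Pick the physical node for a restored process's logical node number.

    If an update request was accepted, the updated table is used; otherwise
    the table restored from the restart file is used unchanged.
    """
    table = updated_table if updated_table is not None else restored_table
    return table[logical_node]
```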

Process restoring means 16 restores each process of the job on the node having the physical node number determined by physical node number determination means 15.

As described above, restart file input/output means 17 handles input and output of restart files between the means of node 100 and the secondary storage device 200.

The checkpoint/restart method of this embodiment, carried out using system 1, is described below with reference to the flowcharts in FIGS. 4 and 5. Assume that system 1 consists of nodes 1 to 6 (physical node numbers "1" to "6") and that node 1 has the means described above for executing the checkpoint/restart method. As preconditions, job management information storage means 10 holds the node number conversion table created when the job was generated, and process management information storage means 11 holds the logical and physical node numbers of each process of the running job.

The checkpoint method of this embodiment is described with reference to FIG. 4. The steps in FIG. 4 may be reordered or executed in parallel as long as no inconsistency arises.

First, node 1 receives a checkpoint request (step S100). The checkpoint request is, for example, issued periodically at fixed time intervals.

Upon receiving the checkpoint request, node 1 stops the processes that make up the job (step S101) and, after confirming that all processes have stopped, acquires the job management information, including the node number conversion table, from job management information storage means 10 (step S102). For example, if the processes were running on nodes 2, 3, and 5 (physical node numbers "2", "3", and "5"), as shown in the right part of FIG. 3(a), the node number conversion table is as shown in the left part of FIG. 3(a).

Next, for every process, node 1 acquires the process management information, including the logical node number, and the process execution image (steps S103 and S104).

Node 1 then combines the acquired information with the other information needed for restart and creates a restart file (step S105).

Node 1 then outputs the created restart file to the secondary storage device 200 through restart file input/output means 17 (step S106).

Steps S100 to S106 can be performed by restart file creation means 12 acting as checkpoint collection means.
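
Steps S101 to S106 can be sketched as a single function, with each step marked in a comment. This is a simplified, hypothetical model: a `Process` object with a `stop()` method stands in for real process control, a string stands in for the execution image, and a directory stands in for the shared secondary storage device 200. JSON is an assumed encoding.

```python
import json
import os
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical running process with its management info and image."""
    pid: int
    logical_node: int
    image: str            # stand-in for the process execution image
    running: bool = True

    def stop(self) -> None:
        self.running = False


def checkpoint(job_id: str, node_table: dict[int, int],
               processes: list[Process], out_dir: str) -> str:
    # S101: stop every process of the job.
    for p in processes:
        p.stop()
    # Confirm that all processes have stopped before collecting state.
    if any(p.running for p in processes):
        raise RuntimeError("job not fully stopped")
    # S102: job management information, including the node number table.
    job_info = {"job_id": job_id, "node_table": node_table}
    # S103/S104: per-process management info (logical node) and image.
    procs = [{"pid": p.pid, "logical_node": p.logical_node, "image": p.image}
             for p in processes]
    # S105: assemble the restart file contents.
    restart = {"job": job_info, "processes": procs}
    # S106: write to shared storage (a directory here). JSON stores the
    # table's integer keys as strings, so a reader must convert them back.
    path = os.path.join(out_dir, f"{job_id}.restart.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(restart, f)
    return path
```

Only logical node numbers are written per process; the physical placement lives solely in the job's table, which is what allows it to be changed at restart.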

Next, the restart method of this embodiment is described with reference to FIG. 5. The steps in FIG. 5 may likewise be reordered or executed in parallel as long as no inconsistency arises.

First, node 1 receives a restart request (step S200).

Upon receiving the restart request, node 1 acquires and reads the restart file from the secondary storage device 200 through restart file input/output means 17 (step S201).

Node 1 then restores the job management information contained in the restart file, including the node number conversion table and the other information required to resume job execution (step S202).

If a physical node number update request has been received together with the restart request (step S203: Yes), node 1 updates the restored node number conversion table (step S204). For example, FIG. 3(a) shows the restored table before the update, and FIG. 3(b) shows the updated table together with the resulting placement of each process group on the nodes. As shown in FIGS. 3(a) and 3(b), the physical node corresponding to logical node "0" is changed from "2" to "1", and the physical node corresponding to logical node "1" is changed from "3" to "4". If no physical node number update request has been received (step S203: No), node 1 proceeds to step S205 without updating the table.
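
Step S204 can be sketched as applying a set of remappings to a copy of the restored table. The values below follow FIG. 3 (logical 0: "2" to "1", logical 1: "3" to "4"); the text does not say that logical node 2 changes, so this sketch assumes it stays on physical node 5. Returning a new table rather than editing in place, and rejecting unknown logical numbers, are design assumptions.

```python
def update_node_table(restored: dict[int, int],
                      changes: dict[int, int]) -> dict[int, int]:
    """Apply a physical node number update request to a restored table.

    Only logical numbers already in the table may be remapped; the restored
    table is left untouched so the original mapping stays available.
    """
    unknown = set(changes) - set(restored)
    if unknown:
        raise KeyError(f"unknown logical nodes: {sorted(unknown)}")
    updated = dict(restored)
    updated.update(changes)
    return updated
```

Because only the table changes, every process keeps the logical node number it held at checkpoint time.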

Next, for every process contained in the restart file, node 1 restores the process management information (step S205), determines from the node number conversion table the physical node number corresponding to the logical node number the process used at the time of the checkpoint request (step S206), and restores the process on the node with that physical node number (step S207). For example, when the restored logical node number is "0" and no physical node number update request was received, the table in FIG. 3(a) is used, so the process is restored on the node with physical node number "2". If an update request was received, the table in FIG. 3(b) is used, so the process is restored on the node with physical node number "1".

Steps S200 to S202 and S205 can be performed by restart file restoring means 13, steps S203 and S204 by updating means 14, step S206 by physical node number determination means 15, and step S207 by process restoring means 16.
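
The restart flow S201 to S207 can be sketched end to end as below. It assumes a hypothetical JSON restart file with a `"job"` section holding `"node_table"` and a `"processes"` list of records with `"pid"` and `"logical_node"`; the layout is illustrative, not the patent's format. Actually resuming a process on a node is out of scope, so S207 is modeled by recording the chosen physical node.

```python
import json
from typing import Optional


def restart(path: str, changes: Optional[dict[int, int]] = None) -> dict[int, int]:
    """Return {pid: physical node} for every restored process."""
    # S201: read the restart file from shared storage.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # S202: restore the job's node number conversion table; JSON object
    # keys come back as strings, so convert them to ints.
    table = {int(k): v for k, v in data["job"]["node_table"].items()}
    # S203/S204: apply a physical node number update request, if any.
    if changes:
        table.update(changes)
    placement: dict[int, int] = {}
    for proc in data["processes"]:
        # S205: restore the process management info (logical node number).
        logical = proc["logical_node"]
        # S206: look up the physical node in the (possibly updated) table.
        # S207: record where the process would be restored.
        placement[proc["pid"]] = table[logical]
    return placement
```

With the FIG. 3 values, the same restart file yields physical nodes 2, 3, 5 without an update request and 1, 4, 5 with one.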

As described above, system 1 and its checkpoint/restart method according to this embodiment allow processes to be restored onto any two or more nodes. This is because updating the node number conversion table as needed at restart changes the physical node number that corresponds to each logical node number.

In addition, a job can be executed without regard to the physical node configuration. Because the user works with logical node numbers, changes in the physical node configuration at restart do not affect the job.

Furthermore, because a node number conversion table, which maps logical node numbers to physical node numbers, is used for each job at restart, a job's processes can be restored onto any group of physical nodes and executed without interference from other jobs.

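Because each job carries its own table, two jobs can both use logical node “0” yet run on different physical nodes. The Python sketch below illustrates this with hypothetical job names and node numbers; the `shared_physical_nodes` check is an added assumption showing one way a scheduler might confirm that jobs do not overlap, not a step described in the text.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# One node number conversion table per job (hypothetical values).
job_tables: Dict[str, Dict[int, int]] = {
    "job_a": {0: 0, 1: 1},
    "job_b": {0: 2, 1: 3},
}


def physical_node(tables: Dict[str, Dict[int, int]], job: str, logical: int) -> int:
    """Translate a job-relative logical node number using that job's own table."""
    return tables[job][logical]


def shared_physical_nodes(
        tables: Dict[str, Dict[int, int]]) -> List[Tuple[str, str, Set[int]]]:
    """List every pair of jobs whose tables map onto a common physical node."""
    clashes = []
    for (job_x, tx), (job_y, ty) in combinations(sorted(tables.items()), 2):
        common = set(tx.values()) & set(ty.values())
        if common:
            clashes.append((job_x, job_y, common))
    return clashes


print(physical_node(job_tables, "job_a", 0))  # 0
print(physical_node(job_tables, "job_b", 0))  # 2
print(shared_physical_nodes(job_tables))      # []
```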
<Modification>

Preferred embodiments of the present invention have been described above. However, the present invention is not limited to these embodiments, and various modifications, additions, and omissions can be made by those skilled in the art without departing from the spirit and scope set forth in the claims.

For example, the system 1 of the present embodiment may take checkpoints periodically in preparation for failures and, if a failure makes it impossible to continue executing the processes of a job, restore the restart file taken immediately before the failure to recover the job. Suppose, as shown in the right part of FIG. 6(a), that a NUMA system has six nodes and one process runs on each node; the node number conversion table at that time is as shown in the left part of FIG. 6(a). If a failure prevents node 2 from continuing to execute its process, the restart file taken immediately before the failure is restored to recover the job. Because node 2 is unusable, the update means 14 updates the physical node number at restart. The left part of FIG. 6(b) shows the node number conversion table after physical node number “2”, which was assigned to logical node number “0” before the failure, has been updated to physical node number “5”. Process 3, which was running on node 2 (physical node number “2”), is restored using the updated table and therefore runs on node 5, the node with physical node number “5” that now corresponds to logical node number “0”.
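The table update in this failure example can be sketched as a single substitution. The Python below is a minimal illustration, assuming the table is a dict and that the replacement node has already been chosen (the text does not say how node 5 is selected). Only the logical “0” entry and the placement of process 3 come from the example; the remaining entries of FIG. 6 are not given in the text and are omitted.

```python
from typing import Dict, List, Tuple


def replace_failed_node(table: Dict[int, int], failed: int,
                        replacement: int) -> Dict[int, int]:
    """Return a copy of the node number conversion table in which every logical
    node mapped to the failed physical node is remapped to the replacement."""
    if replacement == failed:
        raise ValueError("replacement node must differ from the failed node")
    return {logical: (replacement if physical == failed else physical)
            for logical, physical in table.items()}


def placement(table: Dict[int, int],
              processes: List[Tuple[int, int]]) -> Dict[int, int]:
    """Map each (process id, logical node number) pair to a physical node."""
    return {pid: table[logical] for pid, logical in processes}


before = {0: 2}        # logical node 0 on physical node 2 (FIG. 6(a))
after = replace_failed_node(before, failed=2, replacement=5)
print(after)                        # {0: 5}
print(placement(after, [(3, 0)]))   # {3: 5}: process 3 is restored on node 5
```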

In the above embodiment, all the means providing the checkpoint/restart function are located in one of the nodes constituting the system. However, the present invention is not limited to this configuration; for example, a separate node management device communicably connected to each node may be provided and made to execute the checkpoint/restart function. In addition, several of the means provided in one node may be combined into one, or one means may be divided into a plurality of means.

Furthermore, although the above embodiment has been described using a NUMA system as an example, the present invention can be applied to any computer system comprising a plurality of nodes, each having at least one processor and a memory shared by the at least one processor.

Part or all of the above embodiment can also be described as in the following supplementary notes, but is not limited to them.

(Supplementary Note 1) A computer system comprising a plurality of nodes, each having at least one processor and a memory shared by the at least one processor, the computer system comprising: job management information storage means for storing job management information including information of a node number conversion table that indicates the correspondence between a logical node number assignable to each process of a job and a physical node number unique to the node that executes that process of the job; process management information storage means for storing process management information including information of the logical node number assigned to each process of the job; restart file creation means for, upon receiving a checkpoint request, acquiring the job management information on a running job and the process management information on each process of the job from the job management information storage means and the process management information storage means, respectively, and creating, from the acquired job management information and process management information, a restart file for restarting the job; restoration means for, upon receiving a restart request, restoring the job management information and the process management information from the restart file; update means for, upon receiving a physical node number update request at the time of the restart request, updating the node number conversion table included in the restored job management information; determination means for determining the physical node number corresponding to the logical node number assigned to each process of the job included in the restored process management information, by referring to the node number conversion table updated by the update means when the physical node number update request has been received, and by referring to the node number conversion table included in the job management information restored by the restoration means when the physical node number update request has not been received; and process restoration means for restoring the process of the job on the node having the determined physical node number.

(Supplementary Note 2) The computer system according to Supplementary Note 1, wherein the restart file creation means creates the restart file each time it receives the checkpoint request at a predetermined timing.

(Supplementary Note 3) The computer system according to Supplementary Note 2, wherein, when a failure occurs in at least one of the plurality of nodes, the restoration means restores the job management information and the process management information from the latest restart file created by the restart file creation means, and the update means updates, in the node number conversion table included in the restored job management information, the physical node number of the failed node to another physical node number.

(Supplementary Note 4) A checkpoint restart method in a computer system comprising a plurality of nodes, each having at least one processor and a memory shared by the at least one processor, the method comprising: storing job management information including information of a node number conversion table that indicates the correspondence between a logical node number assignable to each process of a job and a physical node number unique to the node that executes that process of the job; storing process management information including information of the logical node number assigned to each process of the job; upon receiving a checkpoint request, acquiring the job management information on a running job and the process management information on each process of the job from the job management information storage means and the process management information storage means, respectively, and creating, from the acquired job management information and process management information, a restart file for restarting the job; upon receiving a restart request, restoring the job management information and the process management information from the restart file; upon receiving a physical node number update request at the time of the restart request, updating the node number conversion table included in the restored job management information; determining the physical node number corresponding to the logical node number assigned to each process of the job included in the restored process management information, by referring to the node number conversion table updated by the update means when the physical node number update request has been received, and by referring to the node number conversion table included in the job management information restored by the restoration means when the physical node number update request has not been received; and restoring the process of the job on the node having the determined physical node number.
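Supplementary Notes 1 to 3 imply that each restart file must carry both the node number conversion table and the per-process logical node numbers, and that recovery uses the most recent file. The Python sketch below illustrates only that bookkeeping: a real restart file also holds process data, program text, and kernel-level state, which are omitted here. The JSON layout, the `ckpt-NNNN.json` naming scheme, and all function names are assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def create_restart_file(path: Path, node_table: Dict[int, int],
                        processes: List[Tuple[int, int]]) -> None:
    """Save job management info (node table) and process management info."""
    payload = {
        "job": {"node_table": {str(k): v for k, v in node_table.items()}},
        "processes": [{"pid": pid, "logical_node": ln} for pid, ln in processes],
    }
    path.write_text(json.dumps(payload))


def restore_restart_file(path: Path) -> Tuple[Dict[int, int], List[Tuple[int, int]]]:
    """Recover the node table and the (process id, logical node) pairs."""
    payload = json.loads(path.read_text())
    table = {int(k): v for k, v in payload["job"]["node_table"].items()}
    procs = [(p["pid"], p["logical_node"]) for p in payload["processes"]]
    return table, procs


def latest_restart_file(directory: Path) -> Path:
    """Pick the newest file, assuming zero-padded names sort chronologically."""
    files = sorted(directory.glob("ckpt-*.json"))
    if not files:
        raise FileNotFoundError("no restart file found")
    return files[-1]


# Two periodic checkpoints of a hypothetical job, then recovery from the latest.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    create_restart_file(ckpt_dir / "ckpt-0001.json", {0: 2}, [(3, 0)])
    create_restart_file(ckpt_dir / "ckpt-0002.json", {0: 2, 1: 4}, [(3, 0), (4, 1)])
    chosen = latest_restart_file(ckpt_dir).name
    table, procs = restore_restart_file(ckpt_dir / chosen)

print(chosen)        # ckpt-0002.json
print(table, procs)  # {0: 2, 1: 4} [(3, 0), (4, 1)]
```

After restoring, the failed node's entry in `table` would be rewritten before the processes are placed, as in the six-node failure example.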

1 NUMA system,
10 Process management information,
11 Job management information,
12 Restart file creation means,
13 Restart file restoration means,
14 Update means,
15 Physical node number determination means,
16 Process restoration means,
17 Restart file input/output means,
100 Node,
101 Kernel part,
102 Job part,
200 Secondary storage device.

Claims

1. A computer system comprising a plurality of nodes, each node having at least one processor and a memory shared by the at least one processor,
wherein one of the nodes, or, when the computer system includes a node management device communicably connected to the nodes, the node management device, comprises:
job management information storage means for storing job management information including information of a node number conversion table indicating the correspondence between a logical node number assignable to each process of a job and a physical node number unique to the node that executes the process of the job;
process management information storage means for storing process management information including information of the logical node number assigned to each process of the job;
restart file creation means for, upon receiving a checkpoint request, acquiring the job management information on a running job and the process management information on each process of the job from the job management information storage means and the process management information storage means, respectively, and creating, from the acquired job management information and process management information, a restart file for restarting the job;
restoration means for, upon receiving a restart request, restoring the job management information and the process management information from the restart file;
update means for, upon receiving a physical node number update request at the time of the restart request, updating the node number conversion table included in the restored job management information;
determination means for determining the physical node number corresponding to the logical node number assigned to each process of the job included in the restored process management information, by referring to the node number conversion table updated by the update means when the physical node number update request has been received, and by referring to the node number conversion table included in the job management information restored by the restoration means when the physical node number update request has not been received; and
process restoration means for restoring the process of the job on the node having the determined physical node number.

2. The computer system according to claim 1, wherein the restart file creation means creates the restart file each time it receives the checkpoint request at a predetermined timing.

3. The computer system according to claim 2, wherein, when a failure occurs in at least one of the plurality of nodes,
the restoration means restores the job management information and the process management information from the latest restart file created by the restart file creation means, and
the update means updates, in the node number conversion table included in the restored job management information, the physical node number of the failed node to another physical node number.

4. A checkpoint restart method in a computer system comprising a plurality of nodes, each node having at least one processor and a memory shared by the at least one processor,
wherein one of the nodes, or, when the computer system includes a node management device communicably connected to the nodes, the node management device, comprises job management information storage means, process management information storage means, restart file creation means, restoration means, update means, determination means, and process restoration means,
the method comprising:
storing, in the job management information storage means, job management information including information of a node number conversion table indicating the correspondence between a logical node number assignable to each process of a job and a physical node number unique to the node that executes the process of the job;
storing, in the process management information storage means, process management information including information of the logical node number assigned to each process of the job;
upon receiving a checkpoint request, acquiring the job management information on a running job and the process management information on each process of the job from the job management information storage means and the process management information storage means, respectively, and creating, by the restart file creation means, a restart file for restarting the job from the acquired job management information and process management information;
upon receiving a restart request, restoring, by the restoration means, the job management information and the process management information from the restart file;
upon receiving a physical node number update request at the time of the restart request, updating, by the update means, the node number conversion table included in the restored job management information;
determining, by the determination means, the physical node number corresponding to the logical node number assigned to each process of the job included in the restored process management information, by referring to the node number conversion table updated by the update means when the physical node number update request has been received, and by referring to the node number conversion table included in the job management information restored by the restoration means when the physical node number update request has not been received; and
restoring, by the process restoration means, the process of the job on the node having the determined physical node number.