JPH09212467A

JPH09212467A - Load decentralization control system

Info

Publication number: JPH09212467A
Application number: JP8013522A
Authority: JP
Inventors: Masanori Ito; 雅典伊藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1996-01-30
Filing date: 1996-01-30
Publication date: 1997-08-15

Abstract

PROBLEM TO BE SOLVED: To improve the throughput of a parallel computer system in which CPU capacity and the number of CPUs differ from computer to computer and which includes virtual computers, by automatically selecting the computer that executes a batch job, delivering the job to it, and thereby balancing the load. SOLUTION: In an operating system 5, a job delivery destination determination unit 9 determines the computer 2 having the highest surplus capacity as the job delivery destination, and a job delivery unit 8 delivers the job to the destination so determined. A job delivery destination notification unit 13 notifies a slave computer to deliver the job to the computer 2 determined as the delivery destination. Information on jobs submitted to slave computers is received by a job information reception unit 16. Consequently, the throughput of a parallel computer system that includes virtual computers can be improved.

Description

[Detailed Description of the Invention]

[0001]

[Technical Field of the Invention] The present invention relates to a load balancing control system that distributes load across the computers of a parallel computer.

[0002] In recent computer systems there is demand for greater processing capacity, higher processing speed, continuous operation, and improved reliability. To meet this demand, parallel computer systems have been provided in which virtual computers can be mixed in and the number of operating computers can be increased or decreased dynamically while the system is running. In such a parallel computer system, computers with different CPU capacities coexist with computers whose CPU capacity is variable, the CPU load differs from computer to computer, and the number of computers operating in the system changes dynamically. To raise the processing capacity and processing speed of the system as a whole, the CPU load of the computers must therefore be leveled. However, because it is difficult for a user to level the CPU load of each computer, it is desirable for the system to determine automatically which computer executes a batch job.

[0003]

[Prior Art] In a conventional parallel computer system, as shown in FIG. 9, computers each having the same number of CPUs of the same capacity are connected in parallel. Load balancing in this system works as follows: the OS of each computer obtains the CPU utilization of its own computer and sends it to all other computers. When a job is submitted to some computer, the computer with the lowest CPU utilization is regarded as having the largest surplus CPU capacity, and the job is delivered to that computer for execution (if the receiving computer itself has the largest surplus CPU capacity, it executes the job itself).

[0004]

[Problems to be Solved by the Invention] In the parallel computer system described above, if the capacity of each CPU differs from computer to computer, or if the capacity of each CPU is the same but the number of CPUs differs, the surplus CPU capacity is not necessarily the same even when the CPU utilization is the same. Consequently, delivering a job to the computer with the lowest CPU utilization does not guarantee that the job is executed fastest.
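The problem can be illustrated with a small Python sketch. The MIPS figures and the helper name `surplus_mips` are illustrative assumptions, not values from the patent, and surplus capacity is approximated here simply as total capacity times the idle fraction.

```python
def surplus_mips(mips_per_cpu: float, cpu_count: int, utilization: float) -> float:
    """Idle CPU capacity in MIPS: total capacity times the idle fraction."""
    return mips_per_cpu * cpu_count * (1.0 - utilization)


# Two hypothetical computers that both report 50% CPU utilization.
small = surplus_mips(mips_per_cpu=10.0, cpu_count=2, utilization=0.5)
large = surplus_mips(mips_per_cpu=40.0, cpu_count=4, utilization=0.5)

# Same utilization, very different idle capacity.
print(small, large)
```

A scheduler that compares utilization alone cannot tell these two computers apart, although one of them has several times the idle capacity of the other.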

[0005] Virtual computers can also be mixed into a parallel computer system, and a virtual computer usually has the following three operation modes.
- AUTO mode: the virtual computer can use as much CPU capacity as it needs, variably, as long as this does not conflict with the CPU capacity requested by other virtual computers running on the same computer.

[0006]
- Upper-limit AUTO mode: the virtual computer can coexist on the same computer with AUTO-mode virtual computers, but can use CPU capacity only up to a predetermined ratio (the CPU allocation ratio).

[0007]
- Logical mode: the virtual computer cannot coexist on the same computer with AUTO-mode or upper-limit AUTO-mode virtual computers; instead, the CPU capacity of one computer is divided at arbitrary fixed ratios (CPU allocation ratios).

[0008] For each of these virtual computers, the number of CPUs visible to the software running on it (logical CPUs) can be defined independently. A virtual computer in upper-limit AUTO mode or logical mode can therefore be regarded as a computer whose CPU capacity is only the allocated share of the real CPU capacity, so the problem described above arises. In addition, a virtual computer system can change the CPU allocation ratio dynamically, so for a virtual computer in upper-limit AUTO mode or logical mode the surplus CPU capacity is not necessarily the same at two different times even if the CPU utilization is the same.
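As a rough numerical illustration of the last point, the following Python sketch treats a ratio-limited virtual computer as a computer whose capacity is the real capacity scaled by the CPU allocation ratio. The function name and figures are hypothetical, and it is an assumption of this sketch that utilization is measured relative to the allocated share.

```python
def vm_surplus_mips(real_mips: float, allocation_ratio: float, utilization: float) -> float:
    """Idle capacity of a ratio-limited virtual computer (assumed model)."""
    effective_capacity = real_mips * allocation_ratio
    return effective_capacity * (1.0 - utilization)


# The same virtual computer at two times: utilization is 50% both times,
# but the allocation ratio was changed from 25% to 50% in between.
before = vm_surplus_mips(real_mips=100.0, allocation_ratio=0.25, utilization=0.5)
after = vm_surplus_mips(real_mips=100.0, allocation_ratio=0.50, utilization=0.5)
print(before, after)
```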

[0009] Furthermore, when the real memory load of a computer is high, virtual computers are running in excess, and pages are being swapped excessively between external pages and real pages, waits for paging I/O occur frequently and CPU utilization may become low. If a new job is started in this state, the real memory load rises further, paging I/O waits become more frequent, CPU utilization falls further, and the processing efficiency of the virtual computer drops. In this case the conventional load balancing method is counterproductive, because delivering a job to such a computer makes matters worse.

[0010] To solve these problems, an object of the present invention is to raise the processing efficiency of a parallel computer system, even one in which CPU capacity and the number of CPUs differ from computer to computer and virtual computers are mixed in, by automatically selecting the computer that executes a batch job, delivering the job to it, and balancing the load.

[0011]

[Means for Solving the Problems] The means for solving the problems are described with reference to FIG. 1. In FIG. 1, a parallel computer system 1 interconnects a plurality of computers 2 and virtual computers 2 to perform parallel processing.

[0012] The computers 2 and virtual computers 2 execute jobs. Any number of virtual computers 2 can be provided dynamically on a computer (real computer) 2. The computer (master) 2 comprises load information collecting means 21, surplus capacity evaluation means 22, and so on.

[0013] The load information collecting means 21 collects load information and the like from the computers (slaves) 2. The surplus capacity evaluation means 22 evaluates and calculates the surplus capacity of each computer based on the load information collected from it.

[0014] Next, the operation is described. The load information collecting means 21 of the computer (master) 2 collects load information from each computer, and the surplus capacity evaluation means 22 evaluates and calculates surplus capacity from the collected load information. When a job is submitted to any computer 2, the computer (master) 2, on being notified of the job information, selects the computer 2 with the highest evaluated surplus capacity. If the computer 2 that accepted the job has the highest surplus capacity, that computer 2 executes the job; if a computer 2 other than the one that accepted the job has the highest surplus capacity, the computer 2 that accepted the job is made to transfer the job to that computer for execution.
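The master's decision can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `decide`, the action labels, and the choice to keep the job on the accepting computer when surplus capacities tie are all assumptions.

```python
from typing import Dict, Tuple


def decide(receiver: str, surplus: Dict[str, float]) -> Tuple[str, str]:
    """Return (action, target) for a job accepted by `receiver`.

    `surplus` maps a computer id to its evaluated surplus capacity
    (higher means more spare capacity).
    """
    best = max(surplus, key=lambda name: surplus[name])
    # Assumed tie-break: avoid a transfer when the receiver is as good as the best.
    if surplus[receiver] >= surplus[best]:
        return ("run_locally", receiver)
    return ("transfer", best)


print(decide("A", {"A": 1.0, "B": 3.0}))
print(decide("B", {"A": 1.0, "B": 3.0}))
```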

[0015] In doing so, a computer 2 whose load on real memory is found to exceed a predetermined load, that is, to be overloaded, is excluded; surplus capacity is evaluated and calculated for the remaining computers 2, and the job is executed by the one with the highest surplus capacity.
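A minimal Python sketch of this exclusion step follows. The patent only speaks of a "predetermined load"; using a paging rate as the measure, the constant `PAGING_LIMIT`, and its value are hypothetical choices made for illustration.

```python
from typing import Dict, List

PAGING_LIMIT = 100.0  # hypothetical threshold, e.g. page-ins per second


def is_overloaded(paging_rate: float, limit: float = PAGING_LIMIT) -> bool:
    """A computer counts as overloaded when its real memory load exceeds the limit."""
    return paging_rate > limit


def candidates(paging_rates: Dict[str, float]) -> List[str]:
    """Computers that remain eligible as job destinations, in sorted order."""
    return sorted(name for name, rate in paging_rates.items() if not is_overloaded(rate))


print(candidates({"A": 20.0, "B": 350.0, "C": 100.0}))
```

Only the computers returned by `candidates` would then be ranked by surplus capacity.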

[0016] The surplus capacity is calculated for each computer 2 as CPU processing time + CPU wait time, or as CPU processing time + CPU wait time + I/O processing time + I/O wait time.

[0017] Accordingly, even in a parallel computer system in which CPU capacity and the number of CPUs differ from computer to computer and virtual computers are mixed in, the computer 2 (including virtual computers) that executes a batch job is selected automatically, the job is delivered to it, and the load is balanced, so that the processing efficiency of the parallel computer system can be raised.

[0018]

[Embodiments of the Invention] Embodiments and operation of the present invention are described in detail below with reference to FIGS. 1 to 8.

[0019] FIG. 1 shows a system configuration diagram of the present invention. In FIG. 1, a load information control table 23 stores the load information collected from the computers 2; in addition, when the computer 2 that is to execute a job is selected, an expected elapsed time is calculated from the stored load information and set in the table as the measure of surplus capacity (see FIG. 5, described later). The job is delivered to the computer 2 with the smallest expected elapsed time.

[0020] Next, the configuration of the computer (master) 2 is described in detail with reference to FIG. 2, and that of the computer (slave) 2 with reference to FIG. 3. FIG. 2 shows an example of the computer (master) of the present invention. It is a detailed configuration diagram of the computer (master) 2 of FIG. 1: the load information collecting means 21 of FIG. 1 corresponds to the CPU load information collecting unit 10 of FIG. 2, and the surplus capacity evaluation means 22 of FIG. 1 is included as part of the job delivery destination determination unit 9 of FIG. 2.

[0021] In FIG. 2, an OS 5 is the operating system, which controls the computer as a whole. It comprises a job execution unit 7, a job delivery unit 8, a job delivery destination determination unit 9, a CPU load information collecting unit 10, a CPU load information receiving unit 12, a job delivery destination notification unit 13, a job accepting unit 14, a job information receiving unit 16, and so on.

[0022] The job delivery unit 8 delivers a job to the delivery destination determined by the job delivery destination determination unit 9. The job delivery destination determination unit 9 determines the computer with the highest surplus capacity (the smallest elapsed value) as the job delivery destination.

[0023] The CPU load information collecting unit 10 collects the load information of the computer 2. The CPU load information receiving unit 12 receives CPU load information from the slave computers 2.

[0024] The job delivery destination notification unit 13 notifies a slave computer to deliver the job to the destination computer 2 determined by the job delivery destination determination unit 9. The job accepting unit 14 accepts submitted jobs.

[0025] The job information receiving unit 16 receives information on jobs submitted to slave computers. FIG. 3 shows an example of the computer (slave) of the present invention. It is a detailed configuration diagram of a slave computer 2, that is, a computer other than the computer (master) 2 of FIG. 1.

[0026] In FIG. 3, the OS 5 is the operating system, which controls the computer as a whole. It comprises a job execution unit 7, a job delivery unit 8, a CPU load information collecting unit 10, a job accepting unit 14, a job delivery destination receiving unit 15, a job information notification unit 17, and so on. Units 7, 8, 10, and 14 are the same as in FIG. 2, so their description is omitted.

[0027] In FIG. 3, the CPU load information notification unit 11 notifies the master computer of CPU load information and the like. The job delivery destination receiving unit 15 receives the job delivery destination from the master computer.

[0028] The job information notification unit 17 notifies the master computer of the job information of a submitted job. The operation of the configuration of FIGS. 1 to 3 is described in detail below with reference to FIGS. 4 to 8.

[0029] FIG. 4 is an operation explanatory diagram (part 1) of the present invention. FIG. 4(a) shows a flowchart of load information collection. In FIG. 4(a), stage 1 is executed on any computer.

[0030] In S1, the computer collects its own load information, for example the items shown in FIG. 4(b). In S2, it notifies the master computer of that information.

[0031] In FIG. 4(a), stage 2 is executed on the master computer. In S3, the load information of each computer is received. In S4, it is stored in the load information control table.

[0032] Through S1 and S2 of stage 1 and S3 and S4 of stage 2, the computer (master) 2 collects the load information and I/O load information of all computers 2 and sets it in the load information control table 23 as shown in FIG. 5, described later (the I/O load information is not set in that figure).
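The two collection stages can be sketched in Python as below. The class and function names, the dictionary layout, and the sample utilization values are assumptions made for illustration; the real system exchanges these reports between separate computers rather than within one process.

```python
class MasterTable:
    """Stand-in for the load information control table kept by the master."""

    def __init__(self) -> None:
        self.rows = {}

    def receive(self, report: dict) -> None:
        # S3: receive a report; S4: store it, replacing any older entry.
        self.rows[report["id"]] = report


def collect_local_load(computer_id: str, cpu_utilization: float) -> dict:
    # S1: a computer gathers its own load information.
    return {"id": computer_id, "cpu_utilization": cpu_utilization}


table = MasterTable()
for computer_id, utilization in [("A", 0.30), ("B", 0.75)]:
    table.receive(collect_local_load(computer_id, utilization))  # S2: notify the master

# A later report from the same computer overwrites the earlier one.
table.receive(collect_local_load("A", 0.55))
print(table.rows)
```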

[0033] FIG. 4(b) shows an example of the load information collected and notified in stage 1. The load information consists of the following items.
- Computer identifier: identifies each computer that makes up the parallel computer system
- CPU capacity: capacity of a real CPU (MIPS value per real CPU)
- Number of CPUs (1 to N)
- Number of real CPUs (1 to M)
- CPU utilization
- Real CPU utilization
- CPU allocation ratio: share of real CPU capacity allocated to the virtual computer (can be changed dynamically)
- Computer configuration information: indicates whether the computer is a real computer or a virtual computer in AUTO mode, upper-limit AUTO mode, or logical mode
- Real memory load information: indicates whether thrashing is occurring

FIG. 4(c) shows a flowchart of the surplus capacity evaluation.
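The load information items of FIG. 4(b) could be held in a record such as the following Python sketch. The type and field names are hypothetical; only the set of items comes from the text.

```python
from dataclasses import dataclass
from enum import Enum


class MachineKind(Enum):
    REAL = "real"
    VM_AUTO = "auto"
    VM_UPPER_LIMIT_AUTO = "upper-limit-auto"
    VM_LOGICAL = "logical"


@dataclass(frozen=True)
class LoadInfo:
    computer_id: str              # computer identifier
    cpu_capacity_mips: float      # MIPS value per real CPU
    cpu_count: int                # number of CPUs (1 to N)
    real_cpu_count: int           # number of real CPUs (1 to M)
    cpu_utilization: float        # 0.0 to 1.0
    real_cpu_utilization: float   # 0.0 to 1.0
    allocation_ratio: float       # share of real CPU capacity given to a virtual computer
    kind: MachineKind             # computer configuration information
    thrashing: bool               # real memory load information

    def is_ratio_limited(self) -> bool:
        """True for the modes whose capacity is bounded by the allocation ratio."""
        return self.kind in (MachineKind.VM_UPPER_LIMIT_AUTO, MachineKind.VM_LOGICAL)


example = LoadInfo("vm-1", 40.0, 2, 4, 0.6, 0.5, 0.25, MachineKind.VM_LOGICAL, False)
print(example.is_ratio_limited())
```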

[0034] In FIG. 4(c), stage 1 is executed on any computer. In S11, the user submits a job to any computer. In S12, it is determined whether the computer to which the job was submitted is the master computer or a slave computer. If it is the master computer, processing proceeds to stage 2 (stage 2 of FIG. 4(a)). If it is a slave computer, the information on the submitted job is notified to the master computer in S13.

[0035] Through S11 and S12 above, all information on jobs submitted to the computers is notified to the master computer. FIG. 5 shows an example of the load information control table of the present invention. The load information control table 23 holds the load information collected from all computers 2 and stored in S4 of FIG. 4, that is, the items of FIG. 4(b) described above. The "expected elapsed time" in the figure is calculated from those items and represents the surplus capacity of a computer: the smaller the value, the higher the surplus capacity. The expected elapsed time is calculated, for example, by the following formulas (note that the load information control table 23 of FIG. 5 covers only CPU processing time + CPU wait time). The time required from job submission to job completion (the expected elapsed time) is

\[
\begin{aligned}
\text{expected elapsed time} &= \text{CPU processing time} + \text{CPU wait time} + \text{I/O processing time} + \text{I/O wait time} && \text{(Eq. 1)} \\
&= \text{CPU processing time} \times \bigl(1 + \alpha(\text{number of CPUs}, \text{CPU utilization})\bigr) \\
&\quad + \text{I/O processing time} \times \bigl(1 + \beta(\text{number of channels}, \text{channel utilization})\bigr) && \text{(Eq. 2)}
\end{aligned}
\]

where α and β are derived from general queueing theory. As for the relationship between α and β: for a CPU-bound job, for example, CPU processing time is necessarily large and I/O processing time small, so the result is sensitive to the magnitude of α and insensitive to that of β. In this way CPU load information and I/O load information can be combined to evaluate the surplus capacity of a computer 2 as a single elapsed value.
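The sensitivity argument can be checked numerically with a short Python sketch of (Eq. 2). The job figures (90 s of CPU time, 10 s of I/O time) and the α and β values are made-up inputs, and the function name is hypothetical; α and β are passed in as plain numbers rather than computed from load information.

```python
def expected_elapsed(cpu_time: float, io_time: float, alpha: float, beta: float) -> float:
    """Eq. 2: CPU time inflated by (1 + alpha) plus I/O time inflated by (1 + beta)."""
    return cpu_time * (1.0 + alpha) + io_time * (1.0 + beta)


# A hypothetical CPU-bound job: 90 s of CPU processing, 10 s of I/O processing.
base = expected_elapsed(90.0, 10.0, alpha=0.5, beta=0.5)
more_alpha = expected_elapsed(90.0, 10.0, alpha=1.0, beta=0.5)
more_beta = expected_elapsed(90.0, 10.0, alpha=0.5, beta=1.0)

# Raising alpha by 0.5 costs far more than raising beta by 0.5.
print(more_alpha - base, more_beta - base)
```

For an I/O-bound job the roles would be reversed, which is why the single elapsed value adapts to the character of the job.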

[0036] I/O processing transfers data between main memory and devices such as disk units through an input/output mechanism called a channel. Because the time taken by one I/O operation depends on the number of channels and on channel utilization, it is evaluated in the same way as for the CPU, as shown in (Eq. 2) above.

[0037] FIG. 6 is an operation explanatory diagram (part 2) of the present invention. In FIG. 6, stage 2 is executed on the master computer. In S21, information on a submitted job is received from any computer: the master computer receives the information on a job submitted to a slave computer, or accepts the information on a job submitted to the master computer itself, and sets it in the load information control table 23 of FIG. 5 described above.

[0038] In S22, the load information control table is referenced and the load information of each computer is taken out one entry at a time. In S23, it is determined whether the real memory is overloaded. If YES, S24 to S26 are skipped and processing proceeds to S27. If NO, processing proceeds to S24.

[0039] In S24, the type of computer is determined.
- For a real computer or an AUTO-mode virtual computer, the expected elapsed time of the computer is evaluated by Eq. 4 below, shown at S26, and processing proceeds to S27.

[0040]
\[
\frac{1 + \alpha(\text{number of real CPUs}, \text{real CPU utilization})}{\text{CPU capacity}} \qquad \text{(Eq. 4)}
\]
- For an upper-limit AUTO-mode virtual computer or a logical-mode virtual computer, the expected elapsed time of the computer is evaluated by Eq. 5 below, shown at S25, and processing proceeds to S27.

[0041]
\[
\frac{1 + \alpha(\text{number of real CPUs}, \text{real CPU utilization})}{\text{CPU capacity} \times \text{CPU allocation ratio}} \qquad \text{(Eq. 5)}
\]
In S27, it is determined whether the evaluation has reached the end of the load information control table. If YES, processing proceeds to S28. If NO, processing returns to S22 and repeats.
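Eq. 4 and Eq. 5 differ only in the denominator, which the following Python sketch makes explicit. The function name and sample numbers are hypothetical, and α is supplied as a precomputed value rather than derived from the number of real CPUs and their utilization.

```python
from typing import Optional


def elapsed_index(alpha: float, cpu_capacity: float,
                  allocation_ratio: Optional[float] = None) -> float:
    """Eq. 4 when allocation_ratio is None, Eq. 5 otherwise.

    Smaller results mean more surplus capacity.
    """
    effective = cpu_capacity if allocation_ratio is None else cpu_capacity * allocation_ratio
    return (1.0 + alpha) / effective


# Same hardware and same alpha, evaluated as a real computer (Eq. 4)
# and as a virtual computer limited to half the capacity (Eq. 5).
as_real = elapsed_index(alpha=0.5, cpu_capacity=30.0)
as_limited_vm = elapsed_index(alpha=0.5, cpu_capacity=30.0, allocation_ratio=0.5)
print(as_real, as_limited_vm)
```

Halving the allocation ratio doubles the index, so a ratio-limited virtual computer ranks below an otherwise identical real computer.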

[0042] In S28, the computer with the smallest expected elapsed time that is not overloaded in real memory is selected as the job delivery destination. In S29, the computer that accepted the job is notified of the selected computer as the job delivery destination. Processing then proceeds to stage 3 of FIG. 7.

[0043] Through S21 to S29, after the load information received from all computers has been set in the load information control table 23, the entries are taken out in order from the top; for each computer whose real memory is not overloaded, the expected elapsed time is calculated by Eq. 4 or Eq. 5 according to the type of computer, and the computer that accepted the job is notified to deliver the job to the computer with the smallest expected elapsed time. Stage 3 of FIG. 7, described later, then transfers the job to the computer with the smallest expected elapsed time (the highest surplus capacity) for execution.
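The loop of S22 to S28 can be sketched in Python as follows. The `Row` type, its field names, the sample table, and in particular `toy_alpha` are assumptions: `toy_alpha` is a placeholder wait factor chosen only so that the example runs, not the function the patent derives from queueing theory.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Row:
    computer_id: str
    cpu_capacity: float          # MIPS per real CPU
    real_cpus: int
    real_cpu_utilization: float
    allocation_ratio: float      # used only when ratio_limited is True
    ratio_limited: bool          # upper-limit AUTO or logical mode
    memory_overloaded: bool


def toy_alpha(cpus: int, utilization: float) -> float:
    """Placeholder wait factor (an assumption, not the patent's formula)."""
    return utilization / (1.0 - utilization) / cpus


def select_destination(rows: Iterable[Row],
                       alpha: Callable[[int, float], float]) -> Optional[str]:
    best_id, best_value = None, float("inf")
    for row in rows:                               # S22: take entries in order
        if row.memory_overloaded:                  # S23: skip overloaded computers
            continue
        capacity = row.cpu_capacity
        if row.ratio_limited:                      # S24: branch on computer type
            capacity *= row.allocation_ratio       # S25: Eq. 5 denominator
        value = (1.0 + alpha(row.real_cpus, row.real_cpu_utilization)) / capacity  # S26
        if value < best_value:
            best_id, best_value = row.computer_id, value
    return best_id                                 # S28: smallest expected elapsed time


rows = [
    Row("A", 20.0, 2, 0.5, 1.0, False, False),
    Row("B", 40.0, 1, 0.5, 0.25, True, False),
    Row("C", 100.0, 4, 0.1, 1.0, False, True),
    Row("D", 40.0, 2, 0.5, 1.0, False, False),
]
print(select_destination(rows, toy_alpha))
```

Computer C has the most raw capacity but is skipped because its real memory is overloaded, which is the behavior S23 is meant to guarantee.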

[0044] FIG. 7 is an operation explanatory diagram (part 3) of the present invention. In FIG. 7, stage 3 is executed on the computer that accepted the job.

[0045] In S31, the job delivery destination is received from the master computer, and the job is delivered to the computer designated by the master computer. Stage 4 is executed on the computer to which the job was delivered.

[0046] In S41, the job delivered from the computer that accepted it is executed. Through S31 and S41, the job is delivered to and executed by the computer with the smallest expected elapsed time.

[0047] FIG. 8 shows an example of the surplus capacity evaluation of the present invention, calculated from CPU load information only. Suppose that customers arrive at random at a set of service windows and that the amount of service they require follows an exponential distribution. It is a well-known general result of queueing theory that the time an arriving customer waits before being served can then be written as processing time × (a function of the number of windows and the average utilization of the windows), given as (Eq. 6) in FIG. 8. Now let the utilization of a window correspond to CPU utilization and the number of windows to the number of CPUs, and evaluate the time needed to complete a job using only CPU wait time and CPU processing time. CPU processing time can be evaluated as in (Eq. 7); because the number of dynamic steps needed to process the job is the same whichever computer runs it, that factor can be dropped, which leaves a quantity proportional to the time from job submission to job completion. The job is delivered to, and executed by, the computer for which this value (the expected elapsed time) is smallest and whose real memory is not overloaded. I/O processing time and I/O wait time may also be included, as in (Eq. 1) and (Eq. 2) above.
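(Eq. 6) appears only in FIG. 8, which is not reproduced in this text, so its exact form cannot be confirmed here. Under the stated assumptions (random arrivals, exponentially distributed service, several identical servers), the standard textbook result is the M/M/c queue with the Erlang C formula. The Python sketch below uses that result as an assumed stand-in for α; the function names are hypothetical.

```python
from math import factorial


def erlang_c(servers: int, utilization: float) -> float:
    """Probability that an arriving job has to wait in an M/M/c queue."""
    if servers < 1 or not 0.0 <= utilization < 1.0:
        raise ValueError("need servers >= 1 and 0 <= utilization < 1")
    offered = servers * utilization  # offered load in Erlangs
    tail = offered ** servers / factorial(servers) / (1.0 - utilization)
    head = sum(offered ** k / factorial(k) for k in range(servers))
    return tail / (head + tail)


def alpha(cpus: int, utilization: float) -> float:
    """Mean waiting time divided by mean service time for an M/M/c queue."""
    return erlang_c(cpus, utilization) / (cpus * (1.0 - utilization))


# At the same 50% utilization, more CPUs mean a smaller wait factor.
print(alpha(1, 0.5), alpha(2, 0.5), alpha(4, 0.5))
```

With this choice, waiting time equals processing time × α(number of CPUs, CPU utilization), which matches the shape described in the text: at equal utilization a computer with more CPUs makes a job wait less.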

[0048]

[Effects of the Invention] As described above, the present invention adopts a configuration in which, even in a parallel computer system where CPU capacity and the number of CPUs differ from computer to computer and virtual computers are mixed in, the computer 2 (including virtual computers) that executes a batch job is selected automatically, the job is delivered to it, and the load is balanced. Therefore, when a batch job is run on a parallel computer system that includes virtual computers, the computer with the highest surplus capacity is selected dynamically, the job is delivered to it and executed, and processing efficiency can be raised.

[Brief Description of the Drawings]

FIG. 1 is a system configuration diagram of the present invention.

FIG. 2 is an example of a computer (master) of the present invention.

FIG. 3 is an example of a computer (slave) of the present invention.

FIG. 4 is an operation explanatory diagram (part 1) of the present invention.

FIG. 5 is an example of a load information control table of the present invention.

FIG. 6 is an operation explanatory diagram (part 2) of the present invention.

FIG. 7 is an operation explanatory diagram (part 3) of the present invention.

FIG. 8 is an example of the surplus capacity evaluation of the present invention.

FIG. 9 is an explanatory diagram of the prior art.

[Explanation of Reference Numerals]

1: parallel computer system
2: computer, virtual computer
21: load information collecting means
22: surplus capacity evaluation means
23: load information control table

Claims

[Claims]

[Claim 1] A load balancing control system that distributes load across the computers of a parallel computer, characterized in that a master computer among the computers of the parallel computer comprises: surplus capacity evaluation means that receives notification of the load information of each computer (including virtual computers; the same applies hereinafter) and evaluates and calculates its surplus capacity; and means that, when a job is submitted to any computer, receives notification of the job information and selects the computer with the highest evaluated surplus capacity, causes the computer that accepted the job to execute it if that computer has the highest surplus capacity, and, if a computer other than the one that accepted the job has the highest surplus capacity, causes the computer that accepted the job to transfer the job to that computer for execution.

[Claim 2] The load balancing control system according to claim 1, characterized in that a computer whose load on real memory is found to exceed a predetermined load, and which is thus overloaded, is excluded, and surplus capacity is evaluated and calculated for the other computers.

[Claim 3] The load balancing control system according to claim 1 or claim 2, characterized in that the surplus capacity is taken to be, for each computer, CPU processing time + CPU wait time, or CPU processing time + CPU wait time + I/O processing time + I/O wait time.