JP2021190001A

JP2021190001A - Job scheduling program, information processing apparatus, and job scheduling method

Info

Publication number: JP2021190001A
Application number: JP2020097647A
Authority: JP
Inventors: 成人鈴木; Shigeto Suzuki; 龍一関澤; Ryuichi Sekizawa
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-06-04
Filing date: 2020-06-04
Publication date: 2021-12-13

Abstract

PROBLEM TO BE SOLVED: To improve the utilization efficiency of a system. SOLUTION: An information processing apparatus 10 determines whether the value of a first parameter of a newly submitted first job 2 matches the value of a second parameter, of the same kind as the first parameter, of each of second jobs 3a, 3b, ... whose execution has finished. When the second jobs 3a, 3b, ... include no job whose second parameter value matches the first parameter value, the information processing apparatus 10 sets the estimated execution time used for scheduling the first job 2 to the time from the start of the first job 2 until its execution is cut off. When the second jobs 3a, 3b, ... include third jobs 4a, 4b, ... whose second parameter value matches the first parameter value, the information processing apparatus 10 determines the estimated execution time on the basis of the past execution times of the third jobs 4a, 4b, .... SELECTED DRAWING: Figure 1

Description

The present invention relates to a job scheduling program, an information processing apparatus, and a job scheduling method.

A large-scale computer system such as an HPC (High Performance Computing) system (hereinafter also referred to simply as a system) executes jobs using a large number of compute nodes. Job scheduling, which manages the allocation of jobs to the compute nodes, is therefore important.

In job scheduling, a schedule is determined that indicates, for example, which compute nodes are used to execute each job and for how long those compute nodes are used. The system then executes the jobs according to the schedule.

As a technique related to job scheduling, a job scheduling system has been proposed that allows the result of automatic scheduling to be changed appropriately by, for example, a manual operation based on an estimated processing time. A technique has also been proposed in which the time axis is divided into several schedule intervals and each job is rounded up to a whole number of intervals for scheduling.

Japanese Unexamined Patent Publication No. 2005-148901

Keiji Yamamoto et al., "Performance Evaluation of Job Scheduling Using Interval Scheduling", IPSJ Research Report, Vol. 2014-HPC-146, No. 3

The actual time for which compute nodes are used to execute a job (that is, the execution time of the job) is not known in advance. One conceivable approach is therefore to schedule jobs using the time that the user specifies as the desired period of compute node use. The system uses the user-specified time as the cutoff time of the job. That is, a job that cannot complete within the user-specified time from the start of execution is terminated when that time elapses, so the execution time of a job never exceeds the specified time. Consequently, if job scheduling is based on the user-specified time, each job can be started on schedule even when the specified time differs from the job's actual execution time. However, when the actual execution time is shorter than the user-specified time, a large difference between the two lowers the usage efficiency of the system.

If the execution time of a job could be predicted with high accuracy, job scheduling based on the predicted execution time would improve the usage efficiency of the system. However, if a job does not complete within its predicted execution time, it runs past its scheduled time and disrupts the execution schedule of the jobs that follow. Once the execution schedule is disrupted, the usage efficiency of the system may become worse rather than better.

In one aspect, an object of the present invention is to improve the usage efficiency of a system.

In one proposal, a job scheduling program that causes a computer to execute the following processing is provided.
Based on job information of a newly submitted first job and of each of a plurality of second jobs whose execution has finished, the job information including the parameters of the respective jobs, the computer determines whether the value of a first parameter of the first job matches the value of a second parameter, of the same kind as the first parameter, of each of the plurality of second jobs. When the plurality of second jobs include no job whose second parameter value matches the first parameter value, the computer sets the estimated execution time used when the first job is scheduled to the time from the start of the first job until its execution is cut off. When the plurality of second jobs include one or more third jobs whose second parameter value matches the first parameter value, the computer determines the estimated execution time based on the execution time of each of the one or more third jobs when it was executed in the past.

According to one aspect, the usage efficiency of a system can be improved.

FIG. 1 is a diagram showing an example of a job scheduling method according to the first embodiment.
FIG. 2 is a diagram showing a system configuration example of the second embodiment.
FIG. 3 is a diagram showing an example of a hardware configuration of the management server.
FIG. 4 is a diagram showing an example of a schedule.
FIG. 5 is a diagram showing an example of a first effect of the difference between the estimated execution time and the actual execution time.
FIG. 6 is a diagram showing an example of a second effect of the difference between the estimated execution time and the actual execution time.
FIG. 7 is a diagram showing an example of simulation results.
FIG. 8 is a diagram showing an example of a job creation process.
FIG. 9 is a block diagram showing the functions of each device for job scheduling.
FIG. 10 is a diagram showing an example of information stored in the DB of the management server.
FIG. 11 is a diagram showing an example of job information.
FIG. 12 is a diagram showing an example of learning result information.
FIG. 13 is a diagram showing an example of similar job information.
FIG. 14 is a flowchart showing an example of the procedure of execution time estimation processing.
FIG. 15 is a flowchart showing an example of the procedure of similar job extraction processing.
FIG. 16 is a flowchart showing an example of the procedure of job scheduling processing.

Hereinafter, the embodiments will be described with reference to the drawings. The embodiments may be combined with one another to the extent that no contradiction arises.
[First Embodiment]
First, the first embodiment will be described.

FIG. 1 is a diagram showing an example of a job scheduling method according to the first embodiment. FIG. 1 shows an information processing apparatus 10 that carries out the job scheduling method. The information processing apparatus 10 can carry out the job scheduling method by, for example, executing a job scheduling program in which the processing procedure of the method is described.

The information processing apparatus 10 is connected to, for example, an HPC system 1, and estimates the execution time used for scheduling a first job 2 newly submitted to the HPC system 1. To realize the job scheduling method, the information processing apparatus 10 has a storage unit 11 and a processing unit 12. The storage unit 11 is, for example, a memory or a storage device of the information processing apparatus 10. The processing unit 12 is, for example, a processor or an arithmetic circuit of the information processing apparatus 10.

The storage unit 11 stores job information 11a, 11b, .... Each piece of job information 11a, 11b, ... includes the parameters of the corresponding job. Job parameters include, for example, a parameter indicating the name of the job (job name), a parameter indicating the name of the user who submitted the job (user name), and a parameter indicating the group to which the user belongs (group name). A job whose execution has finished also has its execution time as a parameter. The job information 11a, 11b, ... includes the job information of the first job 2 and of each of a plurality of fourth jobs whose execution has finished.
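The job information described above could be represented, as a minimal sketch, by a small record type. The field names are hypothetical and chosen only for illustration; `requested_time` corresponds to the time until execution is cut off, which the description introduces later as the requested execution time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobInfo:
    """Illustrative job information record (field names are assumptions)."""
    job_name: str
    user_name: str
    group_name: str
    requested_time: int              # seconds until execution is cut off
    run_time: Optional[int] = None   # actual execution time; None until the job finishes


# A newly submitted job has no execution time yet; a finished job does.
new_job = JobInfo("sim_a", "alice", "physics", requested_time=3600)
done_job = JobInfo("sim_a", "alice", "physics", requested_time=3600, run_time=1500)
```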

First, the processing unit 12 identifies second jobs 3a, 3b, ..., which are finished jobs similar to the first job 2. For example, the processing unit 12 calculates the similarity between the first job 2 and each of the plurality of fourth jobs using a predetermined formula and the job information of the first job 2 and of each of the fourth jobs. The processing unit 12 calculates the similarity between two jobs using, for example, a formula for calculating the similarity between the topic distributions of the two jobs. Based on the similarity between the first job 2 and each of the fourth jobs, the processing unit 12 identifies, from among the fourth jobs, the second jobs 3a, 3b, ... whose job information is similar to that of the first job 2. For example, the processing unit 12 identifies as the second jobs 3a, 3b, ... a predetermined number of the fourth jobs, taken in descending order of similarity to the first job 2.
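The selection of the second jobs can be sketched as a top-k ranking by similarity. The description does not fix the similarity formula here, so cosine similarity between topic distributions is used purely as an assumption; the function names and the sample distributions are likewise hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic distributions (assumed metric)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def select_second_jobs(first_topics, fourth_jobs, k):
    """Return the ids of the k finished jobs most similar to the first job.

    fourth_jobs maps a job id to that job's topic distribution.
    """
    ranked = sorted(
        fourth_jobs,
        key=lambda job_id: cosine_similarity(first_topics, fourth_jobs[job_id]),
        reverse=True,
    )
    return ranked[:k]


finished = {
    "job-a": [0.7, 0.2, 0.1],
    "job-b": [0.1, 0.1, 0.8],
    "job-c": [0.6, 0.3, 0.1],
}
second_jobs = select_second_jobs([0.7, 0.2, 0.1], finished, k=2)
```

Any other similarity measure over distributions could be substituted in `cosine_similarity` without changing the ranking step.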

Next, based on the job information of the first job 2 and of each of the second jobs 3a, 3b, ..., the processing unit 12 determines whether the value of a first parameter of the first job 2 matches the value of a second parameter, of the same kind as the first parameter, of each of the second jobs 3a, 3b, .... For example, there are a plurality of first parameters, which include the job name, the user name, and the group name.
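The parameter comparison can be sketched as a filter over the similar finished jobs. This sketch assumes that a job counts as matching only when the job name, user name, and group name all coincide; the dictionary keys and sample values are hypothetical.

```python
# Parameters compared between the new job and each finished job (assumed keys).
MATCH_KEYS = ("job_name", "user_name", "group_name")


def find_third_jobs(first_job, second_jobs, keys=MATCH_KEYS):
    """Return the finished jobs whose compared parameters all equal those of the new job."""
    return [
        job for job in second_jobs
        if all(job[key] == first_job[key] for key in keys)
    ]


first = {"job_name": "sim_a", "user_name": "alice", "group_name": "physics"}
second = [
    {"job_name": "sim_a", "user_name": "alice", "group_name": "physics", "run_time": 1500},
    {"job_name": "sim_b", "user_name": "alice", "group_name": "physics", "run_time": 900},
    {"job_name": "sim_a", "user_name": "bob", "group_name": "physics", "run_time": 2000},
]
third_jobs = find_third_jobs(first, second)
```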

The processing unit 12 then determines the estimated execution time used when the first job 2 is scheduled. When the second jobs 3a, 3b, ... include no job whose second parameter value matches the first parameter value, the processing unit 12 sets the estimated execution time to the time from the start of the first job 2 until its execution is cut off. For example, the processing unit 12 refers to a parameter in the job information of the first job 2 that indicates the time from the start of the first job 2 until its execution is cut off (for example, the requested execution time). The processing unit 12 then sets the estimated execution time of the first job 2 to that requested execution time.

When the second jobs 3a, 3b, ... include third jobs 4a, 4b, ... whose second parameter value matches the first parameter value, the processing unit 12 determines the estimated execution time based on the execution time of each of the third jobs 4a, 4b, ... when it was executed in the past. For example, the processing unit 12 determines the estimated execution time based on the maximum of the past execution times of the third jobs 4a, 4b, .... The processing unit 12 may set the estimated execution time to that maximum execution time itself, or to the maximum execution time increased or decreased by a predetermined ratio.
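The two branches of this determination can be sketched as follows. The function and parameter names are hypothetical, and `margin` stands in for the "predetermined ratio" by which the maximum may be increased or decreased; its value is an assumption.

```python
def estimate_execution_time(requested_time, third_job_times, margin=0.0):
    """Return the estimated execution time used for scheduling a new job.

    requested_time  -- time from start until the job would be cut off
    third_job_times -- past execution times of finished jobs with matching parameters
    margin          -- illustrative ratio applied to the maximum (0.25 adds 25 percent)
    """
    if not third_job_times:
        # No matching finished job: fall back to the cutoff (requested) time.
        return requested_time
    # Matching jobs exist: base the estimate on the longest past execution time.
    return max(third_job_times) * (1.0 + margin)


no_history = estimate_execution_time(3600, [])
with_history = estimate_execution_time(3600, [1200, 1500, 900])
with_margin = estimate_execution_time(3600, [1200, 1500, 900], margin=0.25)
```

Using the maximum rather than, say, the mean biases the estimate toward the long side, which fits the concern that an underestimate delays the jobs scheduled afterward.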

The first job 2 is scheduled based on the determined estimated execution time. The first job 2 may be scheduled by the processing unit 12 or by another information processing apparatus. For example, the processing unit 12 schedules a plurality of jobs submitted to the HPC system 1, including the first job 2. In doing so, the processing unit 12 assumes that the compute nodes of the HPC system 1 are used for the estimated execution time in order to execute the first job 2.

With such an information processing apparatus 10, when no job has a second parameter value matching the first parameter value, the processing unit 12 sets the estimated execution time used for scheduling the first job 2 to the time from the start of the first job 2 until its execution is cut off. When there are third jobs 4a, 4b, ... whose second parameter value matches the first parameter value, the processing unit 12 determines the estimated execution time based on the execution time of each of the third jobs 4a, 4b, ... when it was executed in the past.

If the actual execution time of the first job 2 is longer than its estimated execution time, the compute node that was executing the first job 2 cannot start the job scheduled after the first job 2 at its scheduled start time. If the actual execution time of the first job 2 is shorter than its estimated execution time, the compute node that was executing the first job 2 waits until the scheduled start time of the next job. Being unable to start the next job at its scheduled start time lowers the usage efficiency of the HPC system 1 more than waiting until the scheduled start time of the next job does.
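The two cases can be made concrete with a small calculation for a single compute node, assuming the next job is scheduled to start exactly when the estimated execution time ends. The function name and the sample times are illustrative only.

```python
def estimation_error_effect(estimated, actual):
    """Return (delay, idle) for one job on one compute node.

    delay -- how late the next job starts when the job overruns its estimate
    idle  -- how long the node waits for the next job when the job finishes early
    """
    delay = max(0, actual - estimated)
    idle = max(0, estimated - actual)
    return delay, idle


overrun = estimation_error_effect(estimated=60, actual=75)   # job ran longer than estimated
underrun = estimation_error_effect(estimated=60, actual=40)  # job finished early
```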

The information processing apparatus 10 therefore predicts the execution time of the first job 2 based on the third jobs 4a, 4b, ... whose second parameter value matches the first parameter value. When no job has a second parameter value matching the first parameter value (that is, when the execution time of the first job 2 cannot be predicted), the information processing apparatus 10 sets the estimated execution time to the time until execution is cut off, so that the estimated execution time is longer than the actual execution time. When there are third jobs 4a, 4b, ... whose second parameter value matches the first parameter value, the information processing apparatus 10 sets the estimated execution time to the predicted execution time of the first job 2, so that the time during which compute nodes are not executing any job is shortened. In this way, the information processing apparatus 10 can improve the usage efficiency of the HPC system 1.

When there are third jobs 4a, 4b, ..., the processing unit 12 determines the estimated execution time based on the maximum of the past execution times of the third jobs 4a, 4b, .... This allows the information processing apparatus 10 to make the estimated execution time longer than the actual execution time.

The first parameters include the job name, the user name, and the group name. This allows the information processing apparatus 10 to determine the estimated execution time of the first job accurately, based on the execution times of finished jobs whose content is much the same as that of the first job.

The processing unit 12 identifies, from among the plurality of fourth jobs, the second jobs 3a, 3b, ... whose job information is similar to that of the first job 2, and executes the above processing according to whether the second jobs 3a, 3b, ... include third jobs 4a, 4b, .... This allows the information processing apparatus 10 to determine the estimated execution time of the first job accurately, based on the execution times of finished jobs whose job information is similar to that of the first job.

When the second jobs 3a, 3b, ... include third jobs 4a, 4b, ..., the processing unit 12 may determine the estimated execution time as follows, according to a third parameter of a different kind from the first parameter.

The processing unit 12 refers to the third parameter of each of the third jobs 4a, 4b, ... indicated in their respective job information. When the third parameter values of all the third jobs 4a, 4b, ... are the same, the processing unit 12 determines the estimated execution time based on the execution time of each of the third jobs 4a, 4b, ... when it was executed in the past. When the third parameter values of the third jobs 4a, 4b, ... are not all the same, the processing unit 12 sets the estimated execution time to the time from the start of the first job 2 until its execution is cut off. This allows the information processing apparatus 10 to further prevent the estimated execution time from becoming shorter than the actual execution time.
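This variant can be sketched as an extra consistency check before past execution times are trusted. The passage does not name the third parameter, so `num_nodes` below is a hypothetical choice, as are the function name and the dictionary keys; the maximum is used as one way of basing the estimate on past execution times.

```python
def estimate_with_third_parameter(requested_time, third_jobs, third_key):
    """Use past execution times only when all matching jobs share one third-parameter value."""
    if not third_jobs:
        return requested_time
    distinct_values = {job[third_key] for job in third_jobs}
    if len(distinct_values) != 1:
        # The matching jobs were run under differing conditions: use the cutoff time.
        return requested_time
    return max(job["run_time"] for job in third_jobs)


uniform = [{"num_nodes": 4, "run_time": 1200}, {"num_nodes": 4, "run_time": 1500}]
mixed = [{"num_nodes": 4, "run_time": 1200}, {"num_nodes": 8, "run_time": 700}]

estimate_uniform = estimate_with_third_parameter(3600, uniform, "num_nodes")
estimate_mixed = estimate_with_third_parameter(3600, mixed, "num_nodes")
```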

[Second Embodiment]
Next, the second embodiment will be described. In the second embodiment, the execution time of a job to be submitted to an HPC system is estimated, and job scheduling is performed based on the estimated execution time.

FIG. 2 is a diagram showing a system configuration example of the second embodiment. An HPC system 30 has a plurality of compute nodes 31, 32, .... The compute nodes 31, 32, ... are computers that execute submitted jobs.

The compute nodes 31, 32, ... in the HPC system 30 are connected to an HPC operation management server 200. The HPC operation management server 200 is a computer that manages the operation of the HPC system 30. For example, the HPC operation management server 200 schedules jobs waiting to be executed, based on the job execution times estimated by a management server 100. The HPC operation management server 200 then instructs the compute nodes 31, 32, ... to execute jobs according to the job execution schedule it has created.

The HPC operation management server 200 is connected to terminal devices 41, 42, ... and to the management server 100 via a network 20. The terminal devices 41, 42, ... are computers used by users who wish to have jobs executed by the HPC system 30. Based on user input, each of the terminal devices 41, 42, ... generates job information indicating the content of a job to be executed by the HPC system 30, and transmits a job submission request including the generated job information to the HPC operation management server 200.

The management server 100 is a computer that supports the job scheduling of the HPC system 30 performed by the HPC operation management server 200. The management server 100 acquires, from the HPC operation management server 200, the job information of jobs to be executed and of jobs whose execution has finished. The job information of a finished job includes information indicating the execution time of the job. Based on the job information and execution times of finished jobs, the management server 100 estimates the execution time of a newly submitted job. The management server 100 then transmits the estimated execution time of the newly submitted job to the HPC operation management server 200.

FIG. 3 is a diagram showing an example of a hardware configuration of the management server. The management server 100 as a whole is controlled by a processor 101. A memory 102 and a plurality of peripheral devices are connected to the processor 101 via a bus 109. The processor 101 may be a multiprocessor. The processor 101 is, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor). At least some of the functions realized by the processor 101 executing a program may instead be realized by an electronic circuit such as an ASIC (Application Specific Integrated Circuit) or a PLD (Programmable Logic Device).

The memory 102 is used as the main storage device of the management server 100. The memory 102 temporarily stores at least part of the OS (Operating System) program and the application programs to be executed by the processor 101. The memory 102 also stores various data used in processing by the processor 101. As the memory 102, a volatile semiconductor storage device such as a RAM (Random Access Memory) is used, for example.

The peripheral devices connected to the bus 109 include a storage device 103, a graphic processing device 104, an input interface 105, an optical drive device 106, a device connection interface 107, and a network interface 108.

The storage device 103 electrically or magnetically writes data to and reads data from a built-in recording medium. The storage device 103 is used as an auxiliary storage device of the computer. The storage device 103 stores the OS program, application programs, and various data. As the storage device 103, an HDD (Hard Disk Drive) or an SSD (Solid State Drive) can be used, for example.

A monitor 21 is connected to the graphic processing device 104. The graphic processing device 104 displays images on the screen of the monitor 21 in accordance with instructions from the processor 101. Examples of the monitor 21 include a display device using organic EL (Electro Luminescence) and a liquid crystal display device.

A keyboard 22 and a mouse 23 are connected to the input interface 105. The input interface 105 transmits signals sent from the keyboard 22 and the mouse 23 to the processor 101. The mouse 23 is an example of a pointing device, and other pointing devices can also be used. Other pointing devices include touch panels, tablets, touchpads, and trackballs.

The optical drive device 106 reads data recorded on an optical disc 24 using laser light or the like. The optical disc 24 is a portable recording medium on which data is recorded so as to be readable by the reflection of light. Examples of the optical disc 24 include a DVD (Digital Versatile Disc), a DVD-RAM, a CD-ROM (Compact Disc Read Only Memory), and a CD-R (Recordable)/RW (ReWritable).

The device connection interface 107 is a communication interface for connecting peripheral devices to the management server 100. For example, a memory device 25 or a memory reader/writer 26 can be connected to the device connection interface 107. The memory device 25 is a recording medium equipped with a function for communicating with the device connection interface 107. The memory reader/writer 26 is a device that writes data to a memory card 27 or reads data from the memory card 27. The memory card 27 is a card-type recording medium.

The network interface 108 is connected to the network 20. The network interface 108 transmits data to and receives data from other computers or communication devices via the network 20.

With the hardware configuration described above, the management server 100 can realize the processing functions of the second embodiment. The compute nodes 31, 32, ..., the HPC operation management server 200, and the terminal devices 41, 42, ... can also be realized with hardware similar to that of the management server 100. The compute nodes 31, 32, ... additionally have an interconnect interface for high-speed communication among themselves. The information processing apparatus 10 of the first embodiment shown in FIG. 1 can also be realized with hardware similar to that of the management server 100 shown in FIG. 3.

管理サーバ１００は、例えばコンピュータ読み取り可能な記録媒体に記録されたプログラムを実行することにより、第２の実施の形態の処理機能を実現する。管理サーバ１００に実行させる処理内容を記述したプログラムは、様々な記録媒体に記録しておくことができる。例えば、管理サーバ１００に実行させるプログラムをストレージ装置１０３に格納しておくことができる。プロセッサ１０１は、ストレージ装置１０３内のプログラムの少なくとも一部をメモリ１０２にロードし、プログラムを実行する。また管理サーバ１００に実行させるプログラムを、光ディスク２４、メモリ装置２５、メモリカード２７などの可搬型記録媒体に記録しておくこともできる。可搬型記録媒体に格納されたプログラムは、例えばプロセッサ１０１からの制御により、ストレージ装置１０３にインストールされた後、実行可能となる。またプロセッサ１０１が、可搬型記録媒体から直接プログラムを読み出して実行することもできる。 The management server 100 realizes the processing function of the second embodiment, for example, by executing a program recorded on a computer-readable recording medium. The program that describes the processing content to be executed by the management server 100 can be recorded on various recording media. For example, a program to be executed by the management server 100 can be stored in the storage device 103. The processor 101 loads at least a part of the program in the storage device 103 into the memory 102 and executes the program. Further, the program to be executed by the management server 100 can be recorded on a portable recording medium such as an optical disk 24, a memory device 25, and a memory card 27. The program stored in the portable recording medium can be executed after being installed in the storage device 103 by control from the processor 101, for example. The processor 101 can also read and execute the program directly from the portable recording medium.

In the system shown in FIG. 2, the HPC operation management server 200 and the management server 100 operate in cooperation to schedule the jobs executed by the HPC system 30. For example, the management server 100 estimates the execution time that a newly submitted job would take when executed in the HPC system 30. Based on the estimated execution time of the newly submitted job, the HPC operation management server 200 performs job scheduling to generate a schedule of the jobs to be executed by each of the calculation nodes 31, 32, ....

図４は、スケジュールの一例を示す図である。スケジュール５０は、ＨＰＣ運用管理サーバ２００が複数の新規投入ジョブを推定実行時間に基づいてスケジューリングしたスケジュールの一例である。スケジュール５０の縦軸は、計算ノード数を示す。なお図４の水平な点線は、計算ノード３１，３２，・・・の総数を示す。スケジュール５０の横軸は、時間を示す。第２の実施の形態では、現在から所定時間経過後までのスケジューリング期間が複数の区間に分割されている。なお、スケジューリング期間における最初の区間を０番目の区間ということがある。また各区間は等間隔でもよいし、それぞれ異なる間隔でもよい。図４ではジョブは、縦の長さがジョブの実行に用いられる計算ノード数、横の長さがジョブの推定実行時間を示す長方形で示される。長方形内の数字は、ジョブの優先度を示す。ＨＰＣ運用管理サーバ２００は、以下のようにスケジュール５０を生成する。 FIG. 4 is a diagram showing an example of a schedule. Schedule 50 is an example of a schedule in which the HPC operation management server 200 schedules a plurality of newly submitted jobs based on the estimated execution time. The vertical axis of the schedule 50 indicates the number of calculation nodes. The horizontal dotted line in FIG. 4 indicates the total number of calculation nodes 31, 32, .... The horizontal axis of the schedule 50 indicates the time. In the second embodiment, the scheduling period from the present to the lapse of a predetermined time is divided into a plurality of sections. The first section in the scheduling period may be referred to as the 0th section. Further, each section may be at equal intervals or may be at different intervals. In FIG. 4, a job is represented by a rectangle whose vertical length indicates the number of calculation nodes used to execute the job and whose horizontal length indicates the estimated execution time of the job. The numbers in the rectangle indicate the priority of the job. The HPC operation management server 200 generates the schedule 50 as follows.

ＨＰＣ運用管理サーバ２００は、０番目の区間から２番目の区間までの期間に、優先度が１のジョブを割り当てる。またＨＰＣ運用管理サーバ２００は、０番目の区間から１番目の区間までの期間に、優先度が２のジョブを割り当てる。またＨＰＣ運用管理サーバ２００は、０番目の区間から４番目の区間までの期間に、優先度が３のジョブを割り当てる。またＨＰＣ運用管理サーバ２００は、０番目の区間から３番目の区間までの期間に、優先度が４のジョブを割り当てる。 The HPC operation management server 200 allocates a job having a priority of 1 to the period from the 0th section to the 2nd section. Further, the HPC operation management server 200 allocates a job having a priority of 2 to the period from the 0th section to the 1st section. Further, the HPC operation management server 200 allocates a job having a priority of 3 to the period from the 0th section to the 4th section. Further, the HPC operation management server 200 allocates a job having a priority of 4 to the period from the 0th section to the 3rd section.

ここで優先度が１〜５のジョブそれぞれの実行に用いられる計算ノード数の合計は、計算ノード３１，３２，・・・の総数を超える。よってＨＰＣ運用管理サーバ２００は、すでに優先度が１〜４のジョブがスケジューリングされている０番目の区間には、優先度が５のジョブをスケジューリングできない。また同様に、ＨＰＣ運用管理サーバ２００は、優先度が５のジョブを１〜２番目の区間にスケジューリングできない。よってＨＰＣ運用管理サーバ２００は、３番目の区間から６番目の区間までの期間に、優先度が５のジョブを割り当てる。またＨＰＣ運用管理サーバ２００は、優先度が６のジョブを０〜４番目の区間にスケジューリングできないため、５番目の区間から８番目の区間までの期間に、優先度が６のジョブを割り当てる。 Here, the total number of calculation nodes used to execute each job having priority 1 to 5 exceeds the total number of calculation nodes 31, 32, .... Therefore, the HPC operation management server 200 cannot schedule a job having a priority of 5 in the 0th section in which a job having a priority of 1 to 4 is already scheduled. Similarly, the HPC operation management server 200 cannot schedule a job having a priority of 5 in the first or second section. Therefore, the HPC operation management server 200 allocates a job having a priority of 5 to the period from the third section to the sixth section. Further, since the HPC operation management server 200 cannot schedule a job having a priority of 6 to the 0th to 4th sections, the HPC operation management server 200 allocates a job having a priority of 6 to the period from the 5th section to the 8th section.
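The placement rule described for FIG. 4 can be sketched as a greedy loop: jobs are taken in priority order, and each is placed at the earliest start section in which enough calculation nodes are free for its whole estimated duration. This is a minimal sketch, not the scheduler of the HPC operation management server 200. The names `Job` and `build_schedule`, the node counts, and the total of 10 nodes are hypothetical and chosen only so that the result reproduces the start sections described above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    priority: int   # smaller number = considered earlier
    nodes: int      # calculation nodes required (hypothetical values below)
    sections: int   # estimated execution time, in whole sections


def build_schedule(jobs, total_nodes, horizon):
    """Return {priority: start section} using earliest-fit placement in priority order."""
    free = [total_nodes] * horizon          # free calculation nodes per section
    start_of = {}
    for job in sorted(jobs, key=lambda j: j.priority):
        for start in range(horizon - job.sections + 1):
            window = range(start, start + job.sections)
            if all(free[s] >= job.nodes for s in window):
                for s in window:            # reserve the nodes for the whole window
                    free[s] -= job.nodes
                start_of[job.priority] = start
                break
    return start_of


# Durations follow the FIG. 4 description; node counts are assumptions.
jobs = [
    Job(priority=1, nodes=3, sections=3),   # sections 0-2
    Job(priority=2, nodes=2, sections=2),   # sections 0-1
    Job(priority=3, nodes=2, sections=5),   # sections 0-4
    Job(priority=4, nodes=2, sections=4),   # sections 0-3
    Job(priority=5, nodes=5, sections=4),   # cannot fit before section 3
    Job(priority=6, nodes=4, sections=4),   # cannot fit before section 5
]
schedule = build_schedule(jobs, total_nodes=10, horizon=10)
print(schedule)
```

Because every job is placed at its earliest feasible start, a lower-priority job can start before a higher-priority one that does not fit, which is the overtaking (backfill) behavior discussed for FIG. 5.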

上記のようにして、ジョブの推定実行時間に基づいたスケジューリングが行われる。なお、ジョブの実際の実行時間は実行前に分からないため、推定実行時間に基づいたスケジューリング通りにジョブが実行されないことがある。そこで次に、推定実行時間が実際の実行時間より長い場合および推定実行時間が実際の実行時間より短い場合の、ジョブの実行に与える影響について説明する。 As described above, scheduling is performed based on the estimated execution time of the job. Since the actual execution time of the job is not known before execution, the job may not be executed according to the scheduling based on the estimated execution time. Therefore, next, the influence on the job execution when the estimated execution time is longer than the actual execution time and when the estimated execution time is shorter than the actual execution time will be described.

FIG. 5 is a diagram showing an example of the first effect of a difference between the estimated execution time and the actual execution time. The execution result 61 is the result of executing jobs with priorities 1 to 4 in the HPC system 30 according to a schedule created on the assumption that the actual execution time of each job is known before execution (that is, estimated execution time = actual execution time). In the execution result 61, the job with priority 1 and the job with priority 2 are executed in the 0th section. The job with priority 3 is executed in the first section. The job with priority 4 is executed in the period from the second section to the third section.

スケジュール６２は、優先度が２のジョブの実行時間が実際の実行時間より長めに推定された場合に、実行結果６１に示す優先度が１〜４のジョブがスケジューリングされたスケジュールである。スケジュール６２では、優先度が１のジョブは、０番目の区間にスケジューリングされている。またスケジュール６２では、優先度が２のジョブは、０番目の区間から１番目の区間の期間にスケジューリングされている。 The schedule 62 is a schedule in which the jobs having the priority 1 to 4 shown in the execution result 61 are scheduled when the execution time of the job having the priority 2 is estimated to be longer than the actual execution time. In the schedule 62, the job having the priority 1 is scheduled in the 0th interval. Further, in the schedule 62, the job having the priority 2 is scheduled in the period from the 0th section to the 1st section.

Here, the total number of calculation nodes used to execute the jobs with priorities 1, 2, and 3 exceeds the total number of calculation nodes 31, 32, .... The job with priority 3 is therefore scheduled in the second section. On the other hand, the total number of calculation nodes used to execute the jobs with priorities 1, 2, and 4 is no more than the total number of calculation nodes 31, 32, .... The job with priority 4 is therefore scheduled in the period from the 0th section to the first section, overtaking the job with priority 3 (that is, it is backfilled).

実行結果６３は、スケジュール６２に従って優先度が１〜４のジョブがＨＰＣシステム３０で実行された結果である。実行結果６３では、優先度が１のジョブおよび優先度が２のジョブは、０番目の区間で実行されている。また実行結果６３では、優先度が３のジョブは、２番目の区間で実行されている。また実行結果６３では、優先度が４のジョブは、０番目の区間から１番目の区間までの期間で実行されている。 The execution result 63 is a result of the jobs having priorities 1 to 4 being executed in the HPC system 30 according to the schedule 62. In the execution result 63, the job having the priority 1 and the job having the priority 2 are executed in the 0th section. Further, in the execution result 63, the job having the priority 3 is executed in the second section. Further, in the execution result 63, the job having the priority of 4 is executed in the period from the 0th section to the 1st section.

Thus, in the execution result 63, the usage efficiency of the HPC system 30 in the first section is lower than in the execution result 61. In the execution result 61, only the job with priority 4 is executed in the period from the second section to the third section. In the execution result 63, however, newly submitted jobs can be scheduled on the free calculation nodes in the period from the second section to the third section, so the usage efficiency of the HPC system 30 does not decrease there. Therefore, when the estimated execution time is longer than the actual execution time, the usage efficiency of the HPC system 30 decreases.

FIG. 6 is a diagram showing an example of the second effect of a difference between the estimated execution time and the actual execution time. The execution result 71 is the result of executing, in the HPC system 30, jobs with priorities 1 to 4 that differ from those in the example of FIG. 5, on the assumption that the actual execution time of each job is known before execution. In the execution result 71, the job with priority 1 is executed in the 0th section. The job with priority 2 and the job with priority 4 are executed in the period from the 0th section to the first section. The job with priority 3 is executed in the second section.

スケジュール７２は、優先度が２のジョブの実行時間が実際の実行時間より短めに推定された場合に、実行結果７１に示す優先度が１〜４のジョブがスケジューリングされたスケジュールである。スケジュール７２では、優先度が１のジョブおよび優先度が２のジョブは、０番目の区間にスケジューリングされている。またスケジュール７２では、優先度が３のジョブは、１番目の区間にスケジューリングされている。またスケジュール７２では、優先度が４のジョブは、２番目の区間から３番目の区間の期間にスケジューリングされている。 The schedule 72 is a schedule in which jobs with priorities 1 to 4 shown in the execution result 71 are scheduled when the execution time of the job with priority 2 is estimated to be shorter than the actual execution time. In the schedule 72, the job having the priority 1 and the job having the priority 2 are scheduled in the 0th interval. Further, in the schedule 72, the job having the priority 3 is scheduled in the first section. Further, in the schedule 72, the job having the priority of 4 is scheduled in the period from the second section to the third section.

実行結果７３は、スケジュール７２に従って優先度が１〜４のジョブがＨＰＣシステム３０で実行された結果である。実行結果７３では、優先度が１のジョブは、０番目の区間で実行されている。また実行結果７３では、優先度が２のジョブは、０番目の区間から１番目の区間までの期間で実行されている。 The execution result 73 is a result of the jobs having priorities 1 to 4 being executed in the HPC system 30 according to the schedule 72. In the execution result 73, the job having the priority 1 is executed in the 0th interval. Further, in the execution result 73, the job having the priority 2 is executed in the period from the 0th section to the 1st section.

Here, the total number of calculation nodes used to execute the job with priority 2 and the job with priority 3 exceeds the total number of calculation nodes 31, 32, .... Therefore, in the execution result 73, the job with priority 3 is not executed in the first section, where it was scheduled, but in the second section. The job with priority 4 is executed in the period from the third section to the fourth section.

Thus, the job with priority 4, which is executed in the period from the 0th section to the first section in the execution result 71, is executed in the period from the third section to the fourth section in the execution result 73. As a result, in the execution result 73, the usage efficiency of the HPC system 30 in the 0th section and the first section is lower than in the execution result 71. Therefore, when the estimated execution time is shorter than the actual execution time, the usage efficiency of the HPC system 30 decreases.

As described above, the usage efficiency of the HPC system 30 may decrease whether the estimated execution time is longer or shorter than the actual execution time. When the estimated execution time is longer than the actual execution time, as shown in FIG. 5, each job starts at the start time given in the schedule 62. When the estimated execution time is shorter than the actual execution time, as shown in FIG. 6, the start times of the jobs given in the schedule 72 may be delayed. Therefore, underestimating the execution time lowers the usage efficiency of the HPC system 30 more than overestimating it does. The simulation results of system usage efficiency with respect to the estimated execution time are described below.

FIG. 7 is a diagram showing an example of simulation results. The simulation result 81 shows the usage efficiency of the system when simulated under different conditions for the estimated execution time. It shows the usage efficiency of the system for each of the following conditions: the estimated execution time is set to the time from the start of a job until its execution is terminated (requested execution time); the estimated execution times include both cases longer and cases shorter than the actual execution time (actual measurement); the estimated execution times include only cases longer than the actual execution time; the estimated execution times include only cases shorter than the actual execution time; and the estimated execution time is equal to the actual execution time.

In the simulation result 81, the filling rate and the average waiting time for each number of calculation nodes are shown as examples of the usage efficiency of the system. The filling rate R (%) is expressed by the following equation.
R = 100 × (P + Q) / (82944 × DAY × 24 × 3600) (1)
P is the sum, over the jobs whose execution finished during the simulation period, of the product of the number of calculation nodes used and the usage time. Q is the corresponding sum over the jobs whose execution did not finish during the simulation period. DAY is the simulation period in days, for example 12.9. P is expressed by the following equation.
P = Σ ((node_num) × (elapse)) (2)
Σ in equation (2) denotes the sum of the parenthesized value over the jobs whose execution finished during the simulation period. node_num is the number of calculation nodes used by the job. elapse is the time in seconds for which the job used the calculation nodes (execution time). Q is expressed by the following equation.
Q = Σ ((node_num) × (start + DAY - job_start)) (3)
Σ in equation (3) denotes the sum of the parenthesized value over the jobs whose execution did not finish during the simulation period. start is the time at which the simulation started. job_start is the time at which execution of the job started. That is, start + DAY - job_start is the time in seconds for which the job used the calculation nodes during the simulation period.

The waiting time W (h) is expressed by the following equation.
W = (job_start - job_sub) / 3600 (4)
job_sub is the time at which the job was submitted. That is, job_start - job_sub is the time in seconds from submission of the job until its execution starts.
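Equations (1) to (4) can be evaluated as in the following minimal sketch. The function names and the tuple layout of the inputs are hypothetical. Two points are assumptions rather than statements from the text: the constant 82944 in equation (1) is treated as the total number of calculation nodes, and DAY is converted to seconds before being added to start in equation (3), since the text describes that term as a time in seconds.

```python
SECONDS_PER_DAY = 24 * 3600
NODE_CONSTANT = 82944  # constant from equation (1); read here as the total node count (assumption)


def filling_rate(finished, unfinished, start, day, total_nodes=NODE_CONSTANT):
    """Filling rate R (%) following equations (1) to (3).

    finished:   list of (node_num, elapse) for jobs that finished in the period, elapse in seconds
    unfinished: list of (node_num, job_start) for jobs still running at the end, job_start in seconds
    start:      simulation start time in seconds
    day:        simulation period DAY in days
    """
    period_end = start + day * SECONDS_PER_DAY  # DAY converted to seconds (assumption)
    p = sum(node_num * elapse for node_num, elapse in finished)                     # equation (2)
    q = sum(node_num * (period_end - job_start) for node_num, job_start in unfinished)  # equation (3)
    return 100.0 * (p + q) / (total_nodes * day * SECONDS_PER_DAY)                  # equation (1)


def waiting_time_hours(job_start, job_sub):
    """Waiting time W (h) following equation (4); both arguments are in seconds."""
    return (job_start - job_sub) / 3600.0


# Small hypothetical example: 10 nodes, a one-day simulation period.
example_rate = filling_rate(
    finished=[(5, 43200)],     # 5 nodes for 12 hours
    unfinished=[(2, 43200)],   # 2 nodes, started 12 hours into the period
    start=0,
    day=1,
    total_nodes=10,
)
print(example_rate, waiting_time_hours(job_start=7200, job_sub=0))
```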

なお図７の下線は、各システムの使用効率の指標について、推定実行時間が実際の実行時間と等しくなる場合以外の条件のうち、最も好適なものを示す。このようなシミュレーション結果８１にも示されるように、推定実行時間が実際の実行時間より長い場合よりも短い場合のほうが、ＨＰＣシステム３０の使用効率が低下する。 The underline in FIG. 7 shows the most suitable condition for the index of the utilization efficiency of each system, except for the case where the estimated execution time is equal to the actual execution time. As shown in such a simulation result 81, the usage efficiency of the HPC system 30 is lower when the estimated execution time is shorter than when it is longer than the actual execution time.

そこで第２の実施の形態の管理サーバ１００は、新規投入ジョブの実行時間が予測できる可能性が低ければ、要求実行時間をジョブの推定実行時間に決定する。また管理サーバ１００は、新規投入ジョブの実行時間が予測できる可能性が高ければ、予測した実行時間に基づいてジョブの推定実行時間に決定する。 Therefore, if it is unlikely that the execution time of the newly submitted job can be predicted, the management server 100 of the second embodiment determines the requested execution time as the estimated execution time of the job. Further, if there is a high possibility that the execution time of the newly submitted job can be predicted, the management server 100 determines the estimated execution time of the job based on the predicted execution time.

Among the executed jobs, the management server 100 predicts the execution time of the newly submitted job based on the execution times of jobs that are similar to the newly submitted job and whose values of predetermined parameters match. The predetermined parameters are, for example, the job name, the user name of the user who submitted the job, and the group name of the group to which the user belongs. To that end, the management server 100 first identifies jobs similar to the newly submitted job (similar jobs). The similarity between jobs is expressed as the similarity of information containing parameters that indicate job status (hereinafter, job status information), such as the user ID of the user who entered the job execution request, the job type, and the degree of parallelism at execution (the number of calculation nodes on which the job runs in parallel). The job status information is an example of the job information described in the first embodiment.

The job status information of each job is a document containing multiple pairs of an item name related to job status and the value of that item. One technique that can be used to calculate the similarity between documents is the latent Dirichlet allocation (LDA) estimation model. For example, the management server 100 uses the LDA estimation model to calculate the topic distribution represented in the job status information of each job, and takes the similarity between the topic distributions of jobs as the similarity between the jobs.

ＬＤＡ推定モデルは、トピックモデルの一種である。トピックモデルは、文書が複数の潜在的なトピックから確率的に生成される（文書内の各単語はあるトピックが持つ確率分布に従って出現する）と仮定したモデルである。ＬＤＡ推定モデルを用いると、分析対象となる文書データの集合から、各文書に表されているトピックの混合比率を推定することができる。 The LDA estimation model is a kind of topic model. The topic model is a model that assumes that a document is probabilistically generated from multiple potential topics (each word in the document appears according to the probability distribution of a topic). Using the LDA estimation model, it is possible to estimate the mixing ratio of the topics represented in each document from the set of document data to be analyzed.

各文書のトピック分布の生成には、多項分布の共役事前分布であるディリクレ分布（dirichlet distribution）が利用される。なお、ディリクレ分布は、以下の式で表される。 The Dirichlet distribution, which is a conjugate prior of the multinomial distribution, is used to generate the topic distribution for each document. The Dirichlet distribution is expressed by the following equation.

式（５）は、ハイパーパラメータであるベクトルαの元で、ベクトルｘが生じる確率を示している。Γはガンマ関数である。ベクトルｘは、確率変数を示す実数ベクトルである。Ｋはトピック数である。ｋはトピックのインデックスである。 Equation (5) shows the probability that the vector x is generated under the hyperparameter vector α. Γ is a gamma function. The vector x is a real vector indicating a random variable. K is the number of topics. k is a topic index.
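With the symbols just defined, the Dirichlet density in its standard textbook form is shown below. It is given here for reference, on the assumption that equation (5) follows the standard definition.

```latex
P(\mathbf{x} \mid \boldsymbol{\alpha})
  = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} x_k^{\alpha_k - 1},
\qquad x_k \ge 0, \quad \sum_{k=1}^{K} x_k = 1
```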

管理サーバ１００は、トレーニングデータセットであるジョブステイタス情報群から、各文章（ジョブステイタス情報）にどんな単語が出現するかをそれぞれ調べる。そして管理サーバ１００は、同じ文章内にどの単語が多く出現するかカウントすることで、同じ文章内に出現する確率が高い単語をグルーピングし、これをトピックとする。 The management server 100 examines what kind of word appears in each sentence (job status information) from the job status information group which is a training data set. Then, the management server 100 counts which words frequently appear in the same sentence, groups words that have a high probability of appearing in the same sentence, and sets this as a topic.

具体的には、管理サーバ１００は、各文書および各単語について以下の式（６）により、確率を計算する。 Specifically, the management server 100 calculates the probabilities for each document and each word by the following equation (6).

N is the total number of words in the document set. V is the vocabulary size (the number of distinct words contained in the entire document set). d is the document index. n is the word index. v is the vocabulary index. w is a single word. z is a single topic. The backslash denotes set difference. β is the hyperparameter of the word distribution. Equation (6) is the sampling equation for the topic z_{d,n} of the word w_{d,n} in document d.
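For orientation, one widely used form of the collapsed Gibbs sampling update for LDA with symmetric hyperparameters α and β is shown below. Whether equation (6) has exactly this form is an assumption. The count symbols are also assumptions introduced for this sketch: N_{dk} is the number of words in document d assigned to topic k, N_{kv} is the number of occurrences of vocabulary item v assigned to topic k, N_k is the total number of words assigned to topic k, and the subscript "\ d,n" means the count is taken with the word w_{d,n} excluded.

```latex
p\left(z_{d,n} = k \mid \mathbf{W}, \mathbf{Z}_{\backslash d,n}, \alpha, \beta\right)
  \propto \left(N_{dk \backslash d,n} + \alpha\right)
  \frac{N_{k\,w_{d,n} \backslash d,n} + \beta}{N_{k \backslash d,n} + \beta V}
```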

管理サーバ１００は、式（６）で得られる確率が高い（例えば所定値以上の）単語の組み合わせをトピックとする。すなわち管理サーバ１００は、ＬＤＡ推定モデルを用いた学習の結果、トピックに属する単語の集合を得る。 The management server 100 uses a combination of words with a high probability (for example, a predetermined value or more) obtained by the equation (6) as a topic. That is, the management server 100 obtains a set of words belonging to the topic as a result of learning using the LDA estimation model.

管理サーバ１００は、各ジョブのジョブステイタス情報に含まれる単語が属するトピックに基づいて、ジョブステイタス情報のトピック分布を計算する。管理サーバ１００は、各ジョブのジョブステイタス情報に基づいて生成されたトピック分布をジョブ間で比較して、ジョブ間の類似度を算出することができる。 The management server 100 calculates the topic distribution of the job status information based on the topic to which the word included in the job status information of each job belongs. The management server 100 can compare the topic distributions generated based on the job status information of each job among the jobs and calculate the similarity between the jobs.

例えば管理サーバ１００は、類似ジョブを、トピック分布の類似度によって推定する。例えば管理サーバ１００は、トピック分布間のコサイン類似度を計算することで、ジョブの類似度を算出する。 For example, the management server 100 estimates similar jobs based on the similarity of the topic distribution. For example, the management server 100 calculates the job similarity by calculating the cosine similarity between the topic distributions.

管理サーバ１００は、ジョブごとにトピック分布を算出する。トピック分布は、トピックのインデックスを要素番号とし、文書（ジョブステイタス情報）内での該当トピックの出現頻度の値を要素とするベクトルで表すことができる。管理サーバ１００は、新規投入ジョブのトピック分布を示すベクトルと、実行が終了しているジョブのトピック分布を示すベクトルとのコサイン類似度を算出し、ジョブ間の類似度とする。これにより、比較対象のジョブそれぞれのトピック分布に共通のトピックが多く含まれるほど、類似度が高くなる。 The management server 100 calculates the topic distribution for each job. The topic distribution can be represented by a vector whose element number is the index of the topic and whose element is the value of the frequency of appearance of the corresponding topic in the document (job status information). The management server 100 calculates the cosine similarity between the vector showing the topic distribution of the newly submitted job and the vector showing the topic distribution of the job whose execution has been completed, and uses this as the similarity between the jobs. As a result, the more common topics are included in the topic distribution of each job to be compared, the higher the similarity.
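The comparison of topic distribution vectors can be sketched as follows. The function names `cosine_similarity` and `top_similar_jobs`, the job identifiers, and the example vectors are hypothetical. The sketch assumes each job already has a topic distribution vector of the same length, however it was obtained.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two topic distribution vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # define similarity as 0 for an all-zero vector


def top_similar_jobs(new_distribution, finished, count):
    """Return the ids of the `count` finished jobs most similar to the new job.

    finished maps a job id to that job's topic distribution vector.
    """
    ranked = sorted(
        finished,
        key=lambda job_id: cosine_similarity(new_distribution, finished[job_id]),
        reverse=True,
    )
    return ranked[:count]


# Hypothetical three-topic distributions.
finished_jobs = {
    "a": [0.6, 0.3, 0.1],
    "b": [0.1, 0.1, 0.8],
    "c": [0.7, 0.2, 0.1],
}
similar = top_similar_jobs([0.7, 0.2, 0.1], finished_jobs, count=2)
print(similar)
```

Jobs whose distributions put weight on the same topics score close to 1, while jobs dominated by different topics score close to 0, which matches the statement that more shared topics give a higher similarity.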

The management server 100 may instead calculate the similarity between each topic included in the topic distribution of the newly submitted job and each topic included in the topic distribution of a job whose execution has finished, and calculate the similarity between the distributions of the jobs based on the similarities between topics. For example, the management server 100 takes the sum of the similarities between the topics included in the two topic distributions being compared as the similarity of the topic distributions.

The management server 100 can measure the similarity S_{kk'} between topics by, for example, the vector space method. In the vector space method, the similarity is defined as the cosine between the vocabulary appearance frequency vectors of the topics in the vocabulary space V. The similarity between the k-th topic and the k'-th topic is expressed by the following equation.
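A cosine between two frequency vectors has the standard form below, where n_k and n_{k'} are the vocabulary appearance frequency vectors of the k-th and k'-th topics. This is the textbook definition of cosine similarity and is assumed to correspond to the equation referred to here.

```latex
S_{kk'} = \cos\left(\mathbf{n}_k, \mathbf{n}_{k'}\right)
        = \frac{\mathbf{n}_k \cdot \mathbf{n}_{k'}}
               {\lVert \mathbf{n}_k \rVert \, \lVert \mathbf{n}_{k'} \rVert}
```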

n_k is the appearance frequency vector of the k-th topic. n_{k'} is the appearance frequency vector of the k'-th topic.
In this way, the topic distribution of each job can be calculated using the LDA estimation model, and the similarity between jobs can be calculated from the similarity between topic distributions. From the jobs whose execution has already finished, the management server 100 identifies a predetermined number of similar jobs based on their similarity to the newly submitted job. Then, based on the execution times of those similar jobs whose job name, user name, and group name match those of the newly submitted job, the management server 100 determines whether the execution time of the newly submitted job is likely to be predictable. Here, jobs with the same job name, user name, and group name may include jobs at various stages of creation.

図８は、ジョブ作成過程の一例を示す図である。ジョブ９１，９２，９３は、それぞれジョブ名、ユーザ名およびグループ名が同一のジョブである。ジョブ９１は、例えばスクリプトが動作するか否かを確認するためのジョブである。ジョブ９１は、使用する計算ノード数が１つであると設定されている。またジョブ９１は、使用するデータが極めて小さい。またジョブ９１は、要求実行時間が極めて短く設定されている。ジョブ９２は、例えば複数の計算ノードを使った動作ができるか否かを確認するためのジョブである。ジョブ９２は、使用する計算ノード数が複数であると設定されている。またジョブ９２は、使用するデータが小さい。またジョブ９２は、要求実行時間が短く設定されている。ジョブ９３は、例えば完成したスクリプトを実行するためのジョブである。ジョブ９３は、使用する計算ノード数が多量であると設定されている。またジョブ９３は、使用するデータが大きい。またジョブ９３は、要求実行時間が長く設定されている。 FIG. 8 is a diagram showing an example of a job creation process. Jobs 91, 92, and 93 are jobs having the same job name, user name, and group name, respectively. The job 91 is, for example, a job for confirming whether or not the script operates. Job 91 is set to use one compute node. Further, the data used by the job 91 is extremely small. Further, the job 91 is set to have an extremely short request execution time. The job 92 is, for example, a job for confirming whether or not an operation using a plurality of calculation nodes can be performed. The job 92 is set to use a plurality of calculation nodes. Further, the data used by the job 92 is small. Further, the job 92 is set to have a short request execution time. Job 93 is, for example, a job for executing a completed script. Job 93 is set to use a large number of calculation nodes. Further, the job 93 uses a large amount of data. Further, the job 93 has a long request execution time.

このようにジョブ名、ユーザ名およびグループ名が同一のジョブであっても作成段階によって実行するスクリプトの規模が異なることがある。ここで新規投入ジョブで実行されるスクリプトと規模が異なるスクリプトを実行するジョブは、新規投入ジョブの実行時間の予測に役立たないことがある。例えば完成したスクリプトを実行する新規投入ジョブの実行時間の予測に、一部の処理をテストするためのスクリプトを実行するジョブの実行時間を使用することは有効ではない。 In this way, even if the job has the same job name, user name, and group name, the scale of the script to be executed may differ depending on the creation stage. Here, a job that executes a script whose scale is different from the script executed by the newly submitted job may not be useful for predicting the execution time of the newly submitted job. For example, it is not effective to use the execution time of the job that executes the script to test some processing to predict the execution time of the newly submitted job that executes the completed script.

そこで管理サーバ１００は、新規投入ジョブとジョブ名、ユーザ名およびグループ名が一致する類似ジョブに、ジョブの規模を示すパラメータの値が異なるジョブが含まれていなければ、新規投入ジョブの実行時間を予測できる可能性が高いと判定する。また管理サーバ１００は、新規投入ジョブとジョブ名、ユーザ名およびグループ名が一致する類似ジョブに、ジョブの規模を示すパラメータが異なるジョブが含まれていれば、新規投入ジョブの実行時間を予測できる可能性が高くないと判定する。 Therefore, if the similar jobs whose job name, user name, and group name match those of the newly submitted job do not include a job with a different value of a parameter indicating the scale of the job, the management server 100 determines that the execution time of the newly submitted job is highly likely to be predictable. Conversely, if those similar jobs include a job with a different parameter indicating the scale of the job, the management server 100 determines that the execution time of the newly submitted job is not highly likely to be predictable.

なおジョブの規模を示すパラメータとしては、例えば要求実行時間や使用計算ノード数が含まれる。またジョブの規模を示すパラメータとしては、例えば要求実行時間や使用する計算ノード数に応じたジョブの区分を示すキューが含まれる。第２の実施の形態では、ジョブの規模を示すパラメータは、ジョブステイタス情報に示されたジョブ名、ユーザ名、グループ名および実際の実行時間を除くすべてのパラメータとする。 The parameters indicating the scale of the job include, for example, the request execution time and the number of calculation nodes used. Further, the parameter indicating the scale of the job includes, for example, a queue indicating the classification of the job according to the request execution time and the number of calculation nodes to be used. In the second embodiment, the parameters indicating the scale of the job are all parameters except the job name, the user name, the group name, and the actual execution time shown in the job status information.
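
As a non-limiting illustration of this check, the following Python sketch treats each piece of job status information as a dictionary and compares every field other than the job name, user name, group name, and actual execution time. The field names (`nodes`, `req_time`, `queue`, `elapsed`), the exclusion of the job ID from the comparison, and the sample values loosely modeled on jobs 91 to 93 are assumptions made only for this example.

```python
# Fields that identify a job or record its outcome. Every other field is
# treated as a scale parameter, as in the second embodiment.
# All field names are hypothetical; excluding job_id is an assumption,
# since a unique ID would otherwise never match.
NON_SCALE_FIELDS = {"job_id", "job_name", "user", "group", "elapsed"}


def scale_params(job):
    """Return only the scale parameters of a job status record."""
    return {k: v for k, v in job.items() if k not in NON_SCALE_FIELDS}


def is_predictable(new_job, matching_jobs):
    """True if every name-matched similar job has the same scale parameters.

    matching_jobs is assumed to be non-empty.
    """
    target = scale_params(new_job)
    return all(scale_params(j) == target for j in matching_jobs)


# Illustrative values only, loosely following jobs 91 to 93 of FIG. 8.
base = {"job_name": "sim", "user": "u1", "group": "g1"}
job91 = {**base, "job_id": 91, "nodes": 1, "req_time": 5, "queue": "small", "elapsed": 1}
job92 = {**base, "job_id": 92, "nodes": 4, "req_time": 30, "queue": "small", "elapsed": 12}
job93 = {**base, "job_id": 93, "nodes": 512, "req_time": 600, "queue": "large", "elapsed": 410}
new_job = {**base, "job_id": 94, "nodes": 512, "req_time": 600, "queue": "large"}

print(is_predictable(new_job, [job93]))         # True: only the full-scale run
print(is_predictable(new_job, [job91, job93]))  # False: a test-stage run is mixed in
```

In the second call, the single-node test run shares the job name, user name, and group name but differs in scale, so the history is judged unsuitable for prediction.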

以下、推定実行時間によるジョブスケジューリング方法について詳細に説明する。 Hereinafter, the job scheduling method based on the estimated execution time will be described in detail.

図９は、ジョブスケジューリングのための各装置の機能を示すブロック図である。ＨＰＣ運用管理サーバ２００は、ＤＢ２１０、タイマ部２２０、情報取得部２３０、ジョブスケジューリング部２４０、および制御指示部２５０を有する。 FIG. 9 is a block diagram showing the functions of each device for job scheduling. The HPC operation management server 200 includes a DB 210, a timer unit 220, an information acquisition unit 230, a job scheduling unit 240, and a control instruction unit 250.

ＤＢ２１０は、実行するジョブのステイタスを示すジョブステイタス情報を記憶する。タイマ部２２０は、ＨＰＣシステム３０からジョブごとのジョブステイタス情報を収集するタイミングを管理する。例えばタイマ部２２０は、一定の時間間隔で、ジョブステイタス情報の収集を情報取得部２３０に指示する。 The DB 210 stores job status information indicating the status of the job to be executed. The timer unit 220 manages the timing of collecting job status information for each job from the HPC system 30. For example, the timer unit 220 instructs the information acquisition unit 230 to collect job status information at regular time intervals.

情報取得部２３０は、タイマ部２２０からの指示に応じて、ＨＰＣシステム３０から、ジョブステイタス情報を取得する。情報取得部２３０は、取得したジョブステイタス情報を、ＤＢ２１０に格納する。 The information acquisition unit 230 acquires job status information from the HPC system 30 in response to an instruction from the timer unit 220. The information acquisition unit 230 stores the acquired job status information in the DB 210.

ジョブスケジューリング部２４０は、新規投入ジョブの推定実行時間と使用計算ノード数に基づいて、新規投入ジョブが実行される所定の期間を決定することで（新規投入ジョブを所定の期間に割り当てて）実行スケジュールを作成する。例えばジョブスケジューリング部２４０は、割当候補期間において、他のジョブの合計使用計算ノード数に新規投入ジョブの使用計算ノード数を足しても計算ノード３１，３２，・・・の総数以下の場合、新規投入ジョブを割当候補期間に割り当てる。制御指示部２５０は、ジョブスケジューリング部２４０によって作成された実行スケジュールに従って、ＨＰＣシステム３０にジョブの実行を指示する。 The job scheduling unit 240 creates an execution schedule by determining, based on the estimated execution time of the newly submitted job and the number of calculation nodes it uses, the period in which the newly submitted job is to be executed (that is, by assigning the newly submitted job to that period). For example, the job scheduling unit 240 assigns the newly submitted job to an allocation candidate period if, within that period, the total number of calculation nodes used by other jobs plus the number of calculation nodes used by the newly submitted job does not exceed the total number of calculation nodes 31, 32, .... The control instruction unit 250 instructs the HPC system 30 to execute the job according to the execution schedule created by the job scheduling unit 240.

管理サーバ１００は、ＤＢ１１０、タイマ部１２０、メトリクス収集部１３０、ＬＤＡ学習部１４０、類似ジョブ特定部１５０、過去運用分析部１６０および予測結果送信部１７０を有する。 The management server 100 has a DB 110, a timer unit 120, a metric collection unit 130, an LDA learning unit 140, a similar job identification unit 150, a past operation analysis unit 160, and a prediction result transmission unit 170.

ＤＢ１１０は、新規投入ジョブごとの推定実行時間の決定に使用する情報を記憶する。タイマ部１２０は、一定の時間間隔で、ＨＰＣ運用管理サーバ２００からの情報収集を、メトリクス収集部１３０に指示する。 The DB 110 stores information used for determining the estimated execution time for each newly submitted job. The timer unit 120 instructs the metric collection unit 130 to collect information from the HPC operation management server 200 at regular time intervals.

メトリクス収集部１３０は、タイマ部１２０の指示に応じて、ＨＰＣ運用管理サーバ２００からジョブステイタス情報を取得する。メトリクス収集部１３０は、取得したジョブステイタス情報をＤＢ１１０に格納する。 The metric collection unit 130 acquires job status information from the HPC operation management server 200 in response to an instruction from the timer unit 120. The metric collection unit 130 stores the acquired job status information in the DB 110.

ＬＤＡ学習部１４０は、ジョブステイタス情報に基づいて、ＬＤＡ推定モデルを生成する。例えばＬＤＡ学習部１４０は、複数のジョブのジョブ情報に含まれる単語を解析し、トピックごとのグループに単語を分類する。ＬＤＡ学習部１４０は、学習結果をＤＢ１１０に格納する。 The LDA learning unit 140 generates an LDA estimation model based on the job status information. For example, the LDA learning unit 140 analyzes words included in job information of a plurality of jobs and classifies the words into groups for each topic. The LDA learning unit 140 stores the learning result in the DB 110.

類似ジョブ特定部１５０は、ＬＤＡ推定モデルに基づいて、新規投入ジョブとジョブステイタス情報が類似する、すでに実行が終了したジョブ（類似ジョブ）を特定する。類似ジョブ特定部１５０は、例えば類似ジョブを登録したリストをＤＢ１１０に格納する。 The similar job identification unit 150 identifies a job (similar job) whose execution has already been completed and whose job status information is similar to that of the newly input job based on the LDA estimation model. The similar job identification unit 150 stores, for example, a list in which similar jobs are registered in the DB 110.

過去運用分析部１６０は、新規投入ジョブの推定実行時間を決定する。例えば過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名（ジョブ名等）が新規投入ジョブと一致する類似ジョブがあるか否かを判定する。過去運用分析部１６０は、ジョブ名等が新規投入ジョブと一致する類似ジョブがある場合、ジョブ名等が新規投入ジョブと一致する類似ジョブそれぞれの、実際の実行時間を除くすべてのパラメータが一致するか否かを判定する。 The past operation analysis unit 160 determines the estimated execution time of the newly submitted job. For example, the past operation analysis unit 160 determines whether or not there is a similar job whose job name, user name, and group name (job name, etc.) match those of the newly submitted job. If there are such similar jobs, the past operation analysis unit 160 determines whether or not all parameters other than the actual execution time match for each of them.

そして過去運用分析部１６０は、ジョブ名等が新規投入ジョブと一致する類似ジョブそれぞれの、実際の実行時間を除くすべてのパラメータが一致する場合、ジョブ名等が新規投入ジョブと一致する類似ジョブの実際の実行時間に基づいて推定実行時間を決定する。例えば過去運用分析部１６０は、ジョブ名等が新規投入ジョブと一致する類似ジョブの最大実行時間を新規投入ジョブの推定実行時間に決定する。また例えば過去運用分析部１６０は、ジョブ名等が新規投入ジョブと一致する類似ジョブの実行時間の範囲が所定の範囲内の場合、ジョブ名等が新規投入ジョブと一致する類似ジョブの実行時間の平均を新規投入ジョブの推定実行時間に決定する。 Then, if all parameters other than the actual execution time match for each similar job whose job name etc. match those of the newly submitted job, the past operation analysis unit 160 determines the estimated execution time based on the actual execution times of those similar jobs. For example, the past operation analysis unit 160 sets the maximum execution time of those similar jobs as the estimated execution time of the newly submitted job. As another example, if the range of the execution times of those similar jobs is within a predetermined range, the past operation analysis unit 160 sets the average of their execution times as the estimated execution time of the newly submitted job.

また過去運用分析部１６０は、ジョブ名等が新規投入ジョブと一致する類似ジョブがない場合または、ジョブ名等が新規投入ジョブと一致する類似ジョブの一部のパラメータが異なる場合、新規投入ジョブの要求実行時間を新規投入ジョブの推定実行時間に決定する。予測結果送信部１７０は、新規投入ジョブの推定実行時間を、ＨＰＣ運用管理サーバ２００に送信する。 If there is no similar job whose job name etc. match those of the newly submitted job, or if some parameters differ among the similar jobs whose job name etc. match, the past operation analysis unit 160 sets the request execution time of the newly submitted job as its estimated execution time. The prediction result transmission unit 170 transmits the estimated execution time of the newly submitted job to the HPC operation management server 200.

なお、図９に示した各要素間を接続する線は通信経路の一部を示すものであり、図示した通信経路以外の通信経路も設定可能である。また、図９に示した各要素の機能は、例えば、その要素に対応するプログラムモジュールをコンピュータに実行させることで実現することができる。 The line connecting each element shown in FIG. 9 indicates a part of the communication path, and a communication path other than the shown communication path can be set. Further, the function of each element shown in FIG. 9 can be realized, for example, by causing a computer to execute a program module corresponding to the element.

図１０は、管理サーバのＤＢに格納される情報の一例を示す図である。図１０の例では、ＤＢ１１０には、ジョブ情報１１１、学習結果情報１１２、および類似ジョブ情報１１３が格納されている。ジョブ情報１１１は、ジョブごとのジョブステイタス情報である。学習結果情報１１２は、ＬＤＡによる学習結果を示す情報である。類似ジョブ情報１１３は、新規投入ジョブに類似するジョブを示す情報である。 FIG. 10 is a diagram showing an example of information stored in the DB of the management server. In the example of FIG. 10, the job information 111, the learning result information 112, and the similar job information 113 are stored in the DB 110. The job information 111 is job status information for each job. The learning result information 112 is information indicating a learning result by LDA. The similar job information 113 is information indicating a job similar to a newly submitted job.

図１１は、ジョブ情報の一例を示す図である。ジョブ情報１１１には、例えばジョブごとのジョブステイタス情報１１１ａ，１１１ｂ，・・・が含まれている。ジョブステイタス情報１１１ａ，１１１ｂ，・・・には、ジョブＩＤ、ジョブ名、ジョブの実行を要求しているユーザのユーザ名、該当ユーザが属するグループのグループ名など、ジョブの実行に関連するパラメータが含まれる。 FIG. 11 is a diagram showing an example of job information. The job information 111 includes, for example, job status information 111a, 111b, ... for each job. The job status information 111a, 111b, ... includes parameters related to job execution, such as the job ID, the job name, the user name of the user requesting execution of the job, and the group name of the group to which that user belongs.

またジョブステイタス情報１１１ａ，１１１ｂ，・・・のパラメータには、ジョブが開始してから実行が打ち切られるまでの要求実行時間やジョブの実行に対する使用計算ノード数が含まれる。またジョブステイタス情報１１１ａ，１１１ｂ，・・・のパラメータには、要求実行時間や使用する計算ノード数に応じたジョブの区分を示すキューが含まれる。また実行済みのジョブのジョブステイタス情報のパラメータには、該当ジョブの実行時間が含まれる。 The parameters of the job status information 111a, 111b, ... also include the request execution time, which is the time from the start of a job until its execution is terminated, and the number of calculation nodes used to execute the job. The parameters further include a queue indicating the job classification according to the request execution time and the number of calculation nodes to be used. In addition, the parameters of the job status information of an executed job include the execution time of that job.

図１２は、学習結果情報の一例を示す図である。学習結果情報１１２は、ＬＤＡ推定モデルによる学習結果である。学習結果情報１１２には、トピックを示すトピック番号に対応付けて、そのトピックに属する単語が登録されている。 FIG. 12 is a diagram showing an example of learning result information. The learning result information 112 is a learning result by the LDA estimation model. In the learning result information 112, words belonging to the topic are registered in association with the topic number indicating the topic.

図１３は、類似ジョブ情報の一例を示す図である。例えば類似ジョブ情報１１３には、学習結果情報１１２に基づいて判定された、新規投入ジョブそれぞれに対する類似ジョブリスト１１３ａ，１１３ｂ，・・・が含まれる。類似ジョブリスト１１３ａ，１１３ｂ，・・・には、ＬＤＡ推定モデルの学習結果情報１１２に基づいて判定された、実行前のジョブに類似する所定数のジョブのジョブＩＤが示される。 FIG. 13 is a diagram showing an example of similar job information. For example, the similar job information 113 includes similar job lists 113a, 113b, ... for the respective newly submitted jobs, determined based on the learning result information 112. Each of the similar job lists 113a, 113b, ... shows the job IDs of a predetermined number of jobs that are similar to a job awaiting execution, as determined based on the learning result information 112 of the LDA estimation model.

次に実行時間推定処理の手順について説明する。 Next, the procedure of the execution time estimation process will be described.

図１４は、実行時間推定処理の手順の一例を示すフローチャートである。以下、図１４に示す処理をステップ番号に沿って説明する。 FIG. 14 is a flowchart showing an example of the procedure of the execution time estimation process. Hereinafter, the process shown in FIG. 14 will be described along with the step numbers.

［ステップＳ１０１］タイマ部１２０は、前回の実行時間推定処理の実行からの経過時間を計測し、所定時間が経過した場合、ジョブステイタス情報の取得をメトリクス収集部１３０に指示する。メトリクス収集部１３０は、ＨＰＣ運用管理サーバ２００からジョブステイタス情報を収集する。メトリクス収集部１３０は、取得した情報をＤＢ１１０に格納する。 [Step S101] The timer unit 120 measures the elapsed time from the execution of the previous execution time estimation process, and when the predetermined time has elapsed, instructs the metric collection unit 130 to acquire the job status information. The metric collection unit 130 collects job status information from the HPC operation management server 200. The metric collection unit 130 stores the acquired information in the DB 110.

［ステップＳ１０２］ＬＤＡ学習部１４０と類似ジョブ特定部１５０が連携して、推定実行時間を決定する対象の新規投入ジョブ（以下では、単に新規投入ジョブという）に類似する所定数の（例えば、１０個の）類似ジョブを抽出する。抽出されたジョブは、新規投入ジョブに対応する類似ジョブリスト（例えば、類似ジョブリスト１１３ａ）にジョブＩＤが登録される。類似ジョブ抽出処理の詳細については後述する（図１５参照）。 [Step S102] The LDA learning unit 140 and the similar job identification unit 150 cooperate to extract a predetermined number (for example, 10) of similar jobs that resemble the newly submitted job whose estimated execution time is to be determined (hereinafter simply referred to as the newly submitted job). The job IDs of the extracted jobs are registered in the similar job list (for example, the similar job list 113a) corresponding to the newly submitted job. Details of the similar job extraction process will be described later (see FIG. 15).

［ステップＳ１０３］過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名（ジョブ名等）が新規投入ジョブと一致する類似ジョブがあるか否かを判定する。例えば過去運用分析部１６０は、ジョブステイタス情報１１１ａ，１１１ｂ，・・・のうち、新規投入ジョブのジョブステイタス情報と類似ジョブリスト１１３ａに登録された類似ジョブそれぞれのジョブステイタス情報とを参照する。そして過去運用分析部１６０は、新規投入ジョブのジョブステイタス情報に示されるジョブ名、ユーザ名およびグループ名と類似ジョブそれぞれのジョブステイタス情報に示されるジョブ名、ユーザ名およびグループ名とが一致するか否かを判定する。 [Step S103] The past operation analysis unit 160 determines whether or not there is a similar job whose job name, user name, and group name (job name, etc.) match those of the newly submitted job. For example, among the job status information 111a, 111b, ..., the past operation analysis unit 160 refers to the job status information of the newly submitted job and that of each similar job registered in the similar job list 113a. The past operation analysis unit 160 then determines whether or not the job name, user name, and group name shown in the job status information of the newly submitted job match those shown in the job status information of each similar job.

過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブがあると判定した場合、処理をステップＳ１０４に進める。また過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブがないと判定した場合、処理をステップＳ１０８に進める。 When the past operation analysis unit 160 determines that there is a similar job whose job name, user name, and group name match the newly submitted job, the process proceeds to step S104. If the past operation analysis unit 160 determines that there is no similar job whose job name, user name, and group name match the newly submitted job, the process proceeds to step S108.

［ステップＳ１０４］過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致するすべての類似ジョブで他のパラメータも一致するか否かを判定する。例えば過去運用分析部１６０は、新規投入ジョブのジョブステイタス情報と、ステップＳ１０３でジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致すると判定された類似ジョブそれぞれのジョブステイタス情報とを参照する。そして過去運用分析部１６０は、参照したすべてのジョブステイタス情報で、実際の実行時間を除くすべてのパラメータが一致する場合、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブは、他のパラメータも一致すると判定する。 [Step S104] The past operation analysis unit 160 determines whether or not the other parameters also match in all similar jobs whose job name, user name, and group name match those of the newly submitted job. For example, the past operation analysis unit 160 refers to the job status information of the newly submitted job and the job status information of each similar job determined in step S103 to have a matching job name, user name, and group name. If all parameters other than the actual execution time match across all of the referenced job status information, the past operation analysis unit 160 determines that the other parameters of those similar jobs also match.

過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致するすべての類似ジョブで他のパラメータも一致すると判定した場合、処理をステップＳ１０５に進める。また過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブで他のパラメータが一致しないジョブがあると判定した場合、処理をステップＳ１０８に進める。 If the past operation analysis unit 160 determines that all similar jobs whose job name, user name, and group name match the newly submitted job also match other parameters, the process proceeds to step S105. If the past operation analysis unit 160 determines that there is a similar job whose job name, user name, and group name match the newly submitted job but the other parameters do not match, the process proceeds to step S108.

［ステップＳ１０５］過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブの実行時間の範囲は１０分以内であるか否かを判定する。例えば過去運用分析部１６０は、ステップＳ１０３でジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致すると判定された類似ジョブそれぞれのジョブステイタス情報に示される実際の実行時間を参照する。そして過去運用分析部１６０は、参照した実際の実行時間のうちの最大の時間と最小の時間との差が１０分以内であれば、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブの実行時間の範囲は１０分以内であると判定する。 [Step S105] The past operation analysis unit 160 determines whether or not the range of execution times of the similar jobs whose job name, user name, and group name match those of the newly submitted job is within 10 minutes. For example, the past operation analysis unit 160 refers to the actual execution time shown in the job status information of each similar job determined in step S103 to have a matching job name, user name, and group name. If the difference between the longest and the shortest of the referenced actual execution times is 10 minutes or less, the past operation analysis unit 160 determines that the range of execution times of those similar jobs is within 10 minutes.

過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブの実行時間の範囲は１０分以内であると判定した場合、処理をステップＳ１０６に進める。過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブの実行時間の範囲は１０分以内ではないと判定した場合、処理をステップＳ１０７に進める。 When the past operation analysis unit 160 determines that the execution time range of a similar job whose job name, user name, and group name match the newly submitted job is within 10 minutes, the process proceeds to step S106. If the past operation analysis unit 160 determines that the execution time range of a similar job whose job name, user name, and group name match the newly submitted job is not within 10 minutes, the process proceeds to step S107.

［ステップＳ１０６］過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブの平均実行時間を新規投入ジョブの推定実行時間に決定する。例えば過去運用分析部１６０は、ステップＳ１０３でジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致すると判定された類似ジョブそれぞれのジョブステイタス情報に示される実際の実行時間の平均を新規投入ジョブの推定実行時間に決定する。そして、処理が終了する。 [Step S106] The past operation analysis unit 160 sets the average execution time of the similar jobs whose job name, user name, and group name match those of the newly submitted job as the estimated execution time of the newly submitted job. For example, the past operation analysis unit 160 calculates the average of the actual execution times shown in the job status information of the similar jobs determined in step S103 to have a matching job name, user name, and group name, and sets that average as the estimated execution time of the newly submitted job. Then, the process ends.

［ステップＳ１０７］過去運用分析部１６０は、ジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致する類似ジョブの最大実行時間を新規投入ジョブの推定実行時間に決定する。例えば過去運用分析部１６０は、ステップＳ１０３でジョブ名、ユーザ名およびグループ名が新規投入ジョブと一致すると判定された類似ジョブそれぞれのジョブステイタス情報に示される実際の実行時間を参照する。そして過去運用分析部１６０は、参照した実際の実行時間のうちの最大の時間を新規投入ジョブの推定実行時間に決定する。そして、処理が終了する。 [Step S107] The past operation analysis unit 160 determines the maximum execution time of a similar job whose job name, user name, and group name match the newly submitted job as the estimated execution time of the newly submitted job. For example, the past operation analysis unit 160 refers to the actual execution time shown in the job status information of each similar job determined in step S103 that the job name, the user name, and the group name match the newly submitted job. Then, the past operation analysis unit 160 determines the maximum time of the referenced actual execution time as the estimated execution time of the newly submitted job. Then, the process ends.

［ステップＳ１０８］過去運用分析部１６０は、要求実行時間を新規投入ジョブの推定実行時間に決定する。例えば過去運用分析部１６０は、新規投入ジョブのジョブステイタス情報に示される要求実行時間を新規投入ジョブの推定実行時間に決定する。 [Step S108] The past operation analysis unit 160 determines the request execution time as the estimated execution time of the newly submitted job. For example, the past operation analysis unit 160 determines the request execution time shown in the job status information of the newly submitted job as the estimated execution time of the newly submitted job.
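
As a non-limiting illustration, steps S103 to S108 can be sketched in Python as follows. Job status information is modeled as a dictionary. The field names, the use of minutes as the time unit, and the exclusion of the job ID from the parameter comparison are assumptions introduced for this sketch and are not taken from the embodiment.

```python
RANGE_LIMIT_MIN = 10                        # threshold of step S105, in minutes
ID_FIELDS = ("job_name", "user", "group")   # hypothetical field names
NON_SCALE_FIELDS = {"job_id", "elapsed", *ID_FIELDS}


def estimate_runtime(new_job, similar_jobs):
    """Return the estimated execution time (minutes) of new_job."""
    # S103: keep similar jobs whose job name, user name and group name match.
    matched = [j for j in similar_jobs
               if all(j[f] == new_job[f] for f in ID_FIELDS)]
    if not matched:
        return new_job["req_time"]          # S108: fall back to the request time

    # S104: every remaining parameter must also match.
    def scale(job):
        return {k: v for k, v in job.items() if k not in NON_SCALE_FIELDS}

    if any(scale(j) != scale(new_job) for j in matched):
        return new_job["req_time"]          # S108

    times = [j["elapsed"] for j in matched]
    # S105: spread between the longest and the shortest past run.
    if max(times) - min(times) <= RANGE_LIMIT_MIN:
        return sum(times) / len(times)      # S106: average
    return max(times)                       # S107: maximum


new = {"job_id": 10, "job_name": "sim", "user": "u1", "group": "g1",
       "nodes": 8, "req_time": 120, "queue": "q1"}
stable = [dict(new, job_id=1, elapsed=50), dict(new, job_id=2, elapsed=56)]
spread = [dict(new, job_id=1, elapsed=50), dict(new, job_id=2, elapsed=90)]
mixed = stable + [dict(new, job_id=3, elapsed=2, nodes=1)]

print(estimate_runtime(new, stable))  # 53.0 (average, spread of 6 minutes)
print(estimate_runtime(new, spread))  # 90 (maximum, spread of 40 minutes)
print(estimate_runtime(new, mixed))   # 120 (a job of different scale is present)
print(estimate_runtime(new, []))      # 120 (no matching similar job)
```

The last two calls show the conservative fallback: whenever the history is missing or mixed in scale, the sketch returns the request execution time rather than a value derived from past runs.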

このように過去運用分析部１６０は、新規投入ジョブとジョブ名、ユーザ名およびグループ名（ジョブ名等）が一致する類似ジョブがあるか否かを判定する。新規投入ジョブとジョブ名等が一致する類似ジョブがなければ、新規投入ジョブの要求実行時間を新規投入ジョブの推定実行時間に決定する。これにより過去運用分析部１６０は、新規投入ジョブの実行時間を予測することが難しい場合、新規投入ジョブの推定実行時間を新規投入ジョブが実行された場合の実際の実行時間よりも長くする。 In this way, the past operation analysis unit 160 determines whether or not there is a similar job in which the newly submitted job and the job name, the user name, and the group name (job name, etc.) match. If there is no similar job whose job name matches the newly submitted job, the request execution time of the newly submitted job is determined as the estimated execution time of the newly submitted job. As a result, when it is difficult to predict the execution time of the newly submitted job, the past operation analysis unit 160 makes the estimated execution time of the newly submitted job longer than the actual execution time when the newly submitted job is executed.

なお新規投入ジョブの推定実行時間が実際の実行時間より短いと、ＨＰＣシステム３０は新規投入ジョブの次に実行を予定していたジョブを予定していた時刻に実行できないことがある。すると予定時刻に実行できなかったジョブがスケジューリングされていたことで、本来実行できたジョブの実行が後回しにされることがある。過去運用分析部１６０は、新規投入ジョブの実行時間を予測することが難しい場合でも、新規投入ジョブの推定実行時間を実際の実行時間よりも長くすることで、推定実行時間が実際の実行時間より短いことによるＨＰＣシステム３０の使用効率の低下を抑止できる。 If the estimated execution time of the newly submitted job is shorter than its actual execution time, the HPC system 30 may be unable to start the job scheduled next at its scheduled time. In that case, because a job that could not run at its scheduled time occupied the schedule, other jobs that could otherwise have run may be postponed. Even when the execution time of the newly submitted job is difficult to predict, the past operation analysis unit 160 makes the estimated execution time longer than the actual execution time, and can thereby prevent the decrease in usage efficiency of the HPC system 30 that results from an estimated execution time shorter than the actual one.

また過去運用分析部１６０は、新規投入ジョブとジョブ名等が一致する類似ジョブがあれば、ジョブ名等が一致する類似ジョブの実際の実行時間に基づいて、新規投入ジョブの推定実行時間を決定する。これにより過去運用分析部１６０は、実際の実行時間と推定実行時間が近くなるようにし、計算ノードがジョブを実行していない時間が短くなるようにする。よって、ＨＰＣシステム３０の使用効率を向上させることができる。 Further, if there is a similar job whose job name etc. match those of the newly submitted job, the past operation analysis unit 160 determines the estimated execution time of the newly submitted job based on the actual execution times of those similar jobs. As a result, the past operation analysis unit 160 brings the estimated execution time close to the actual execution time and shortens the time during which calculation nodes are not executing jobs. Therefore, the usage efficiency of the HPC system 30 can be improved.

また過去運用分析部１６０は、新規投入ジョブとジョブ名等が一致する類似ジョブすべてについて、実際の実行時間を除くすべてのパラメータが一致するか否かを判定する。そして、実際の実行時間を除くすべてのパラメータが一致する場合、過去運用分析部１６０は、ジョブ名等が一致する類似ジョブの実際の実行時間に基づいて、新規投入ジョブの推定実行時間を決定する。また、新規投入ジョブとジョブ名等が一致する類似ジョブの中で、実際の実行時間以外のパラメータが一致しないジョブがある場合、過去運用分析部１６０は、新規投入ジョブの要求実行時間を新規投入ジョブの推定実行時間に決定する。 Further, the past operation analysis unit 160 determines whether or not all parameters other than the actual execution time match for all similar jobs whose job name etc. match those of the newly submitted job. If all parameters other than the actual execution time match, the past operation analysis unit 160 determines the estimated execution time of the newly submitted job based on the actual execution times of those similar jobs. If, among the similar jobs whose job name etc. match those of the newly submitted job, there is a job whose parameters other than the actual execution time do not match, the past operation analysis unit 160 sets the request execution time of the newly submitted job as its estimated execution time.

つまり過去運用分析部１６０は、新規投入ジョブとジョブ名等が一致する類似ジョブに、新規投入ジョブと作成段階が異なるジョブが含まれている場合、新規投入ジョブの実行時間の予測が困難と判定する。これにより、推定実行時間が実際の実行時間より短くなることをさらに抑止できる。 In other words, when the similar jobs whose job name etc. match those of the newly submitted job include a job at a different creation stage from the newly submitted job, the past operation analysis unit 160 determines that the execution time of the newly submitted job is difficult to predict. As a result, it is possible to further prevent the estimated execution time from becoming shorter than the actual execution time.

また過去運用分析部１６０は、新規投入ジョブとジョブ名等が一致する類似ジョブそれぞれの実際の実行時間のうちの最大実行時間に基づいて、新規投入ジョブの推定実行時間を決定する。これにより、新規投入ジョブとジョブ名等が一致する類似ジョブそれぞれの実際の実行時間に基づいて、推定実行時間が実際の実行時間より長くなるようにできる。 Further, the past operation analysis unit 160 determines the estimated execution time of the newly submitted job based on the maximum of the actual execution times of the similar jobs whose job name etc. match those of the newly submitted job. As a result, the estimated execution time can be made longer than the actual execution time on the basis of the actual execution times of those similar jobs.

次に類似ジョブ抽出処理の手順について説明する。 Next, the procedure of the similar job extraction process will be described.

図１５は、類似ジョブ抽出処理の手順の一例を示すフローチャートである。以下、図１５に示す処理をステップ番号に沿って説明する。 FIG. 15 is a flowchart showing an example of the procedure of the similar job extraction process. Hereinafter, the process shown in FIG. 15 will be described along with the step numbers.

［ステップＳ１１１］ＬＤＡ学習部１４０は、すべてのジョブのジョブステイタス情報内の出現単語を抽出する。 [Step S111] The LDA learning unit 140 extracts the words appearing in the job status information of all jobs.

［ステップＳ１１２］ＬＤＡ学習部１４０は、ＬＤＡ推定モデルを用いて単語をトピックに分類する。すなわちＬＤＡ学習部１４０は、前述の式（６）を用いて、共通のジョブステイタス情報に出現する確率の高い単語同士を同じグループにグルーピングし、生成されたグループをトピックとする。ＬＤＡ学習部１４０は、生成したトピックと、各トピックに属する単語のリストとを、学習結果情報１１２としてＤＢ１１０に格納する。 [Step S112] The LDA learning unit 140 classifies words into topics using the LDA estimation model. That is, the LDA learning unit 140 uses the above equation (6) to group words having a high probability of appearing in common job status information into the same group, and the generated group is used as a topic. The LDA learning unit 140 stores the generated topics and a list of words belonging to each topic in the DB 110 as learning result information 112.

［ステップＳ１１３］類似ジョブ特定部１５０は、ＬＤＡ推定モデルによる学習結果に基づいて、すべてのジョブについて、該当ジョブのジョブステイタス情報に含まれるトピック分布を算出する。 [Step S113] The similar job identification unit 150 calculates the topic distribution included in the job status information of the corresponding job for all jobs based on the learning result by the LDA estimation model.

［ステップＳ１１４］類似ジョブ特定部１５０は、ＬＤＡ推定モデルによる学習結果から得られたトピック分布に基づいて、新規投入ジョブのトピック分布と、既に実行が終了している他のジョブのトピック分布との類似度を計算する。 [Step S114] Based on the topic distributions obtained from the learning result of the LDA estimation model, the similar job identification unit 150 calculates the similarity between the topic distribution of the newly submitted job and the topic distribution of each other job whose execution has already finished.

［ステップＳ１１５］類似ジョブ特定部１５０は、ステップＳ１１４で算出された類似度が上位１０個のジョブのジョブＩＤを、類似ジョブリスト１１３ａに登録する。そして類似ジョブ特定部１５０は、類似ジョブリスト１１３ａをＤＢ１１０に格納する。 [Step S115] The similar job identification unit 150 registers, in the similar job list 113a, the job IDs of the 10 jobs with the highest similarity calculated in step S114. Then, the similar job identification unit 150 stores the similar job list 113a in the DB 110.
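
As a non-limiting illustration of steps S114 and S115, the following Python sketch ranks finished jobs by the similarity of their topic distributions to that of the newly submitted job and keeps the top entries. The description does not state which similarity measure is used, so cosine similarity is an assumption here. The topic distributions are taken as given inputs, and the function names and job IDs are hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def top_similar_jobs(new_dist, finished, k=10):
    """Return the IDs of the k finished jobs most similar to the new job.

    finished maps a job ID to that job's topic distribution.
    """
    ranked = sorted(finished,
                    key=lambda jid: cosine_similarity(new_dist, finished[jid]),
                    reverse=True)
    return ranked[:k]


# Illustrative three-topic distributions.
new_dist = [0.7, 0.2, 0.1]
finished = {
    "J1": [0.7, 0.2, 0.1],
    "J2": [0.1, 0.1, 0.8],
    "J3": [0.6, 0.3, 0.1],
}
print(top_similar_jobs(new_dist, finished, k=2))  # ['J1', 'J3']
```

With `k=10`, the returned list corresponds to the content of a similar job list such as 113a.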

このようにして新規投入ジョブとジョブステイタス情報が類似する類似ジョブを抽出できる。類似ジョブを用いることで、新規投入ジョブの実行時間が精度良く予測できる。 In this way, similar jobs whose job status information resembles that of the newly submitted job can be extracted. By using the similar jobs, the execution time of the newly submitted job can be predicted accurately.

次にジョブスケジューリング処理の手順について説明する。 Next, the procedure of the job scheduling process will be described.

図１６は、ジョブスケジューリング処理の手順の一例を示すフローチャートである。以下、図１６に示す処理をステップ番号に沿って説明する。 FIG. 16 is a flowchart showing an example of the procedure of the job scheduling process. Hereinafter, the process shown in FIG. 16 will be described along with the step numbers.

［ステップＳ１２１］ジョブスケジューリング部２４０は、ジョブの実行状況をロードする。例えばジョブスケジューリング部２４０は、ＤＢ１１０に記憶されたＨＰＣ３０がＨＰＣシステム３０のスケジュール（例えばスケジュール５０）に割当前のジョブのジョブステイタス情報を管理サーバ１００から取得する。またジョブスケジューリング部２４０は、管理サーバ１００からスケジュール５０に割当前のジョブの推定実行時間を取得する。 [Step S121] The job scheduling unit 240 loads the job execution status. For example, the job scheduling unit 240 acquires from the management server 100 the job status information, stored in the DB 110, of jobs that have not yet been assigned to the schedule of the HPC system 30 (for example, the schedule 50). The job scheduling unit 240 also acquires from the management server 100 the estimated execution times of the jobs not yet assigned to the schedule 50.

［ステップＳ１２２］ジョブスケジューリング部２４０は、スケジュール５０に割当前のジョブの優先度を決定する。例えばジョブスケジューリング部２４０は、スケジュール５０に割当前のジョブそれぞれについて、投入された時刻が早いほど優先度が高くなるように優先度を決定する。 [Step S122] The job scheduling unit 240 determines the priority of the job before being assigned to the schedule 50. For example, the job scheduling unit 240 determines the priority of each job before being assigned to the schedule 50 so that the earlier the input time is, the higher the priority is.

［ステップＳ１２３］ジョブスケジューリング部２４０は、変数Ｘに初期値「１」を設定する。 [Step S123] The job scheduling unit 240 sets the initial value "1" in the variable X.

［ステップＳ１２４］ジョブスケジューリング部２４０は、変数Ｎに初期値「０」を設定する。 [Step S124] The job scheduling unit 240 sets the initial value "0" in the variable N.

［ステップＳ１２５］ジョブスケジューリング部２４０は、優先度ＸのジョブをＮ番目の区間から割当可能であるか否かを判定する。例えばジョブスケジューリング部２４０は、Ｎ番目の区間の始めから優先度Ｘのジョブの推定実行時間経過した時点が含まれる区間を特定する。ジョブスケジューリング部２４０は、スケジュール５０を参照し、Ｎ番目の区間から特定した区間までの期間（割当候補期間）に含まれる各区間それぞれで、合計使用計算ノード数を特定する。ジョブスケジューリング部２４０は、割当候補期間に含まれる各区間で、合計使用計算ノード数に優先度Ｘのジョブの使用計算ノード数を足した値が、計算ノード３１，３２，・・・の総数以下の場合、優先度ＸのジョブをＮ番目の区間から割当可能であると判定する。 [Step S125] The job scheduling unit 240 determines whether or not the job of priority X can be assigned from the Nth section. For example, the job scheduling unit 240 specifies a section including a time point when the estimated execution time of the job of priority X has elapsed from the beginning of the Nth section. The job scheduling unit 240 refers to the schedule 50 and specifies the total number of calculation nodes used in each section included in the period (allocation candidate period) from the Nth section to the specified section. In each section included in the allocation candidate period, the job scheduling unit 240 has a value obtained by adding the number of used calculation nodes of the job of priority X to the total number of used calculation nodes, which is equal to or less than the total number of calculation nodes 31, 32, ... In the case of, it is determined that the job of priority X can be assigned from the Nth interval.

ジョブスケジューリング部２４０は、優先度ＸのジョブをＮ番目の区間から割当可能であると判定した場合、処理をステップＳ１２８に進める。またジョブスケジューリング部２４０は、優先度ＸのジョブをＮ番目の区間から割当不可能であると判定した場合、処理をステップＳ１２６に進める。 When the job scheduling unit 240 determines that the job of priority X can be assigned from the Nth section, the process proceeds to step S128. Further, when the job scheduling unit 240 determines that the job of priority X cannot be assigned from the Nth section, the process proceeds to step S126.
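
As a non-limiting illustration, the allocability check of step S125 can be sketched in Python as follows. The schedule is modeled as a list holding the number of calculation nodes already in use in each section. The function name, the argument names, the fixed section length, and the rule that a job running past the end of the scheduling period cannot be assigned are assumptions made for this sketch.

```python
import math


def can_assign(used, start, est_time, job_nodes, total_nodes, section_len):
    """Check whether a job can be assigned starting at section index `start`.

    used[i] is the number of nodes already reserved in section i.
    """
    # Number of sections the job occupies when it runs for est_time.
    span = max(1, math.ceil(est_time / section_len))
    end = start + span
    if end > len(used):
        return False  # assumption: the job must fit inside the scheduling period
    # Every section of the allocation candidate period must have enough room.
    return all(used[i] + job_nodes <= total_nodes for i in range(start, end))


# Six sections of 10 minutes each on a system with 8 nodes.
used = [6, 6, 2, 2, 0, 0]
print(can_assign(used, 0, 25, 4, 8, 10))  # False: section 0 would need 10 nodes
print(can_assign(used, 2, 25, 4, 8, 10))  # True: sections 2 to 4 all have room
print(can_assign(used, 4, 25, 4, 8, 10))  # False: the job would overrun the period
```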

[Step S126] The job scheduling unit 240 determines whether the Nth section is the last section of the scheduling period. When it determines that the Nth section is the last section of the scheduling period, it advances the process to step S129. When it determines that the Nth section is not the last section of the scheduling period, it advances the process to step S127.

[Step S127] The job scheduling unit 240 adds 1 to the variable N (N = N + 1). The process then returns to step S125.
[Step S128] The job scheduling unit 240 assigns the job of priority X starting from the Nth section. For example, the job scheduling unit 240 updates the schedule 50 so that the job of priority X is assigned to the allocation candidate period identified in step S125.

[Step S129] The job scheduling unit 240 determines whether the job of priority X is the last job. When it determines that the job of priority X is the last job, it ends the process. When it determines that the job of priority X is not the last job, it advances the process to step S130.

[Step S130] The job scheduling unit 240 adds 1 to the variable X (X = X + 1). The process then returns to step S124.
In this way, job scheduling using the estimated execution time is performed. By following the schedule 50 produced by this job scheduling, the HPC system 30 can execute jobs efficiently.
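The loop of steps S122 to S130 can be sketched as follows. This is a minimal illustration, not the implementation described in the embodiment: the names `Job`, `schedule_jobs`, `used`, and `placement` are hypothetical, the schedule 50 is modeled simply as a list of per-section node counts, and the number of sections a job occupies is assumed to be its estimated execution time rounded up to whole sections.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submitted_at: float   # submission time; earlier means higher priority (S122)
    nodes: int            # number of compute nodes the job uses
    est_runtime: float    # estimated execution time, same unit as section_len


def schedule_jobs(jobs, total_nodes, num_sections, section_len):
    """Assign jobs to sections in priority order.

    Returns (used, placement): nodes in use per section, and a mapping
    from job name to its (first_section, last_section).
    """
    used = [0] * num_sections   # stand-in for the schedule 50
    placement = {}

    # S122, S123, S129, S130: visit jobs in priority order (X = 1, 2, ...).
    for job in sorted(jobs, key=lambda j: j.submitted_at):
        # Assumption: the job occupies ceil(est_runtime / section_len) sections.
        span = max(1, math.ceil(job.est_runtime / section_len))

        # S124, S126, S127: try N = 0, 1, ... up to the last section.
        for n in range(num_sections):
            last = n + span - 1
            if last >= num_sections:
                break   # assumption: the job must fit inside the scheduling period

            # S125: every section of the candidate period must have enough free nodes.
            if all(used[i] + job.nodes <= total_nodes for i in range(n, last + 1)):
                # S128: record the assignment in the schedule.
                for i in range(n, last + 1):
                    used[i] += job.nodes
                placement[job.name] = (n, last)
                break
    return used, placement
```

A job that fits in no candidate period is left out of `placement`, which corresponds to moving from step S126 to step S129 without passing through step S128.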

[Other Embodiments]
In the second embodiment, the HPC operation management server 200 divides the scheduling period into a plurality of sections and schedules jobs with each job's end time rounded up to a section boundary. However, the next job may instead be scheduled immediately after the scheduled end time of a job. That is, the HPC operation management server 200 may perform scheduling so as to close the gaps between the jobs shown in FIG. 4.
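The difference between the two variants comes down to how a job's end time is computed. The following sketch uses hypothetical helper names and an assumed time unit of minutes to contrast them.

```python
import math


def rounded_end(start, est_runtime, section_len):
    """End time rounded up to the next section boundary (second embodiment)."""
    return math.ceil((start + est_runtime) / section_len) * section_len


def packed_end(start, est_runtime):
    """End time without rounding, so the next job can start right after it."""
    return start + est_runtime
```

With 60-minute sections, a job that starts at minute 0 and has an estimated execution time of 90 minutes holds its nodes until minute 120 under rounding, but only until minute 90 when the gap is closed.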

In the second embodiment, similar jobs are identified as those whose job name, user name, and group name match those of the newly submitted job. However, the management server 100 may identify similar jobs by matching other parameters, chosen according to the parameters that can be set in the system that executes the jobs. Likewise, in the second embodiment the parameters indicating the scale of a job are all parameters other than the job name, user name, group name, and execution time, but they may be other predetermined parameters chosen according to the parameters that can be set in the system that executes the jobs.
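As an illustration of how similar jobs feed into the estimated execution time, the sketch below combines the matching on job name, user name, and group name with the rules recited in the claims. All names are hypothetical. Two points are assumptions rather than statements from the source: `limit_time` stands for the time from the start of a job until its execution is aborted, and the maximum past execution time is used directly as the estimate, whereas the claims only say the estimate is based on it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobInfo:
    job_name: str
    user_name: str
    group_name: str
    limit_time: float                          # time until execution is aborted
    exec_time: Optional[float] = None          # past execution time; None for a new job
    scale: dict = field(default_factory=dict)  # parameters indicating job scale


def estimate_runtime(new_job, finished_jobs):
    """Return the estimated execution time for new_job."""
    key = (new_job.job_name, new_job.user_name, new_job.group_name)
    similar = [
        j for j in finished_jobs
        if (j.job_name, j.user_name, j.group_name) == key and j.exec_time is not None
    ]

    # No finished job matches: fall back to the limit time.
    if not similar:
        return new_job.limit_time

    # Matching jobs differ in scale: their run times are not comparable,
    # so fall back to the limit time as well.
    if any(j.scale != similar[0].scale for j in similar):
        return new_job.limit_time

    # Assumption: use the longest past execution time as the estimate.
    return max(j.exec_time for j in similar)
```

Changing the tuple used as `key`, or the contents of `scale`, corresponds to the variations described above, where other parameters are chosen according to what the target system allows.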

Although embodiments have been illustrated above, the configuration of each unit shown in the embodiments can be replaced with another having the same function. Any other components or steps may also be added. Furthermore, any two or more configurations (features) of the embodiments described above may be combined.

1 HPC system
2 First job
3a, 3b, ... Second jobs
4a, 4b, ... Third jobs
10 Information processing apparatus
11 Storage unit
11a, 11b, ... Job information
12 Processing unit

Claims

A job scheduling program that causes a computer to execute a process comprising:
determining, based on job information of each of a newly submitted first job and a plurality of second jobs whose execution has finished, the job information including parameters of the respective jobs, whether a value of a first parameter of the first job matches a value of a second parameter of each of the plurality of second jobs, the second parameter being of the same kind as the first parameter;
when the plurality of second jobs include no job whose value of the second parameter matches the value of the first parameter, determining the time from the start of the first job until its execution is aborted as an estimated execution time to be used when the first job is scheduled; and
when the plurality of second jobs include one or more third jobs whose value of the second parameter matches the value of the first parameter, determining the estimated execution time based on the execution time of each of the one or more third jobs when it was executed in the past.

The job scheduling program according to claim 1, wherein,
in determining the estimated execution time when the plurality of second jobs include the one or more third jobs whose value of the second parameter matches the value of the first parameter, the estimated execution time is determined based on the maximum execution time among the execution times of the one or more third jobs when they were executed in the past.

The job scheduling program according to claim 1 or 2, wherein,
in determining the estimated execution time when the plurality of second jobs include the one or more third jobs whose value of the second parameter matches the value of the first parameter, when the values of a third parameter, which is of a different kind from the first parameter, are the same for all of the one or more third jobs, the estimated execution time is determined based on the execution time of each of the one or more third jobs when it was executed in the past, and
when the values of the third parameter are not the same for all of the one or more third jobs, the time from the start of the first job until its execution is aborted is determined as the estimated execution time.

The job scheduling program according to any one of claims 1 to 3, wherein
there are a plurality of the first parameters, and the plurality of first parameters include a parameter indicating the name of a job, a parameter indicating the name of the user who submitted the job, and a parameter indicating the group to which the user belongs.

The job scheduling program according to any one of claims 1 to 4, further causing the computer to
identify, from among a plurality of fourth jobs whose execution has finished, the plurality of second jobs whose job information is similar to that of the first job, based on a degree of similarity between the first job and each of the plurality of fourth jobs, the degree of similarity being calculated using the job information of each of the first job and the plurality of fourth jobs and a predetermined formula.

An information processing apparatus comprising:
a storage unit that stores job information of each of a newly submitted first job and a plurality of second jobs whose execution has finished, the job information including parameters of the respective jobs; and
a processing unit that determines, based on the job information of each of the first job and the plurality of second jobs, whether a value of a first parameter of the first job matches a value of a second parameter of each of the plurality of second jobs, the second parameter being of the same kind as the first parameter; that, when the plurality of second jobs include no job whose value of the second parameter matches the value of the first parameter, determines the time from the start of the first job until its execution is aborted as an estimated execution time to be used when the first job is scheduled; and that, when the plurality of second jobs include one or more third jobs whose value of the second parameter matches the value of the first parameter, determines the estimated execution time based on the execution time of each of the one or more third jobs when it was executed in the past.

A job scheduling method in which a computer:
determines, based on job information of each of a newly submitted first job and a plurality of second jobs whose execution has finished, the job information including parameters of the respective jobs, whether a value of a first parameter of the first job matches a value of a second parameter of each of the plurality of second jobs, the second parameter being of the same kind as the first parameter;
when the plurality of second jobs include no job whose value of the second parameter matches the value of the first parameter, determines the time from the start of the first job until its execution is aborted as an estimated execution time to be used when the first job is scheduled; and
when the plurality of second jobs include one or more third jobs whose value of the second parameter matches the value of the first parameter, determines the estimated execution time based on the execution time of each of the one or more third jobs when it was executed in the past.