JP2013140490A

JP2013140490A - Parallel computing controller and parallel computing control method

Info

Publication number: JP2013140490A
Application number: JP2012000287A
Authority: JP
Inventors: Takahiro Yamazaki; 隆浩山崎
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-01-04
Filing date: 2012-01-04
Publication date: 2013-07-18

Abstract

Problem: To reduce search time in a multi-job system that searches a parameter space.
Solution: A parallel computing control device includes a storage unit that stores a program executed by parallel computation using a plurality of nodes, and a control unit. The control unit has a parallelism adjustment unit that executes the program read from the storage unit as a plurality of jobs, analyzes the calculation result of each job at fixed elapsed-time intervals, and schedules the overall parallel computation by adjusting the degree of parallelism of each job, based on the remaining execution time of each job obtained from the analysis, so that all jobs finish together. The control unit changes the degree of parallelism according to the job schedule obtained by the parallelism adjustment unit.
Selected drawing: FIG. 1

Description

The present invention relates to a parallel computing control device and a parallel computing control method that control program execution by parallel computation using a plurality of processors connected to a network.

Parallel computer systems that use a plurality of processors connected to a network are used for parameter searches, such as finding the optimum composition of a material. In a parameter search, the amount of computation for parameters in some regions can be orders of magnitude larger than for others, which lengthens the time until the whole search completes.

For this reason, several techniques have been proposed. One increases or decreases the next number of processors, based on the magnitude relationship, when the difference in current processing efficiency (the amount of processing per unit time) between the current number of processors and the previous number of processors is equal to or greater than an allowable value. The number of processors that minimizes the processing time is thereby reached, and the processing time of the parallel program is shortened. Another, used when optimizing parameters, generates a plurality of individuals whose elements are those parameters within a search space bounded by the upper and lower limits of the parameters to be optimized, and performs a pincer search that displaces the individuals one at a time toward the inside of the search space.

JP 08-249294 A
JP 2007-233676 A

From the related art described above, raising the degree of parallelism in a parallel computation should shorten the time until the computation completes. However, in a multi-job system that searches a parameter space defined by a plurality of parameters, the amount of processing differs for each parameter, and each amount is not known until the computation is actually run.

When a plurality of parameters in the parameter space are processed simultaneously by parallel computation, if a job is known to take longer than the jobs for the other parameters, the degree of parallelism of that job could be raised to improve overall throughput. However, for a person to tune several hundred jobs by hand requires not only knowledge of parallel computation but also expert knowledge of the subject being computed.

An object of the present invention is therefore to provide a parallel computing control device and a parallel computing control method that shorten the search time in multi-job parallel computation that searches a parameter space.

The disclosed technique is configured as a parallel computing control device including: a storage unit that stores a program executed by parallel computation using a plurality of nodes; and a control unit having a parallelism adjustment unit that executes the program read from the storage unit as a plurality of jobs, analyzes the calculation result of each job at fixed elapsed-time intervals, and schedules the overall parallel computation by adjusting the degree of parallelism of each job, based on the remaining execution time of each job obtained from the analysis, so that all jobs finish together, the control unit changing the degree of parallelism based on the job schedule obtained by the parallelism adjustment unit.

As means for solving the above problem, the technique may also take the form of a program that causes a computer to function as the parallel computing control device, a recording medium on which the program is recorded, or a parallel computing control method.

In the disclosed technique, a program is executed as a plurality of jobs in a parallel computation using a plurality of nodes, and the degree of parallelism of each job is adjusted at fixed elapsed-time intervals so that all jobs finish together. This reduces the time nodes spend waiting for jobs on other nodes to finish and improves the throughput of the parallel computation.

FIG. 1 is a diagram showing a configuration example of a parallel computing system according to the present embodiment.
FIG. 2 is a diagram showing the configuration of the parallel computing unit.
FIG. 3 is a diagram showing the hardware configuration of the front-end computing device.
FIG. 4 is a diagram for explaining an outline of the parallelism adjustment process.
FIG. 5 is a flowchart for explaining an outline of the overall processing by the control unit.
FIG. 6 is a flowchart for explaining process P20 of FIG. 5.
FIG. 7 is a diagram for explaining a calculation example in the processing that includes the first round.
FIG. 8 is a diagram for explaining a calculation example in the second round.
FIG. 9 is a diagram for explaining a calculation example in the third round.
FIG. 10 is a diagram for explaining a calculation example in the fourth round.
FIG. 11 is a diagram for explaining a calculation example in the fifth round.
FIG. 12 is a diagram showing an example of the result of rearranging the job order.
FIG. 13 is a diagram showing an example of a crystallization process.
FIG. 14 is a diagram showing an example of a method for estimating the volume of a crystal region.
FIG. 15 is a diagram showing an example of a method for verifying the relationship between composition and ease of crystallization.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a diagram showing a configuration example of a parallel computing system according to the present embodiment. In the parallel computing system 1000 shown in FIG. 1, a front-end computing device 100 and a parallel computing unit 200 having a plurality of nodes 20 (FIG. 2) are connected via a network 6. Together they implement a multi-job system that searches a parameter space, such as one relating to composition and structure, for the optimum values of the parameters for a desired structural state.

In the parallel computing system 1000, for example, a molecular dynamics simulation may be run to adjust the composition in a study of adding impurities to suppress crystallization of an amorphous material. As another example, in a search for thermoelectric materials, a simulation may vary the composition of a ternary material and use first-principles calculations to search for compositions that yield an ideal electronic state (band structure) and ideal lattice vibration characteristics.

In FIG. 1, the front-end computing device 100 is a computer device (FIG. 3). It has a control unit 120, which causes the parallel computing unit 200 connected via the network to run simulations using a plurality of parameters, and a queuing system 190. The front-end computing device 100 corresponds to the parallel computing control device.

The control unit 120 is a processing unit that controls the entire front-end computing device 100 and is implemented by processing performed by the CPU 11 shown in FIG. 3. The control unit 120 has a parallelism adjustment unit 124 that dynamically changes the degree of parallelism of the job for each parameter so that the overall parallel processing time is shortened in a simulation using a plurality of parameters.

The degree of parallelism is the number of nodes 20 (FIG. 2) used to execute in parallel the job for a given parameter, for each of the one or more parameters being executed simultaneously. To shorten the overall parallel processing, the degree of parallelism is adjusted so that the idle time of each node 20 is reduced and processing completes at almost the same time on all nodes 20. That is, by estimating the remaining time until completion, the submission of the process (job) for each parameter and the number of nodes 20 used to run each submitted process (job) in parallel are determined dynamically.

The control unit 120 also has the queuing system 190, which causes the nodes 20 of the parallel computing unit 200 to perform the processing for the specified parameters according to the degree of parallelism dynamically adjusted by the parallelism adjustment unit 124.

The storage unit 130 stores program A 31, program B 32, and so on, together with the corresponding analysis program A' 41, analysis program B' 42, and so on. Program A 31, program B 32, and so on are the programs executed by the parallel computing unit 200. Analysis program A' 41 analyzes the results after a simulation by program A 31 ends and is executed by the control unit 120. Analysis program B' 42 and the others are likewise executed by the control unit 120.

The storage unit 130 further stores an input data file 51 and an output data file 53. The input data file 51 contains the structure data for each composition, the parameter set, and other data entered by the user. The parallelism adjustment unit 124 adjusts the degree of parallelism for each parameter in the parameter set specified by the input data file 51. The output data file 53 contains the calculation results and, for each job, information such as the elapsed time received from the parallel computing unit 200. Each time a job ends, the parallelism adjustment unit 124 refers to the output data file 53 and estimates the time required to complete the calculation for one parameter. When it is decided, based on the result of each job, to continue the calculation, the output data file 53 is used as input for the next job.

FIG. 2 is a diagram showing the configuration of the parallel computing unit. The parallel computing unit 200 illustrated in FIG. 2 has a plurality of nodes 20. Each node 20 is a computer device having a processor 5, a storage device 22, a communication device, and so on, and executes parallel computation under the control of the control unit 120 of the front-end computing device 100.

The processor 5 executes program A 31, program B 32, or the like transferred from the front-end computing device 100, job by job. Data in use during execution and the execution results are stored in the storage device 22. Each time a job ends, the execution results are transferred to the front-end computing device 100.

The storage device 22 may include the main storage device 12 (FIG. 3), the auxiliary storage device 13 (FIG. 3), and the like of the node 20. It may further include an external storage device accessible by the node 20. Here, the storage device 22 collectively denotes the devices that contain areas for storing the data the processor 5 needs for parallel processing.

The number of nodes 20 in the parallel computing unit 200 is not particularly limited and may be a few, several tens, or several tens of thousands. In the following description, the number of nodes is set to 10 to keep the explanation of the present embodiment simple.

FIG. 3 is a diagram showing the hardware configuration of the front-end computing device. The front-end computing device 100 shown in FIG. 3 is a computer-controlled device and has a CPU (Central Processing Unit) 11, a main storage device 12, an auxiliary storage device 13, a display device 14, an input device 15, a communication I/F 16, a drive 18, and a storage medium 19.

The CPU 11 controls the front-end computing device 100 according to a program stored in the main storage device 12. The main storage device 12 is, for example, a RAM (Random Access Memory) and stores the programs executed by the CPU 11, the data needed for processing by the CPU 11, the data obtained by that processing, and so on. Part of the main storage device 12 is allocated as a work area for processing by the CPU 11.

The auxiliary storage device 13 is, for example, a hard disk unit. It stores the programs that execute the various processes (including program A 31, program B 32, and so on, and analysis program A' 41, analysis program B' 42, and so on), the input data file 51, and the output data file 53.

The display device 14 displays various necessary information under the control of the CPU 11. The input device 15 has a mouse, a keyboard, and the like, and is used by the user to enter the various information the front-end computing device 100 needs for processing.

The communication I/F 16 is a device that connects to, for example, the Internet or a LAN (Local Area Network) and controls communication with the nodes 20 in the parallel computing unit 200. Communication through the communication I/F 16 is not restricted to either wireless or wired connections.

A program that implements the processing performed by the parallelism adjustment unit 124 of the control unit 120 of the front-end computing device 100 is provided to the front-end computing device 100 on a storage medium 19 such as a CD-ROM (Compact Disc Read-Only Memory). When the storage medium 19 holding the program is set in the drive 18, the drive 18 reads the program from the storage medium 19, and the program is installed in the auxiliary storage device 13 via the bus B. When the program is started, the CPU 11 begins processing according to the program installed in the auxiliary storage device 13. The medium that stores the program is not limited to a CD-ROM and may be any computer-readable medium, such as a portable recording medium like a DVD or a USB memory, or a semiconductor memory such as a flash memory. The program may also be transferred to the auxiliary storage device 13 over an external network, or created with the input device 15 and saved in the auxiliary storage device 13.

Part or all of the storage areas provided by the main storage device 12, the auxiliary storage device 13, and the external storage device correspond to the storage unit that stores the data of the present embodiment.

Next, the parallelism adjustment process performed by the parallelism adjustment unit 124 is described. FIG. 4 is a diagram for explaining an outline of the process. The example in FIG. 4 uses 10 nodes and 10 parameters to be executed. The numbers of nodes and parameters are not limited to this example. In FIG. 4, the number in each cell identifies a parameter.

For simplicity, the description assumes a parallel computing system 1000 in which the time limit for one job is one hour, the results are analyzed after each job ends, and the analysis determines whether the job's calculation is continued or ended.

At the end of each job, the parallelism adjustment unit 124 estimates the time required to complete the calculation for one parameter. For example, in a simulation of the crystallization of an amorphous material, if the crystallization rate has advanced by 5% at the end of a one-hour job, the unit judges (predicts) that complete crystallization will take 20 hours in total. In the other case, structural relaxation by first-principles calculation, the total calculation time can be roughly estimated from the rate at which the forces acting on the atoms decrease.
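
The crystallization example above amounts to a linear extrapolation from the progress observed in the first job. A minimal sketch follows, assuming progress is linear in time. The function name `estimate_hours` and its parameters are illustrative and do not appear in the source.

```python
def estimate_hours(progress_fraction: float, elapsed_hours: float):
    """Linearly extrapolate total and remaining run time from progress so far.

    Assumes progress (e.g. the crystallization rate) grows linearly with time,
    which is an assumption of this sketch rather than a statement of the source.
    """
    if not 0.0 < progress_fraction <= 1.0:
        raise ValueError("progress_fraction must be in (0, 1]")
    total = elapsed_hours / progress_fraction
    return total, total - elapsed_hours


# 5% crystallized after a one-hour job, as in the example in the text.
total, remaining = estimate_hours(progress_fraction=0.05, elapsed_hours=1.0)
print(f"total = {total:.1f} h, remaining = {remaining:.1f} h")
```

For the first-principles case a different progress measure (the decrease in atomic forces) would be needed, and the source gives no formula for it.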

Suppose that, at the point where each job has finished, the remaining calculation time for each parameter has been estimated in this way, as illustrated in job schedule 4a. The parallel computing system 1000 has 10 nodes in total, and each job is first run on one node. Parameters and jobs correspond one to one, and each job is identified by a job number. Job 1 runs on node 1, job 2 on node 2, job 3 on node 3, and so on through job 10 on node 10, each for the job time limit (for example, one hour).

The parallelism adjustment unit 124 predicts that jobs 1 and 2 need 20 more hours each, job 3 needs 10 more hours, job 4 needs 8 more hours, job 5 needs 7 more hours, and jobs 6 to 10 need 5 more hours each.

If nothing is changed, the time required for the whole calculation is predicted to be the 21 hours needed to complete the calculations for jobs 1 and 2. Nodes 3 to 10 would therefore have idle periods (blank areas in the schedule). Based on the prediction, the degree of parallelism is raised for jobs with long processing times, and the order of the jobs is adjusted, so that no node has idle periods before all processing completes.
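
To make the idle time concrete, the following sketch computes the remaining makespan and the idle node-hours when each job stays on its own node, using the remaining times predicted for job schedule 4a. The variable names are illustrative. The "balanced" figure is only a lower bound obtained by spreading the work evenly and ignores any parallelization overhead.

```python
# Remaining hours per job after the one-hour trial (values from the text).
remaining_hours = {1: 20, 2: 20, 3: 10, 4: 8, 5: 7, 6: 5, 7: 5, 8: 5, 9: 5, 10: 5}
num_nodes = 10

makespan = max(remaining_hours.values())           # longest job decides the finish time
busy_node_hours = sum(remaining_hours.values())    # work actually performed
idle_node_hours = num_nodes * makespan - busy_node_hours
balanced_hours = busy_node_hours / num_nodes       # ideal finish time with no idle nodes

print(f"remaining makespan: {makespan} h (plus the 1 h trial = {makespan + 1} h)")
print(f"idle node-hours:    {idle_node_hours}")
print(f"balanced estimate:  {balanced_hours} h")
```

With these numbers the one-job-per-node schedule leaves 110 of 200 node-hours idle, which is the waste the adjustment aims to remove.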

In this way, the parallelism adjustment unit 124 adjusts the degree of parallelism, for example as in job schedule 4b. Job schedule 4b is expressed as a tiling of an area of (number of nodes) × (time). The processing by the control unit 120, including the parallelism adjustment unit 124, is described with reference to FIGS. 5 and 6.

FIG. 5 is a flowchart for explaining an outline of the overall processing by the control unit. In FIG. 5, the control unit 120 performs preprocessing for the simulation (step S11) and sets the parameters to be searched (step S12). The preprocessing includes the initial settings for running program A 31 in parallel. The parameters to be searched are set by the user. Alternatively, a parameter set prepared in advance and stored in the storage unit 130 may be read.

The control unit 120 creates input data and stores it in the storage unit 130 (step S13). From the input data and differing initial settings, n input data files 51 are stored in the storage unit 130. The n input data files 51 are used by the corresponding n jobs. The n jobs may correspond to n of the set parameters.

The control unit 120 submits each job with a degree of parallelism of 1 as a trial (step S14). For example, if the job time limit is one hour, each job is run for a one-hour trial on one node. The control unit 120 then causes the parallelism adjustment unit 124 to perform the parallelism adjustment process (step S15) and submits the n jobs with the degrees of parallelism adjusted by the parallelism adjustment unit 124 (step S16).

The control unit 120 determines, based on the job results, whether to continue the calculation (step S17). If it determines that the calculation is to be continued, the control unit 120 returns to step S15 and again causes the parallelism adjustment unit 124 to perform the parallelism adjustment process. If it determines that the calculation is not to be continued, that is, that the n jobs have ended, the control unit 120 determines whether the search has ended (step S18).

If it determines in step S18 that the search has not ended, the control unit 120 changes the parameters (step S18-2), returns to step S13, and repeats the processing described above. If it determines that the search has ended, the control unit 120 analyzes the entire region of the searched parameters (step S19) and ends the simulation. In step S19, the control unit 120 reads the analysis program corresponding to the simulated program from the storage unit 130 and executes it.
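
The control flow of steps S12 to S19 can be sketched as two nested loops. This is a hedged illustration only: `run_search` and every callable passed to it are hypothetical stand-ins for the control unit, the queuing system, and the analysis program, and the preprocessing of step S11 is omitted.

```python
from typing import Callable, Iterable, List


def run_search(
    parameter_sets: Iterable[list],
    trial_run: Callable[[list], list],
    adjust_parallelism: Callable[[list], List[int]],
    run_jobs: Callable[[list, List[int]], list],
    should_continue: Callable[[list], bool],
    analyze_all: Callable[[list], object],
) -> object:
    """Hypothetical driver mirroring steps S12 to S19 of FIG. 5."""
    all_results = []
    for params in parameter_sets:                  # S12, then S18-2 on later passes
        results = trial_run(params)                # S13-S14: one-node trial of each job
        while True:
            degrees = adjust_parallelism(results)  # S15: parallelism adjustment
            results = run_jobs(results, degrees)   # S16: submit with adjusted degrees
            if not should_continue(results):       # S17: continue the calculation?
                break
        all_results.append(results)                # S18: move on if parameters remain
    return analyze_all(all_results)                # S19: analyze the searched region


# Toy stand-ins: each "result" is simply the remaining hours of one job.
summary = run_search(
    parameter_sets=[[20.0, 10.0], [8.0, 5.0]],
    trial_run=lambda params: [h - 1.0 for h in params],
    adjust_parallelism=lambda rem: [2 if h > 5.0 else 1 for h in rem],
    run_jobs=lambda rem, deg: [max(0.0, h - d) for h, d in zip(rem, deg)],
    should_continue=lambda rem: any(h > 0.0 for h in rem),
    analyze_all=lambda results: len(results),
)
print(summary)
```

Passing the steps in as callables keeps the sketch independent of any particular scheduler or simulation code.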

Process P20, which includes steps S13, S14, and S15 described above, is detailed with reference to FIG. 6. FIG. 6 is a flowchart for explaining process P20 of FIG. 5. In FIG. 6, the control unit 120 creates n pieces of input data (step S13) and runs the n jobs for one hour, each with a degree of parallelism of 1 (step S14).

The control unit 120 causes the parallelism adjustment unit 124 to perform step S15. The parallelism adjustment process of step S15 consists of steps S50 to S57 described below.

The parallelism adjustment unit 124 analyzes the calculation results of the n jobs (step S50). For example, from the calculation results it obtains the crystallization rate in the case of a simulation of amorphous material crystallization, or the rate of decrease of the forces acting on the atoms in the case of structural relaxation by first-principles calculation. The control unit 120 also obtains the time taken to run each job, which can be calculated as the time from the job's start time to its end time.

The parallelism adjustment unit 124 estimates the remaining execution time h(i) for each job, indexed by i (step S51), and estimates the shortest execution time h that would result if all jobs were run with adjusted degrees of parallelism (step S52). The shortest execution time h may be taken as the average of the remaining execution times, average(h(i)).
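
As a small worked instance of step S52, the sketch below takes the remaining times of job schedule 4a as h(i) and computes h as their average. The names `h_i` and `h` follow the symbols in the text, and the helper `shortest_execution_time` is illustrative.

```python
def shortest_execution_time(h_i):
    """Step S52 as described: h = average(h(i)) over all jobs."""
    return sum(h_i) / len(h_i)


# Remaining hours h(i) for jobs 1..10, taken from job schedule 4a.
h_i = [20, 20, 10, 8, 7, 5, 5, 5, 5, 5]
h = shortest_execution_time(h_i)
print(h)
```

The result equals total work divided by job count, which coincides with the balanced per-node time only because this example has as many nodes as jobs.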

For each job whose remaining execution time h(i) is longer than the shortest execution time h (h(i) > h), the parallelism adjustment unit 124 obtains the degree of parallelism p(i) as the ratio of h(i) to h rounded up to an integer, so that p(i) ≥ h(i)/h > p(i) - 1 (step S53). Every other job (h(i) ≤ h) is run with a degree of parallelism of 1. The degree of parallelism p(i) is the number of nodes.

Next, the parallelism adjustment unit 124 obtains the average rotation count c (step S54). For each job it divides the remaining execution time h(i) by the degree of parallelism p(i) to predict the job's remaining execution time at that degree of parallelism, h(i)/p(i), and calculates the average over all jobs, average(h(i)/p(i)). It then obtains c as the ratio of the shortest execution time h to this average, rounded up to an integer, so that c ≥ h / average(h(i)/p(i)) > c - 1. In other words, c is the average number of repetitions needed to reach the shortest execution time when the jobs are submitted with degrees of parallelism p(i).
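
The calculation of step S54 can be sketched as follows, continuing with the values of job schedule 4a and the degrees of parallelism that step S53 yields for them. The helper name `average_rotation_count` is illustrative, and the sketch assumes ideal scaling (a job on p nodes takes h(i)/p(i)), as the text itself does.

```python
import math


def average_rotation_count(h_i, p_i):
    """Step S54 as described: c = ceil(h / average(h(i) / p(i)))."""
    h = sum(h_i) / len(h_i)                       # shortest execution time h
    per_job = [x / p for x, p in zip(h_i, p_i)]   # predicted time at degree p(i)
    return math.ceil(h / (sum(per_job) / len(per_job)))


h_i = [20, 20, 10, 8, 7, 5, 5, 5, 5, 5]
p_i = [3, 3, 2, 1, 1, 1, 1, 1, 1, 1]
c = average_rotation_count(h_i, p_i)
print(c)
```

Here the average of h(i)/p(i) is about 5.83 hours, so h divided by it is about 1.54 and rounds up to c = 2.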

The parallelism adjustment unit 124 then obtains the submitted-job time hav (step S55). It divides the remaining execution time h(i) of each job by the degree of parallelism p(i) to obtain the rotation count relative to the shortest execution time h, and divides by the average rotation count c to obtain the submitted-job time hav.

For each job, the parallelism adjustment unit 124 obtains the parallelism job time p(i)h, that is, the amount of work performed when the job occupies p(i) of the nodes 20 for the submitted-job time hav (step S56). The parallelism job time p(i)h of a job is the product of its parallelism p(i) and the submitted-job time hav.
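Steps S55 and S56 can be sketched together as follows. The helper names and sample values are hypothetical. The sketch assumes that hav is the average of h(i)/p(i)/c rounded half up, as in the worked example later in the description. Python's built-in `round` rounds halves to the nearest even integer, so `floor(x + 0.5)` is used instead.

```python
import math

def submitted_job_time(remaining, degrees, cycles):
    """Return hav: the mean of h(i)/p(i)/c, rounded half up (assumed rule)."""
    values = [h_i / p_i / cycles for h_i, p_i in zip(remaining, degrees)]
    mean = sum(values) / len(values)
    return math.floor(mean + 0.5)

def parallelism_job_times(degrees, hav):
    """Return p(i)h = p(i) * hav for each job."""
    return [p_i * hav for p_i in degrees]

# Hypothetical input: h(i) = 12, 6, 3; p(i) = 2, 1, 1; c = 2.
hav = submitted_job_time([12, 6, 3], [2, 1, 1], 2)
print(hav)                                    # 3
print(parallelism_job_times([2, 1, 1], hav))  # [6, 3, 3]
```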

The parallelism adjustment unit 124 determines whether the remaining execution time h(i) of a job i is still non-zero while the processes (computer node resources) have been exhausted (step S57). For each job, it determines whether the remaining execution time h(i) after subtracting the parallelism job time p(i)h is 0 or more. The computer node resources correspond to the total number of nodes 20. The unit also determines whether the value obtained by subtracting the parallelism p(i) of each job from the computer node resources is 0.

If it is determined in step S57 that the remaining execution time h(i) of a job is not 0 and the processes (computer node resources) have been exhausted, the parallelism adjustment unit 124 returns to step S51 and repeats the processing described above. If, on the other hand, the remaining execution time h(i) of every job is determined to be 0, the parallelism adjustment unit 124 reorders the schedule so that each job runs contiguously (step S58) and proceeds to step S16 in FIG. 5.

Taking as an example the job schedule 4a shown in FIG. 4, with 10 nodes and 10 jobs, the processing performed by the parallelism adjustment unit 124 is described in detail with reference to FIGS. 7 to 12. In FIGS. 7 to 12, the repetitions of steps S51 to S57 in FIG. 6 are referred to as the first round, the second round, and so on.

FIG. 7 illustrates a calculation example for the processing that includes the first round. In this example, 10 jobs with different input data and initial settings are each run on trial with a parallelism of 1 for a time limit (for example, one hour), and the remaining execution time h(i) of each job is estimated from the progress of the simulation during that hour (steps S14 to S51).

The remaining execution time h(i) is recorded for each of jobs 1 to 10 in a table T71 created in the work area of the storage unit 30. In this example, the recorded estimates are 20 hours for jobs 1 and 2, 10 hours for job 3, 8 hours for job 4, 7 hours for job 5, and 5 hours for each of jobs 6 to 10.

The shortest execution time h is obtained by averaging the remaining execution times, average(h(i)) (step S52). Here h is 9 hours.

For jobs 1, 2, and 3 (i = 1, 2, 3), whose remaining execution time h(i) is longer than the shortest execution time h (h(i) > h), the parallelism p(i) satisfying p(i) > h(i)/h > p(i) − 1 is calculated and set; a parallelism of 1 is set for the other jobs 4 to 10 (step S53). The calculation yields a parallelism of 3 for jobs 1 and 2, 2 for job 3, and 1 for jobs 4 to 10.
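The first-round figures for steps S52 and S53 can be reproduced with a short script. The remaining times are taken from table T71 as described above; the variable names are illustrative only.

```python
import math

# Remaining execution times h(i) of jobs 1 to 10, in hours (table T71).
h = [20, 20, 10, 8, 7, 5, 5, 5, 5, 5]

# Step S52: the shortest execution time h is the average of h(i).
h_short = sum(h) / len(h)

# Step S53: ceil(h(i) / h) for jobs longer than h, otherwise 1.
p = [math.ceil(h_i / h_short) if h_i > h_short else 1 for h_i in h]

print(h_short)  # 9.0
print(p)        # [3, 3, 2, 1, 1, 1, 1, 1, 1, 1]
```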

Next, the remaining execution time h(i)/p(i) of each of jobs 1 to 10 when run at its parallelism p(i) is calculated and recorded for each job in the table T71 created in the work area of the storage unit 30. The shortest execution time h is divided by the average of these values, average(h(i)/p(i)), and the quotient is rounded up to the next integer to give the average number of cycles c, which satisfies c > h/average(h(i)/p(i)) > c − 1 (step S54). The average number of cycles c is recorded in the table T71 in association with each of jobs 1 to 10.

For jobs 1 and 2, the remaining execution time h(i)/p(i) at a parallelism of 3 is 6.67; for job 3 it is 5.00 at a parallelism of 2; for jobs 4 to 10 the parallelism is 1, so the value equals the remaining execution time h(i) estimated from the trial run. The average number of cycles c is therefore 2.
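As a check on the first-round value of c, the following sketch recomputes it from the figures stated above. The variable names are illustrative.

```python
import math

h = [20, 20, 10, 8, 7, 5, 5, 5, 5, 5]  # remaining times h(i), table T71
p = [3, 3, 2, 1, 1, 1, 1, 1, 1, 1]     # parallelism p(i) from step S53
h_short = 9                            # shortest execution time h

per_job = [h_i / p_i for h_i, p_i in zip(h, p)]  # h(i)/p(i)
mean = sum(per_job) / len(per_job)               # average(h(i)/p(i))
c = math.ceil(h_short / mean)                    # step S54

print(round(per_job[0], 2))  # 6.67
print(round(mean, 2))        # 5.83
print(c)                     # 2
```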

The submitted-job time hav is then obtained by averaging, over jobs 1 to 10, each job's remaining execution time at its parallelism divided by the average number of cycles c, average(h(i)/p(i)/c) (step S55). The average, 2.92, is rounded to the nearest integer, giving a submitted-job time hav of 3 hours.
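The value 2.92 and the resulting hav of 3 hours can be verified as follows. This is a sketch; half-up rounding is implemented with `floor(x + 0.5)` because Python's built-in `round` rounds halves to even.

```python
import math

h = [20, 20, 10, 8, 7, 5, 5, 5, 5, 5]  # remaining times h(i)
p = [3, 3, 2, 1, 1, 1, 1, 1, 1, 1]     # parallelism p(i)
c = 2                                  # average number of cycles from step S54

values = [h_i / p_i / c for h_i, p_i in zip(h, p)]
avg = sum(values) / len(values)  # average(h(i)/p(i)/c)
hav = math.floor(avg + 0.5)      # round half up

print(round(avg, 2))  # 2.92
print(hav)            # 3
```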

Next, the parallelism state that would result from running the jobs in order, each for the submitted-job time hav of 3 hours, is predicted (step S56). Going through jobs 1 to 10 in order, the cumulative number of consumed nodes (the running total of p(i)) is recorded in a table T72 together with the calculated parallelism job time p(i)h. When the cumulative number of consumed nodes reaches the node count of 10, the processing in step S56 ends.

In this first round, the table T72 records a parallelism job time p(i)h of 9 hours for jobs 1 and 2, 6 hours for job 3, and 3 hours for jobs 4 and 5.

Meanwhile, the cumulative number of consumed nodes is 3 after job 1, 6 after jobs 1 and 2, 8 after jobs 1 to 3, and 9 after jobs 1 to 4. After jobs 1 to 5 it is 10, which equals the total of 10 nodes, so no parallelism job time is calculated for jobs 6 to 10. The first round ends here.
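The first-round contents of table T72 can be sketched as a simple accumulation that stops once all 10 nodes are in use. The tuple layout (job, cumulative nodes, p(i)h) is an assumption made for this illustration, not the table format of the specification.

```python
p = [3, 3, 2, 1, 1, 1, 1, 1, 1, 1]  # parallelism p(i) of jobs 1 to 10
hav = 3                             # submitted-job time in hours
total_nodes = 10

used = 0
rows = []  # (job number, cumulative consumed nodes, p(i)h)
for job, p_i in enumerate(p, start=1):
    if used + p_i > total_nodes:
        break  # the next job no longer fits on the remaining nodes
    used += p_i
    rows.append((job, used, p_i * hav))
    if used == total_nodes:
        break  # all nodes consumed: step S56 ends

for row in rows:
    print(row)
# (1, 3, 9)
# (2, 6, 9)
# (3, 8, 6)
# (4, 9, 3)
# (5, 10, 3)
```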

All nodes are consumed, but the parallelism job time p(i)h in the table T72 has not reached the remaining execution time h(i) in the table T71. Because the overall processing is not yet complete, the second round illustrated in FIG. 8 is started (YES in step S57).

FIG. 8 illustrates a calculation example for the second round. From the second round onward, the remaining execution time h(i) of each job is estimated by referring to the tables T71 and T72 and updating the remaining execution time h(i) in the table T71 (step S51).

The parallelism job time p(i)h in the table T72 of FIG. 7 is subtracted from the remaining execution time h(i) in the table T71 of FIG. 7, and the table T71 is updated with the result as the new predicted time. In this example, the remaining execution times h(i) of jobs 1 to 5 are updated to 11, 11, 4, 5, and 4, respectively.
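The update in step S51 amounts to a per-job subtraction. A minimal sketch, assuming the first-round p(i)h values are held in a dictionary keyed by job number (a representation chosen here for illustration):

```python
# Remaining times h(i) of jobs 1 to 10 before the update (table T71, FIG. 7).
h = [20, 20, 10, 8, 7, 5, 5, 5, 5, 5]

# Parallelism job times p(i)h scheduled in the first round (table T72, FIG. 7).
scheduled = {1: 9, 2: 9, 3: 6, 4: 3, 5: 3}

# Jobs that were not scheduled in the round keep their previous estimate.
updated = [h_i - scheduled.get(job, 0) for job, h_i in enumerate(h, start=1)]

print(updated)  # [11, 11, 4, 5, 4, 5, 5, 5, 5, 5]
```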

The shortest execution time h is obtained by averaging the remaining execution times, average(h(i)) (step S52). Here h is 6 hours.

For jobs 1 and 2, whose remaining execution time h(i) is longer than the shortest execution time h (h(i) > h), the parallelism p(i) satisfying p(i) > h(i)/h > p(i) − 1 is calculated and set; a parallelism of 1 is set for the other jobs 3 to 10 (step S53). The calculation yields a parallelism of 2 for jobs 1 and 2 and 1 for jobs 3 to 10.

Next, the remaining execution time h(i)/p(i) of each of jobs 1 to 10 when run at its parallelism p(i) is calculated and recorded in the table T71 of the storage unit 30. The average number of cycles c satisfying c > h/average(h(i)/p(i)) > c − 1 is then obtained (step S54) and recorded in the table T71 for each of jobs 1 to 10.

For jobs 1 and 2, the remaining execution time h(i)/p(i) at a parallelism of 2 is 5.50; for jobs 3 to 10 the parallelism is 1, so the value is unchanged from the remaining execution time h(i) obtained in step S51. The average number of cycles c is therefore 2.

The average over jobs 1 to 10, average(h(i)/p(i)/c), is 2.45, which rounds to a submitted-job time hav of 2 hours (step S55).

Next, with a submitted-job time hav of 2 hours, the jobs are traversed cyclically, starting from job 6 (the job following the last one handled in the first round) and wrapping around from job 10 to job 1. The cumulative number of consumed nodes (the running total of p(i)) is recorded in the table T72 together with the calculated parallelism job time p(i)h (step S56). When the cumulative number of consumed nodes reaches the node count of 10, the processing in step S56 ends.
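The cyclic traversal can be sketched with modular indexing. The function name `round_robin` is hypothetical; the inputs are the second-round parallelism values and the starting job stated above.

```python
def round_robin(degrees, start_job, total_nodes):
    """Return the jobs scheduled in one round, in order (hypothetical helper).

    Jobs are numbered from 1. The traversal starts at start_job, wraps from
    the last job back to job 1, and stops once total_nodes nodes are in use.
    """
    n = len(degrees)
    used = 0
    scheduled = []
    for k in range(n):
        job = (start_job - 1 + k) % n + 1
        if used + degrees[job - 1] > total_nodes:
            break
        used += degrees[job - 1]
        scheduled.append(job)
        if used == total_nodes:
            break
    return scheduled

# Second round: p(i) = 2 for jobs 1 and 2, 1 for jobs 3 to 10; start at job 6.
print(round_robin([2, 2, 1, 1, 1, 1, 1, 1, 1, 1], 6, 10))
# [6, 7, 8, 9, 10, 1, 2, 3]
```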

In this second round, the table T72 records a parallelism job time p(i)h of 2 hours for jobs 6 to 10, 4 hours for jobs 1 and 2, and 2 hours for job 3 (a parallelism of 1 multiplied by the submitted-job time of 2 hours).

Meanwhile, the cumulative count of consumed nodes in the second round starts from job 6. It is accumulated in the order of jobs 6, 7, 8, 9, 10, 1, 2, and 3, giving 1, 2, 3, 4, 5, 7, 9, and 10, respectively. The total of 10 nodes is reached at job 3, so no parallelism job time is calculated for jobs 4 and 5. The second round ends here.

All nodes are consumed, but the parallelism job time p(i)h in the table T72 has not reached the remaining execution time h(i) in the table T71. Because the overall processing is not yet complete, the third round illustrated in FIG. 9 is started (YES in step S57).

FIG. 9 illustrates a calculation example for the third round. The parallelism job time p(i)h in the table T72 of FIG. 8 is subtracted from the remaining execution time h(i) in the table T71 of FIG. 8, and the table T71 is updated with the result as the new predicted time (step S51). In this example, the remaining execution times h(i) of jobs 6 to 10 and 1 to 3 are updated: jobs 6 to 10 each become 3, and jobs 1 to 3 become 7, 7, and 2, respectively.

Averaging the remaining execution times, average(h(i)), gives a shortest execution time h of 4 hours (step S52).

A parallelism of 2 is set for jobs 1, 2, and 4, whose remaining execution time h(i) is longer than the shortest execution time h (h(i) > h), and a parallelism of 1 is set for the other jobs 3 and 5 to 10 (step S53).

Next, the remaining execution time h(i)/p(i) at parallelism p(i) and the average number of cycles c satisfying c > h/average(h(i)/p(i)) > c − 1 are recorded in the table T71 of the storage unit 30 (step S54).

For jobs 1 and 2, the remaining execution time h(i)/p(i) at a parallelism of 2 is 3.50; for job 3 it is 2.00 at a parallelism of 1; for job 4 it is 2.50 at a parallelism of 2; for job 5, whose remaining execution time of 4 hours was not changed in the second round, it is 4.00 at a parallelism of 1; and for jobs 6 to 10 it is 3.00 at a parallelism of 1. The average number of cycles c is therefore 2.

The average over jobs 1 to 10, average(h(i)/p(i)/c), is about 1.53, which rounds to a submitted-job time hav of 2 hours (step S55).

Next, with a submitted-job time hav of 2 hours, the jobs are traversed cyclically, starting from job 4 (the job following the last one handled in the second round) and wrapping around from job 10 to job 1. The cumulative number of consumed nodes (the running total of p(i)) is recorded in the table T72 together with the calculated parallelism job time p(i)h (step S56). In this third round, the cumulative number of consumed nodes reaches the node count of 10 at job 1, and the processing in step S56 ends there.

All nodes are consumed, but the parallelism job time p(i)h in the table T72 has not reached the remaining execution time h(i) in the table T71. Because the overall processing is not yet complete, the fourth round illustrated in FIG. 10 is started (YES in step S57).

FIG. 10 illustrates a calculation example for the fourth round. The parallelism job time p(i)h in the table T72 of FIG. 9 is subtracted from the remaining execution time h(i) in the table T71 of FIG. 9, and the table T71 is updated with the result as the new predicted time (step S51). In this example, the remaining execution times h(i) of jobs 4 to 10 and job 1 are updated to 1, 2, 1, 1, 1, 1, 1, and 3, respectively.

Averaging the remaining execution times, average(h(i)), gives a shortest execution time h of 2 hours (step S52).

Parallelisms of 2 and 4 are set for jobs 1 and 2, respectively, whose remaining execution time h(i) is longer than the shortest execution time h (h(i) > h), and a parallelism of 1 is set for the other jobs 3 to 10 (step S53).

The remaining execution time h(i)/p(i) is 1.50 for job 1 at a parallelism of 2, 1.75 for job 2 at a parallelism of 4, 2.00 for job 3 at a parallelism of 1, 1.00 for job 4 at a parallelism of 1, 2.00 for job 5 at a parallelism of 1, and 1.00 for jobs 6 to 10 at a parallelism of 1. The average number of cycles c is therefore 2.

The average over jobs 1 to 10, average(h(i)/p(i)/c), is about 0.66, which rounds to a submitted-job time hav of 1 hour (step S55).

Next, with a submitted-job time hav of 1 hour, the jobs are traversed cyclically, starting from job 2 (the job following job 1, where the third round ended) and wrapping around from job 10 to job 1. The cumulative number of consumed nodes (the running total of p(i)) is recorded in the table T72 together with the calculated parallelism job time p(i)h (step S56). In this fourth round, the cumulative number of consumed nodes reaches the node count of 10 at job 8, and the processing in step S56 ends there.

When all nodes are consumed and the parallelism job time p(i)h in the table T72 has reached the remaining execution time h(i) in the table T71, the overall processing can be judged complete (NO in step S57). The jobs are then reordered so that each runs contiguously (step S58).

FIG. 11 illustrates a calculation example for the fifth round. The parallelism job time p(i)h in the table T72 of FIG. 10 is subtracted from the remaining execution time h(i) in the table T71 of FIG. 10, and the table T71 is updated with the result as the new predicted time (step S51). In this example, the remaining execution times h(i) of jobs 2 to 8 are updated to 3, 1, 0, 1, 0, 0, and 0, respectively.

Averaging the remaining execution times, average(h(i)), gives a shortest execution time h of 2 hours (step S52).

A parallelism of 3 is set for each of jobs 1 and 2, whose remaining execution time h(i) is longer than the shortest execution time h (h(i) > h). A parallelism of 1 is set for jobs 3, 5, 9, and 10, whose remaining execution time h(i) is no longer than h but greater than 0 (step S53). A parallelism of 0 is set for jobs 4 and 6 to 8, whose remaining execution time h(i) is 0. The subsequent processing is performed only for jobs 1, 2, 3, 5, 9, and 10, whose remaining execution time h(i) is greater than 0.

The remaining execution time h(i)/p(i) at parallelism p(i) and the average number of cycles c satisfying c > h/average(h(i)/p(i)) > c − 1 are recorded in the table T71 of the storage unit 30 (step S54).

For jobs 1 and 2, the remaining execution time h(i)/p(i) at a parallelism of 3 is 1.00; for jobs 3, 5, 9, and 10 it is 1.00 at a parallelism of 1. The average number of cycles c is therefore 1.

The average, average(h(i)/p(i)/c), is 1, so the submitted-job time hav is 1 hour (step S55).

FIG. 12 shows an example of the result of reordering the jobs. Starting from the parallelism state predicted at the point where the entire processing illustrated in FIG. 11 is complete, the jobs are rearranged so that each runs contiguously, which yields the job schedule 4b (FIG. 4).

As FIG. 12 illustrates, changing the parallelism of some jobs may cause the continuation jobs for other parameters to be postponed. Nevertheless, the overall simulation time for the parameter-space search is shortened.

Furthermore, when the search region of the parameter space widens or narrows as the search proceeds (that is, when the remaining search space depends on the calculation results for part of the parameter region), the parallelism adjustment processing (step S15) performed by the parallelism adjustment unit 124 is highly effective in shortening the time needed to complete the search calculation. Combining it with a function that optimizes the mapping of the logical node space onto the physical node space of the parallel computer shortens the time further.

The parallelism adjustment processing (step S15) described above is performed at every fixed elapsed time after job submission (for example, at every job time limit); each time, the predicted parallelism state is updated and the jobs are reordered. In this way the parallelism is adjusted dynamically, and the calculation continues with the adjusted parallelism.

To improve execution efficiency further, the parallelism may also be changed dynamically while a job is running, by comparing its progress with that of the other jobs.

The following describes the composition and crystallization of a high-dielectric-constant oxide material studied by molecular dynamics simulation, to which this embodiment is applied. FIG. 13 shows an example of a crystallization process. In FIG. 13, hafnium (Hf) and silicon (Si) are shown, while oxygen (O) and nitrogen (N) are not displayed. As an example of the crystallization process of the high-dielectric-constant oxide material Hf_{1-x}Si_{x}O_{2-y}N_{2y/3}, FIG. 13 shows the simulated process for Hf_{0.9}Si_{0.1}O_{1.9}N_{0.067}/monoclinic-HfO_2 at 1200 K.

The figure illustrates crystallization progressing from left (initial state) to center (797 ps) to right (997 ps). At 997 ps, the simulation shows silicon (Si) having been expelled from the crystallized region.

FIG. 14 shows an example of a method for estimating the volume of the crystalline region, in which the volume is estimated from the number of hafnium (Hf) atoms. At each stage of the transition from the amorphous state to the crystallized state, shown from left (0 ps) to center (400 ps) to right (650 ps), the number of Hf atoms within a depth interval ΔR measured from the crystal substrate surface is obtained from the simulation and used to estimate the volume of the crystalline region.

In such a simulation, the ease of crystallization is examined by varying parameters such as the composition (the ratios x and y) and the temperature in kelvin, and evaluating the ease of crystallization for each parameter combination. This makes it possible to verify, for example, the crystallization-suppression effect obtained by nitriding HfO_2 and Hf silicate.

FIG. 15 shows an example of a method for verifying the relationship between composition and ease of crystallization. FIG. 15(A) plots examples in which the time evolution of the crystallized fraction was estimated for different compositions and temperatures. FIG. 15(B) shows graphs for obtaining the activation energy: for each composition, an Arrhenius plot is constructed from the slopes of the straight lines in FIG. 15(A) that represent the time evolution of the crystallized fraction. From the graph for each composition, the relationship between composition and ease of crystallization is then quantified. FIG. 15(C) shows an example of this quantified relationship; the quantified values are activation energies in units of eV.
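The Arrhenius analysis can be illustrated with a small least-squares fit. This is only a sketch under assumptions: the crystallization rate is taken to follow k = A·exp(−Ea/(kB·T)), so the slope of ln k against 1/T equals −Ea/kB. The temperatures, prefactor, and rates below are synthetic values generated for the illustration, not data from the specification, and the function name is hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def activation_energy(temps_k, rates):
    """Fit ln(rate) against 1/T by least squares and return Ea in eV."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope * K_B_EV

# Synthetic rates generated with Ea = 1.0 eV and an arbitrary prefactor.
temps = [1000.0, 1100.0, 1200.0, 1300.0]
rates = [1e12 * math.exp(-1.0 / (K_B_EV * t)) for t in temps]

print(round(activation_energy(temps, rates), 3))  # 1.0
```

In practice, each rate would be the slope of one crystallized-fraction curve at a given temperature, and one fit would be made per composition.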

Performing calculations such as those described above on the parallel computing system 1000 (a multi-job system) that includes the parallelism adjustment unit 124 according to this embodiment improves processing efficiency.

In the parallel computing system 1000 including the parallelism adjustment unit 124, a multi-job simulation that searches a parameter space is handled as follows: at every fixed elapsed time of a job (for example, a predetermined time limit), the progress of the calculation is analyzed, the time required to finish is estimated, and the parallelism is changed automatically according to that estimate before the calculation continues. This raises the throughput of the calculations in simulations that search a parameter space.

The present invention is not limited to the specifically disclosed embodiments, and various modifications and changes can be made without departing from the scope of the claims.

以上の実施例を含む実施形態に関し、更に以下の付記を開示する。
（付記１）
複数のノードを用いた並列計算によって実行されるプログラムを記憶した記憶部と、
前記記憶部から読み込んだ前記プログラムを複数のジョブで実行し、一定経過時間毎に各ジョブの計算結果を解析して、解析結果から得られる各ジョブの残り実行時間に基づいて全ジョブの終了が揃うように各ジョブの並列度を調整することによって、前記並列計算の全体のジョブスケジュールを行う並列度調整部を有し、該並列度調整部によって得られた該ジョブスケジュールに基づいて該並列度を変更する制御部と
を有することを特徴とする並列計算制御装置。
（付記２）
前記並列度調整部は、ジョブ毎にノード数と継続して行われる時間とで表されるタイリングで前記並列度を調整して前記ジョブスケジュールを行うことを特徴とする付記１記載の並列計算制御装置。
（付記３）
前記並列度調整部は、
前記記憶部から読み込んだ前記プログラムを複数のジョブで実行し、一定経過時間毎に各ジョブの計算結果を解析して、ジョブ毎の終了までに要する第１残り実行時間を見積もって、各ジョブに該第１残り実行時間を対応付けた第１テーブルを該記憶部に記憶する第１残り実行時間見積もり部と、
ジョブ毎に、前記記憶部の前記第１テーブルを参照して、前記第１残り実行時間の、２以上のノードによる並列度を調整した場合の最短実行時間を見積もる最短実行時間見積もり部と、
ジョブ毎に、前記第１残り実行時間の前記最短実行時間に対する割合からノード数を示す並列度を算出して、前記記憶部の前記第１テーブル内に、各ジョブに対応付けて該並列度を記憶する並列度算出部と、
ジョブ毎に前記並列度に基づいてジョブの第２残り実行時間を算出して全ジョブで平均し、前記最短実行時間を該平均値で除算することにより、前記最短実行時間に達するまで繰り返されるジョブの平均回転数を取得する平均回転数取得部と、
各ジョブの前記第２残り実行時間を平均回転数で除算した値を全ジョブで平均して投入ジョブ時間を算出する投入ジョブ実行時間算出部と、
前記複数のノードへのジョブの投入順に従って前記並列度を累積すると共に、該並列度に投入ジョブ時間を乗算することによって得られる並列度ジョブ時間を累積して、各ジョブに夫々の累積した値を対応付けた第２テーブルを前記記憶部に記憶する累積部とを有することを特徴とする付記２記載の並列計算制御装置。
（付記４）
前記累積部によって累積された前記並列度ジョブ時間の累積値が前記残り実行時間未満であって、前記並列度の累積値が全ノード数に達した場合、前記第１残り実行時間見積もり部と、前記最短実行時間見積もり部と、前記並列度算出部と、前記平均回転数取得部と、前記投入ジョブ実行時間算出部と、前記累積部とを繰り返すことを特徴とする付記３記載の並列計算制御装置。
（付記５）
前記並列計算は、パラメータ空間におけるパラメータの最適値を探索するマルチジョブシステムで実行されることを特徴とする付記１乃至４のいずれか一項記載の並列計算制御装置。
（付記６）
コンピュータによって実行される並列計算制御方法であって、
記憶部に記憶された複数のノードを用いた並列計算によって実行されるプログラムを読み込んで複数のジョブで実行し、
一定経過時間毎に各ジョブの計算結果を解析し、
前記解析の結果から得られる各ジョブの残り実行時間に基づいて全ジョブの終了が揃うように各ジョブの並列度を調整することによって、前記並列計算の全体のジョブスケジュールを行い、
前記ジョブスケジュールに基づいて前記並列度を変更する
ことを特徴とする並列計算制御方法。
（付記７）
記憶部に記憶された複数のノードを用いた並列計算によって実行されるプログラムを読み込んで複数のジョブで実行し、
一定経過時間毎に各ジョブの計算結果を解析し、
前記解析の結果から得られる各ジョブの残り実行時間に基づいて全ジョブの終了が揃うように各ジョブの並列度を調整することによって、前記並列計算の全体のジョブスケジュールを行い、
前記ジョブスケジュールに基づいて前記並列度を変更する、
処理をコンピュータに実行させるプログラムを記憶したコンピュータ読取可能な記憶媒体。
（付記８）
記憶部に記憶された複数のノードを用いた並列計算によって実行されるプログラムを読み込んで複数のジョブで実行し、
一定経過時間毎に各ジョブの計算結果を解析し、
前記解析の結果から得られる各ジョブの残り実行時間に基づいて全ジョブの終了が揃うように各ジョブの並列度を調整することによって、前記並列計算の全体のジョブスケジュールを行い、
前記ジョブスケジュールに基づいて前記並列度を変更する、
処理をコンピュータに実行させるプログラム。

The following appendices are further disclosed with respect to the embodiments, including the examples above.
(Appendix 1)
A parallel computing control device comprising:
a storage unit storing a program executed by parallel computation using a plurality of nodes; and
a control unit having a parallelism adjustment unit that executes the program read from the storage unit as a plurality of jobs, analyzes the calculation results of each job at every fixed elapsed time, and schedules the jobs of the parallel computation as a whole by adjusting the degree of parallelism of each job, based on the remaining execution time of each job obtained from the analysis results, so that all jobs finish at the same time, the control unit changing the degree of parallelism based on the job schedule obtained by the parallelism adjustment unit.
(Appendix 2)
The parallel computing control device according to Appendix 1, wherein the parallelism adjustment unit performs the job scheduling by adjusting the degree of parallelism through tiling, in which each job is represented by a number of nodes and the time for which it runs continuously.
(Appendix 3)
The parallel computing control device according to Appendix 2, wherein the parallelism adjustment unit includes:
a first remaining execution time estimation unit that executes the program read from the storage unit as a plurality of jobs, analyzes the calculation results of each job at every fixed elapsed time, estimates a first remaining execution time required for each job to finish, and stores in the storage unit a first table associating each job with its first remaining execution time;
a shortest execution time estimation unit that, for each job, refers to the first table in the storage unit and estimates the shortest execution time obtained when the first remaining execution time is adjusted by a degree of parallelism of two or more nodes;
a parallelism calculation unit that, for each job, calculates a degree of parallelism indicating a number of nodes from the ratio of the first remaining execution time to the shortest execution time, and stores the degree of parallelism in the first table in the storage unit in association with the job;
an average rotation count acquisition unit that calculates, for each job, a second remaining execution time based on the degree of parallelism, averages it over all jobs, and divides the shortest execution time by that average, thereby obtaining the average number of rotations of jobs repeated until the shortest execution time is reached;
a submitted job execution time calculation unit that calculates a submitted job time by averaging, over all jobs, the value obtained by dividing the second remaining execution time of each job by the average number of rotations; and
an accumulation unit that accumulates the degree of parallelism in the order in which jobs are submitted to the plurality of nodes, accumulates a parallelism job time obtained by multiplying the degree of parallelism by the submitted job time, and stores in the storage unit a second table associating each job with its respective accumulated values.
(Appendix 4)
The parallel computing control device according to Appendix 3, wherein, when the accumulated value of the parallelism job time accumulated by the accumulation unit is less than the remaining execution time and the accumulated value of the degree of parallelism has reached the total number of nodes, the processing of the first remaining execution time estimation unit, the shortest execution time estimation unit, the parallelism calculation unit, the average rotation count acquisition unit, the submitted job execution time calculation unit, and the accumulation unit is repeated.
(Appendix 5)
The parallel computation control device according to any one of appendices 1 to 4, wherein the parallel computation is executed by a multi-job system that searches for an optimum value of a parameter in a parameter space.
(Appendix 6)
A parallel computing control method executed by a computer, the method comprising:
reading a program, stored in a storage unit, that is executed by parallel computation using a plurality of nodes, and executing the program as a plurality of jobs;
analyzing the calculation results of each job at every fixed elapsed time;
scheduling the jobs of the parallel computation as a whole by adjusting the degree of parallelism of each job, based on the remaining execution time of each job obtained from the result of the analysis, so that all jobs finish at the same time; and
changing the degree of parallelism based on the job schedule.
(Appendix 7)
A computer-readable storage medium storing a program that causes a computer to execute processing comprising:
reading a program, stored in a storage unit, that is executed by parallel computation using a plurality of nodes, and executing the program as a plurality of jobs;
analyzing the calculation results of each job at every fixed elapsed time;
scheduling the jobs of the parallel computation as a whole by adjusting the degree of parallelism of each job, based on the remaining execution time of each job obtained from the result of the analysis, so that all jobs finish at the same time; and
changing the degree of parallelism based on the job schedule.
(Appendix 8)
A program that causes a computer to execute processing comprising:
reading a program, stored in a storage unit, that is executed by parallel computation using a plurality of nodes, and executing the program as a plurality of jobs;
analyzing the calculation results of each job at every fixed elapsed time;
scheduling the jobs of the parallel computation as a whole by adjusting the degree of parallelism of each job, based on the remaining execution time of each job obtained from the result of the analysis, so that all jobs finish at the same time; and
changing the degree of parallelism based on the job schedule.
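The chain of units listed in Appendix 3 can be read as a small table-building pipeline. The Python sketch below follows that reading under several assumptions that the appendices do not fix: the shortest execution time is passed in as an input, the degree of parallelism is the ratio rounded up to a whole node count, and the second remaining execution time is the first remaining execution time divided by the degree of parallelism (ideal scaling). The names `FirstTableRow` and `build_tables` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FirstTableRow:
    job_id: str
    first_remaining: float         # first remaining execution time
    parallelism: int = 1           # degree of parallelism (number of nodes)
    second_remaining: float = 0.0  # remaining time after applying the parallelism


def build_tables(first_remaining_by_job: dict, shortest_time: float):
    """Build the first and second tables; dict order is taken as submission order."""
    if shortest_time <= 0 or not first_remaining_by_job:
        raise ValueError("need a positive shortest_time and at least one job")
    if any(t <= 0 for t in first_remaining_by_job.values()):
        raise ValueError("remaining times must be positive")

    rows = [FirstTableRow(j, t) for j, t in first_remaining_by_job.items()]

    # Parallelism from the ratio of first remaining time to shortest time.
    for r in rows:
        r.parallelism = max(1, math.ceil(r.first_remaining / shortest_time))
        r.second_remaining = r.first_remaining / r.parallelism  # ideal scaling

    # Average number of rotations: shortest time / mean second remaining time.
    mean_second = sum(r.second_remaining for r in rows) / len(rows)
    avg_rotations = shortest_time / mean_second

    # Submitted job time: mean over jobs of (second remaining / rotations).
    submitted_job_time = sum(
        r.second_remaining / avg_rotations for r in rows
    ) / len(rows)

    # Second table: cumulative parallelism and cumulative parallelism job time.
    second_table = []
    cum_parallelism = 0
    cum_job_time = 0.0
    for r in rows:
        cum_parallelism += r.parallelism
        cum_job_time += r.parallelism * submitted_job_time
        second_table.append((r.job_id, cum_parallelism, cum_job_time))

    return rows, avg_rotations, submitted_job_time, second_table


rows, rotations, submitted, table = build_tables(
    {"a": 100.0, "b": 400.0, "c": 50.0}, shortest_time=100.0
)
```

Under these assumptions, the condition in Appendix 4 would amount to comparing the last cumulative parallelism in `table` with the total node count, and the last cumulative parallelism job time with the remaining execution time, before repeating the pipeline.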

５プロセッサ
６ネットワーク
１１ＣＰＵ
１２主記憶装置
１３補助記憶装置
１４表示装置
１５入力装置
１６通信Ｉ／Ｆ
１８ドライバ
１９記憶媒体
２０ノード
２２記憶装置
３１プログラムＡ
３２プログラムＢ
４１解析プログラムＡ'
４２解析プログラムＢ'
５１入力データファイル
５３出力データファイル
１２０制御部
１２４並列度調整部
１３０記憶部
１９０キューイングシステム
２００並列計算部

5 Processor
6 Network
11 CPU
12 Main storage device
13 Auxiliary storage device
14 Display device
15 Input device
16 Communication I/F
18 Driver
19 Storage medium
20 Node
22 Storage device
31 Program A
32 Program B
41 Analysis program A'
42 Analysis program B'
51 Input data file
53 Output data file
120 Control unit
124 Parallelism adjustment unit
130 Storage unit
190 Queuing system
200 Parallel computing unit

Claims

A parallel computing control device comprising:
a storage unit storing a program executed by parallel computation using a plurality of nodes; and
a control unit having a parallelism adjustment unit that executes the program read from the storage unit as a plurality of jobs, analyzes the calculation results of each job at every fixed elapsed time, and schedules the jobs of the parallel computation as a whole by adjusting the degree of parallelism of each job, based on the remaining execution time of each job obtained from the analysis results, so that all jobs finish at the same time, the control unit changing the degree of parallelism based on the job schedule obtained by the parallelism adjustment unit.

The parallel computing control device according to claim 1, wherein the parallelism adjustment unit performs the job scheduling by adjusting the degree of parallelism through tiling, in which each job is represented by a number of nodes and the time for which it runs continuously.

The parallel computing control device according to claim 2, wherein the parallelism adjustment unit includes:
a first remaining execution time estimation unit that executes the program read from the storage unit as a plurality of jobs, analyzes the calculation results of each job at every fixed elapsed time, estimates a first remaining execution time required for each job to finish, and stores in the storage unit a first table associating each job with its first remaining execution time;
a shortest execution time estimation unit that, for each job, refers to the first table in the storage unit and estimates the shortest execution time obtained when the first remaining execution time is adjusted by a degree of parallelism of two or more nodes;
a parallelism calculation unit that, for each job, calculates a degree of parallelism indicating a number of nodes from the ratio of the first remaining execution time to the shortest execution time, and stores the degree of parallelism in the first table in the storage unit in association with the job;
an average rotation count acquisition unit that calculates, for each job, a second remaining execution time based on the degree of parallelism, averages it over all jobs, and divides the shortest execution time by that average, thereby obtaining the average number of rotations of jobs repeated until the shortest execution time is reached;
a submitted job execution time calculation unit that calculates a submitted job time by averaging, over all jobs, the value obtained by dividing the second remaining execution time of each job by the average number of rotations; and
an accumulation unit that accumulates the degree of parallelism in the order in which jobs are submitted to the plurality of nodes, accumulates a parallelism job time obtained by multiplying the degree of parallelism by the submitted job time, and stores in the storage unit a second table associating each job with its respective accumulated values.

The parallel computing control device according to claim 3, wherein, when the accumulated value of the parallelism job time accumulated by the accumulation unit is less than the remaining execution time and the accumulated value of the degree of parallelism has reached the total number of nodes, the processing of the first remaining execution time estimation unit, the shortest execution time estimation unit, the parallelism calculation unit, the average rotation count acquisition unit, the submitted job execution time calculation unit, and the accumulation unit is repeated.

A parallel computing control method executed by a computer, the method comprising:
reading a program, stored in a storage unit, that is executed by parallel computation using a plurality of nodes, and executing the program as a plurality of jobs;
analyzing the calculation results of each job at every fixed elapsed time;
scheduling the jobs of the parallel computation as a whole by adjusting the degree of parallelism of each job, based on the remaining execution time of each job obtained from the result of the analysis, so that all jobs finish at the same time; and
changing the degree of parallelism based on the job schedule.