WO2013118287A1

WO2013118287A1 - Data processing method, data processing program and computer system

Info

Publication number: WO2013118287A1
Application number: PCT/JP2012/053080
Authority: WO
Inventors: 細内　昌明
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-02-10
Filing date: 2012-02-10
Publication date: 2013-08-15
Anticipated expiration: 2014-08-10

Description

Data processing method, data processing program, and computer system

The present invention relates to a data processing method, and more particularly to a data processing method for processing data that is input to a job.

When a general-purpose computer such as a mainframe processes a large number of files with batch jobs, the conventional way to improve processing efficiency has been to divide the records contained in the file to be processed into a plurality of files and to process those files in parallel with batch jobs.

As a file division method for executing batch jobs in parallel, each record, identified by a key value, is distributed to one of a plurality of files according to key ranges determined in advance from the number of file divisions.

Conventionally, in order to distribute batch jobs and execute them in parallel, a key range division method has been disclosed that distributes records whose key values fall in the same range (key range) to the same file (see, for example, Cited Document 1).

In addition, as a method of distributing records in a database, a method has been disclosed in which the number of records is counted when the records are saved, an average value for the key ranges is obtained, and the saved records are divided into key ranges according to that average value (see, for example, Cited Document 2).

JP 2007-86951 A
JP 11-345157 A

As described above, batch processing has often been used in which a plurality of records included in a file input to a job (input data) are divided into a plurality of files (divided data) according to a plurality of key ranges, and a plurality of computers then process the divided data so that the job is processed in a distributed manner. The problems with such batch processing are described below.

When the number of records varies greatly between key ranges, the job end times on the individual computers become uneven, and as a result the overall batch processing time becomes longer. Furthermore, if a failure occurs on any computer while any piece of divided data is large (that is, contains many records), the time required to re-execute the job also becomes longer.

In addition, when each piece of divided data is small, the overhead of starting and ending jobs increases, throughput decreases, and the batch processing time becomes longer. Thus, to shorten the batch processing time, the size of every piece of divided data must be kept within a certain range.

However, even if optimal key ranges are determined so that the sizes fall within that range at one point in time, when the job is executed periodically over a long period, the size of the divided data or the distribution of the key values of the records in the divided data changes, and the key ranges must be changed to make them optimal again.

Moreover, the batch processing time also includes the processing that divides the input data and the processing that determines the optimal key ranges, so these processing times must be shortened as well.

In the method disclosed in Cited Document 1, the key ranges are determined by referring to the key values of the records. Therefore, to determine optimal key ranges such that the size of each piece of divided data falls within a certain range, the program disclosed in Cited Document 1 must refer to every record in the input data to be divided.

In addition, after determining the key ranges, the program disclosed in Cited Document 1 must read all the records included in the input data again in order to distribute them. As a result, the program reads all the records twice, once to determine the key ranges and once to distribute the input data, so dividing the input data takes a long time.

The apparatus disclosed in Cited Document 2 determines the key ranges while saving the data, so it does not read the records again in order to determine the key ranges. However, the processing in Cited Document 2 presupposes that a data saving process exists. Therefore, when the apparatus disclosed in Cited Document 2 performs batch processing on, for example, input data transferred from another system, all the records must be read twice to determine the optimal key ranges accurately. As a result, dividing the input data takes a long time.

In addition, the method of Cited Document 2 determines, for every key range, whether the key range needs to be divided, and obtains the number of records and the division point, which incurs processing overhead for the check.

An object of the present invention is to keep each piece of divided data input to a job at an appropriate size and, in doing so, to shorten the time required to determine the key ranges and the time required to divide the input data.

According to a representative aspect of the present invention, there is provided a data processing method in a computer system that divides input data including a plurality of records and processes each piece of the divided input data by a corresponding one of a plurality of jobs. The computer system includes a processor and a memory. A key value is assigned to each of the plurality of records, and each key value is classified into one of a plurality of key ranges, each defined by a minimum value and a maximum value. In the method, when the records divided according to the key ranges are processed by the plurality of jobs, the processor acquires a history indicating the number of processed records and the key ranges into which the key values assigned to the processed records are classified; the processor determines, based on a plurality of the acquired histories, a pattern of change in the number of records included in the input data; and the processor determines, according to the determined pattern of change, the key range for which at least one of the minimum value and the maximum value is to be changed.

According to a representative aspect of the present invention, the divided data input to a job can be kept at an appropriate size.

A block diagram showing the hardware configuration of the computer system according to the first embodiment of the present invention.
A block diagram showing the processing that generates divided data according to the first embodiment of the present invention.
An explanatory diagram showing the division history according to the first embodiment of the present invention.
An explanatory diagram showing the key range management table according to the first embodiment of the present invention.
An explanatory diagram showing the values of the increase/decrease type according to the first embodiment of the present invention.
An explanatory diagram showing the parameter management table according to the first embodiment of the present invention.
An explanatory diagram showing the first change pattern of the distribution of the number of records in the input data according to the first embodiment of the present invention.
An explanatory diagram showing the second change pattern of the distribution of the number of records in the input data according to the first embodiment of the present invention.
An explanatory diagram showing the third change pattern of the distribution of the number of records in the input data according to the first embodiment of the present invention.
An explanatory diagram showing the fourth change pattern of the distribution of the number of records in the input data according to the first embodiment of the present invention.
A flowchart showing the processing performed by the key range selection unit according to the first embodiment of the present invention.
A flowchart showing the history selection processing performed by the key range selection unit according to the first embodiment of the present invention.
A flowchart showing the key range record count prediction processing performed by the key range selection unit according to the first embodiment of the present invention.
A flowchart showing the key range selection processing performed by the key range selection unit according to the first embodiment of the present invention.
A flowchart showing the key range reconstruction processing performed by the key range selection unit according to the first embodiment of the present invention.
A flowchart showing the input data division processing performed by the data dividing unit according to the first embodiment of the present invention.
A flowchart showing the key range division processing performed by the data dividing unit according to the second embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. The embodiments do not limit the scope of the claims, and not all of the features described in the embodiments are necessarily essential to the solution provided by the invention.

The computer system of the present embodiment determines the variation type of the number of records from the histories of a plurality of pieces of divided data input to a job and, according to the determination result, determines the divided data whose number of records is to be changed. The divided data is then output during job execution in accordance with that determination.

(First embodiment)

The first embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the hardware configuration of the computer system 1 according to the first embodiment of the present invention.

The computer system 1 includes a computer 10 and a storage device 20. The computer system 1 is a system for executing a plurality of jobs.

The storage device 20 is a storage device such as a hard disk drive. The storage device 20 holds input data 21, a plurality of pieces of divided data 22, and a plurality of division histories 100.

The input data 21 includes a plurality of records. The divided data 22 is a plurality of files that store the records included in the input data 21 after those records have been divided.

The division history 100 includes information indicating, for example, the key range and the number of records of each piece of divided data 22 at the times records were divided into the divided data 22 in the past.

Here, a record is one unit of data that is input to a program and processed by the program. A key range is a range of key values. In the present embodiment, a key value is a numerical value indicating a position used to identify a record.

The computer 10 uses keys to identify records when processing them. Each record included in the input data 21 is output to one piece of divided data 22 according to the key range that contains the record's key value. In the present embodiment, one piece of divided data 22 corresponds to one key range.

A job is a unit of execution of a program that processes records. Jobs are often executed periodically, for example at a fixed time every day. In this case, the same program is executed in every run of the job, but the input data supplied to the job differs from run to run. In addition, a job is generally required to finish within a specified time.

When the jobs that process the pieces of divided data 22 are executed in parallel and a piece of divided data 22 contains too many records, the jobs that follow must wait, without being executed, until the divided data 22 with many records has been processed. In addition, if a failure occurs while a job is being executed, re-executing the job takes longer, which increases the risk that the job cannot finish within the specified time.

On the other hand, when the jobs that process the pieces of divided data 22 are executed in parallel and a piece of divided data 22 contains too few records, the ratio of the overhead time for starting the job to the time spent processing the divided data 22 increases, and the throughput of the jobs as a whole decreases. For this reason, the number of records included in each piece of divided data 22 must be within a certain range.
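The two effects described above can be illustrated with a small cost model. The following is only a sketch under assumptions that do not come from the description: every record costs the same fixed time, every job has the same fixed start-up time, and one job runs per piece of divided data. The function names and the millisecond figures are hypothetical.

```python
def startup_overhead_ratio(record_count, per_record_ms, startup_ms):
    """Ratio of job start-up time to the time spent processing the records."""
    return startup_ms / (record_count * per_record_ms)


def batch_elapsed_ms(record_counts, per_record_ms, startup_ms):
    """Elapsed time when one job per piece of divided data runs in parallel.

    The batch as a whole finishes only when the slowest job finishes,
    so the largest piece of divided data dominates the result.
    """
    return max(startup_ms + n * per_record_ms for n in record_counts)
```

With a 5000 ms start-up time and 10 ms per record, a piece of divided data holding only 100 records spends five times as long starting up as processing, while three pieces holding 100, 100, and 10000 records take far longer overall than three evenly sized pieces holding the same total.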

The computer 10 includes a main storage device 11, a processor 12, and an input/output I/F 13.

The processor 12 is an arithmetic device such as a CPU (Central Processing Unit). The processor 12 loads and executes the instruction code held in the main storage device 11.

The input/output I/F 13 is an interface for connecting to the storage device 20, and transfers the input data 21 and the divided data 22 from the storage device 20 to the computer 10.

The main storage device 11 holds the instruction code of the key range selection unit 1000 and the data dividing unit 1100. The main storage device 11 also holds a record buffer 14, a key range management table 300, and a parameter management table 200.

The key range management table 300 holds the boundary values (minimum and maximum values) of the key ranges of past divided data 22 together with their numbers of records. When a plurality of different jobs are executed in the computer system 1, the main storage device 11 holds a key range management table 300 for each job.

The parameter management table 200 holds the parameters needed for the determinations made in the processing of the key range selection unit 1000 and the data dividing unit 1100. For example, the parameter management table 200 holds the upper limit and the lower limit of the number of records included in a piece of divided data 22.

The record buffer 14 holds records included in the input data 21, or the key values assigned to those records.

The key range selection unit 1000 has a function of predicting the number of records in the key range corresponding to each piece of divided data 22 and the increase/decrease pattern of that number. The key range selection unit 1000 also has a function of determining that a piece of divided data 22 whose predicted number of records will no longer fall within the certain range corresponds to a key range that needs to be divided or merged.

The data dividing unit 1100 has a function of outputting the divided data 22 when it receives the input data 21.

In the present embodiment, the computer 10 reads the divided data 22 and executes the jobs. However, the computer system 1 may include a plurality of computers, and a computer other than the computer 10 may read the divided data 22 and execute the jobs. Each computer of the computer system 1 (including the computer 10) may have a plurality of processors and may process the divided data 22 by executing a plurality of jobs in parallel.

The computer system 1 may also transmit the divided data 22 generated by the processing described later in the present embodiment to another computer system connected to the computer system 1 via a network. The other computer system may then process the transmitted divided data 22 in parallel with jobs.

The key range selection unit 1000 and the data dividing unit 1100 shown in FIG. 1 are implemented as programs. However, the computer 10 of the present embodiment may instead implement the functions of the key range selection unit 1000 and the data dividing unit 1100 by including a physical device, such as an integrated circuit, that has the function of the key range selection unit 1000 or the data dividing unit 1100.

The key range selection unit 1000 and the data dividing unit 1100 may be implemented as a single program. Alternatively, each of the key range selection unit 1000 and the data dividing unit 1100 may be implemented by a plurality of programs, such as a main program and subprograms.

Although the key range management table 300 and the parameter management table 200 shown in FIG. 1 hold information in table form, they may hold information in any format, such as CSV. The division history 100 may likewise hold information in any format.

FIG. 2 is a block diagram showing the processing that generates the divided data 22 according to the first embodiment of the present invention.

The key range selection unit 1000 reads information such as the job execution date, the boundary values of each key range, and the number of records in each key range from the division histories 100, and stores it in the key range management table 300. The key range selection unit 1000 then predicts the increase/decrease pattern of each key range and the number of records at a future execution of the job, and selects the key ranges that need to be divided or merged by comparing the prediction results with the parameters in the parameter management table 200.
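The comparison step can be sketched as follows. This is a minimal illustration, assuming that the relevant parameters are the upper and lower limits on the number of records per piece of divided data, that a prediction above the upper limit marks a key range for division, and that a prediction below the lower limit marks it for merging. The function name and the dictionary layout are hypothetical.

```python
def select_key_ranges(predicted, lower_limit, upper_limit):
    """Pick the key ranges whose predicted record count leaves the allowed band.

    predicted maps a (minimum, maximum) key range to its predicted record count.
    Returns two lists: key ranges to divide and key ranges to merge.
    """
    to_divide = [rng for rng, count in predicted.items() if count > upper_limit]
    to_merge = [rng for rng, count in predicted.items() if count < lower_limit]
    return to_divide, to_merge
```

Key ranges whose predictions stay inside the band appear in neither list and are left unchanged.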

When no division history 100 is stored in the storage device 20, for example because the job has not yet been executed, and the key range selection unit 1000 determines that regenerating all the key ranges is advantageous in terms of performance, the key range selection unit 1000 reads the input data 21 and generates new key ranges. It then stores information indicating the new key ranges in the key range management table 300.

The data dividing unit 1100 refers to the boundary values of the key ranges in the key range management table 300 and reads records from the input data 21. It then outputs each record to the divided data 22 corresponding to the key range whose boundary values contain the record's key value.
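The routing of records to divided data can be sketched as below. This is a simplified illustration, not the data dividing unit itself: it assumes integer key values, inclusive boundary values, key ranges that are sorted and do not overlap, and in-memory lists in place of files. Raising an error for a key outside every range is also an assumption; the description does not say how such a record is handled at this point.

```python
from bisect import bisect_right


def split_records(records, key_ranges):
    """Distribute (key, payload) records into one list per key range.

    key_ranges is a sorted list of inclusive (minimum, maximum) pairs.
    """
    minimums = [lo for lo, _ in key_ranges]
    divided = [[] for _ in key_ranges]
    for key, payload in records:
        # Index of the last key range whose minimum is <= key.
        i = bisect_right(minimums, key) - 1
        if i < 0 or key > key_ranges[i][1]:
            raise ValueError(f"key {key} is outside every key range")
        divided[i].append((key, payload))
    return divided
```

Each record is read once and placed by binary search over the range minimums, so the input data is scanned in a single pass.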

The data dividing unit 1100 of the first embodiment temporarily stores part of the input data 21 in the record buffer 14 in order to determine new boundary values. In addition, to generate a division history 100 each time the job is executed, the data dividing unit 1100 counts the number of records included in each piece of divided data 22 and stores, in the division history 100, the key range corresponding to each piece of divided data 22 and the number of records in each key range.

FIG. 3 is an explanatory diagram showing the division history 100 according to the first embodiment of the present invention.

The division history 100 is a file output by the data dividing unit 1100 each time the data dividing unit 1100 is executed. Each file of the division history 100 includes areas 101 to 104 for storing values.

Area 101 stores the date and time at which the data dividing unit 1100 was executed. Because the data dividing unit 1100 is executed as part of executing the job, the execution date and time in area 101 is the execution date and time of the job.

Area 102 stores a job name, which is the identifier of the job that receives the divided data 22 as input. That is, it stores the name of the job specified by a user, an administrator, or the like when the data dividing unit 1100 is executed.

Area 103 stores the total data amount of the input data 21 input to the job. Area 104 stores information indicating the boundary values of the key range corresponding to each piece of divided data 22 and the number of records in each piece of divided data 22.
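One way to picture a division history file is as a record with four fields, one per area. The sketch below is illustrative only: the class name, field names, field types, and the unit of the total data amount are assumptions, and the description leaves the storage format open.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class DivisionHistory:
    executed_at: datetime                # area 101: execution date and time
    job_name: str                        # area 102: job identifier
    total_size: int                      # area 103: total amount of input data
    ranges: List[Tuple[int, int, int]]   # area 104: (minimum, maximum, record count)


def make_history(executed_at, job_name, total_size, key_ranges, divided):
    """Count the records in each piece of divided data and build one history."""
    ranges = [(lo, hi, len(part)) for (lo, hi), part in zip(key_ranges, divided)]
    return DivisionHistory(executed_at, job_name, total_size, ranges)
```

A history built this way after every run gives the record count per key range over time, which is the input the key range selection unit needs.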

FIG. 4 is an explanatory diagram showing the key range management table 300 according to the first embodiment of the present invention.

When a plurality of jobs are executed in the computer system 1, a key range management table 300 is stored in the main storage device 11 for each job.

The key range management table 300 holds an entry for each key range. Each entry includes a minimum value 301, a maximum value 302, a history 303, an increase/decrease type 304, a predicted number 305, an actual record count 306, and a division point investigation flag 307.

The minimum value 301 indicates the minimum value of the key range. The maximum value 302 indicates the maximum value of the key range.

The history 303 indicates the number of records in the key range as recorded in the past division histories 100. The history 303 includes, for each past division history 100, an area indicating the number of records.

The history 303 shown in FIG. 4 holds, in the areas history 1 (303a), history 2 (303b), and history 3 (303c), the number of records in the key range recorded in the three most recent division histories 100.

The increase/decrease type 304 indicates the increase/decrease pattern of the number of records in the key range. The predicted number 305 indicates the predicted number of records in each key range at a future execution of the job, obtained from the increase/decrease pattern of the number of records and the history 303.

The actual record count 306 indicates the number of records actually output to the divided data 22. The division point investigation flag 307 is an area that holds a value indicating "ON" for a key range that the key range selection unit 1000 has determined should be divided because its predicted number 305 exceeds a predetermined upper limit.

Although the history 303 in the key range management table 300 shown in FIG. 4 includes three histories, history 1 (303a) to history 3 (303c), it may include any number of histories as long as it includes at least two. The increase/decrease type 304 of the present embodiment, described below, takes one of the values increase/decrease type 0, increase/decrease type +1, increase/decrease type +2, increase/decrease type -1, and increase/decrease type -2.
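An entry of the key range management table can be pictured as the structure below, together with a helper that fills the history column from past division histories. This is a sketch under assumptions: the class, field, and function names are hypothetical, each past history is given as a dictionary from (minimum, maximum) to record count, the histories are supplied newest first, and every history uses the same key ranges.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyRangeEntry:
    minimum: int                                       # minimum value 301
    maximum: int                                       # maximum value 302
    history: List[int] = field(default_factory=list)   # history 303, newest first
    trend: int = 0                                     # increase/decrease type 304
    predicted: int = 0                                 # predicted number 305
    actual: int = 0                                    # actual record count 306
    investigate_split_point: bool = False              # division point investigation flag 307


def build_entries(histories):
    """Create one entry per key range and append its past record counts."""
    entries = {}
    for past in histories:  # newest history first
        for (lo, hi), count in past.items():
            entry = entries.setdefault((lo, hi), KeyRangeEntry(lo, hi))
            entry.history.append(count)
    return [entries[key] for key in sorted(entries)]
```

With three histories supplied, each entry's `history` list holds the values that FIG. 4 labels history 1, history 2, and history 3, in that order.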

FIG. 5 is an explanatory diagram showing the values of the increase/decrease type 304 according to the first embodiment of the present invention.

Line graphs 501 to 505 shown in FIG. 5 each show an example of the number of records in one key range. The horizontal axis in FIG. 5 indicates the execution date and time given by each past division history 100 (shown as history 1 to history 3 in FIG. 5). The vertical axis indicates the number of records given by each division history 100.

History 1, history 2, and history 3 in FIG. 5 correspond to history 1 (303a), history 2 (303b), and history 3 (303c) shown in FIG. 4. History 1 has the most recent execution date and time, and history 3 has the oldest.

The numbers of records shown in line graphs 501 to 505 are relative values based on the number of records in history 3.

Line graph 502 shows the increase/decrease pattern of type +1. Type +1 is the pattern in which the differences in the number of records between histories, difference d23 (= number of records in history 2 - number of records in history 3) and difference d12 (= number of records in history 1 - number of records in history 2), are both positive and the difference between d23 and d12 is within a predetermined range (for example, a range calculated from the record number threshold 203, described later, held in the parameter management table 200).

Line graph 504 shows the increase/decrease pattern of type -1. Type -1 is the pattern in which d23 and d12 are both negative and the difference between d23 and d12 is within the predetermined range.

Line graph 501 shows the increase/decrease pattern of type +2. Type +2 is the pattern in which d23 and d12 are both positive and d12 is larger than d23 by more than the predetermined range.

Line graph 505 shows the increase/decrease pattern of type -2. Type -2 is the pattern in which d23 and d12 are both negative and d12 is larger than d23 by more than the predetermined range.

Line graph 503 shows the increase/decrease pattern of type 0. Type 0 is the pattern in which d23 and d12 are both within the predetermined range, or in which d23 and d12 have different signs.

In this embodiment, increase/decrease type +1 and increase/decrease type −1 are patterns that increase or decrease almost linearly. Increase/decrease type +2 and increase/decrease type −2 are patterns that increase or decrease nonlinearly, for example like an exponential function.

The key range selection unit 1000 of this embodiment determines the increase/decrease pattern of the number of records between the histories 303 as described above, and can calculate the predicted number of records for a future execution of the job from the determined pattern and a formula or rule held in advance.

When there are only two division histories 100, the key range selection unit 1000 of this embodiment may determine increase/decrease type +1, increase/decrease type 0, or increase/decrease type −1 as the increase/decrease pattern.
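The pattern rules above can be sketched in Python as follows. This is only an illustrative reading, not the patented implementation: the function names and the single `tolerance` argument (standing in for the "predetermined range") are hypothetical. Two points are assumptions of this sketch: for negative differences, "larger" is read as a comparison of magnitudes, and a monotone trend that slows down (a case the text does not assign) is folded into type ±1.

```python
def classify_trend(h3: int, h2: int, h1: int, tolerance: int) -> int:
    """Return +2, +1, 0, -1 or -2 for one key range.

    h3 is the oldest record count (history 3), h1 the newest (history 1).
    `tolerance` is a hypothetical stand-in for the predetermined range.
    """
    d23 = h2 - h3  # change from history 3 to history 2
    d12 = h1 - h2  # change from history 2 to history 1

    # Type 0: both changes are small, or the changes point in
    # different directions (a zero change is treated as "different").
    both_small = abs(d23) <= tolerance and abs(d12) <= tolerance
    if both_small or d23 * d12 <= 0:
        return 0

    sign = 1 if d12 > 0 else -1
    # Type +2 / -2: the newer change outgrows the older one (by magnitude,
    # an assumption of this sketch) by more than the tolerance.
    if abs(d12) - abs(d23) > tolerance:
        return 2 * sign
    # Type +1 / -1: roughly linear; slowing trends are also placed here.
    return sign


def classify_trend_two(h2: int, h1: int, tolerance: int) -> int:
    """Variant for only two histories: returns +1, 0 or -1."""
    d12 = h1 - h2
    if abs(d12) <= tolerance:
        return 0
    return 1 if d12 > 0 else -1
```

For instance, counts of 100, 110, and 150 (oldest to newest) with a tolerance of 5 fall into type +2 under these assumptions, because the newer step (40) exceeds the older step (10) by more than the tolerance.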

　図６は、本発明の第１の実施形態のパラメータ管理テーブル２００を示す説明図である。 FIG. 6 is an explanatory diagram illustrating the parameter management table 200 according to the first embodiment of this invention.

　パラメータ管理テーブル２００は、キーレンジ選択部１０００及びデータ分割部１１００による処理において用いられるパラメータを保持する。パラメータ管理テーブル２００が保持するパラメータは、ユーザ又は管理者等によってあらかじめ計算機システム１に入力された値である。 The parameter management table 200 holds parameters used in processing by the key range selection unit 1000 and the data division unit 1100. The parameters stored in the parameter management table 200 are values input to the computer system 1 in advance by a user or an administrator.

　パラメータ管理テーブル２００は、分布変動型２０１、ジョブ名２０２、レコード数閾値２０３、データ量閾値２０４、上限レコード数２０５、下限レコード数２０６、及び再構成下限率２０７を含む。 The parameter management table 200 includes a distribution variation type 201, a job name 202, a record number threshold 203, a data amount threshold 204, an upper limit record number 205, a lower limit record number 206, and a reconstruction lower limit rate 207.

　分布変動型２０１は、各キーレンジに含まれるレコード数の、入力データ２１における分布の変化のパターンを示す。 The distribution variation type 201 indicates a distribution change pattern in the input data 21 for the number of records included in each key range.

　ジョブ名２０２は、入力データ２１を入力するジョブの識別子を示す。 The job name 202 indicates the identifier of the job that inputs the input data 21.

　レコード数閾値２０３は、キーレンジ選択部１０００が、分布変動型を判定するための閾値である。データ量閾値２０４は、キーレンジ選択部１０００が、キーレンジに含まれるレコード数の分布が適切であるか否かを判定するために適した分割履歴１００を選択するための閾値である。 The record number threshold 203 is a threshold for the key range selection unit 1000 to determine the distribution variation type. The data amount threshold value 204 is a threshold value for the key range selection unit 1000 to select the division history 100 suitable for determining whether the distribution of the number of records included in the key range is appropriate.

The upper limit record number 205 is a threshold that is compared with the predicted number of records in each key range to determine whether the key range should be divided. The lower limit record number 206 is a threshold that is compared with the predicted number of records in each key range to determine whether the key range should be integrated with another key range.

　再構成下限率２０７は、入力データ２１に含まれるすべてのキーレンジを、再度生成しなおす必要があるか否かを判定するための閾値である。 The reconstruction lower limit rate 207 is a threshold value for determining whether or not it is necessary to regenerate all the key ranges included in the input data 21.
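As a rough illustration, the fields of the parameter management table 200 could be modeled as a small record type. The Python names and types below are hypothetical; the text only names the fields and their roles, not their representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParameterTable:
    """Hypothetical model of the parameter management table 200."""
    job_name: str                  # job name 202
    record_count_threshold: float  # record number threshold 203 (a rate)
    data_amount_threshold: float   # data amount threshold 204 (a rate)
    upper_record_limit: int        # upper limit record number 205
    lower_record_limit: int        # lower limit record number 206
    rebuild_lower_rate: float      # reconstruction lower limit rate 207
    # distribution variation type 201: "A", "B", "C" or "D" once determined
    distribution_type: Optional[str] = None
```

The distribution type defaults to `None` here on the assumption that it is filled in by later processing rather than entered by the user or administrator.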

For example, the key range selection unit 1000 divides the difference between the total number of records in history 1 and the total number of records in history 2 by the total number of records in history 1, and compares the result with the record number threshold 203. In this way, the key range selection unit 1000 can determine the distribution variation type from the rate of increase (or decrease) in the number of records, regardless of how many records the input data 21 contains in total.

Similarly, for example, the key range selection unit 1000 divides the difference between the total data amount of the input data 21 in history 1 and the total data amount of the input data 21 in history 2 by the total data amount of the input data 21 in history 1, and compares the result with the data amount threshold 204. In this way, the key range selection unit 1000 can select appropriate division histories 100 in the history selection process 1010 described later. The key range selection unit 1000 can also reconstruct the key ranges of the input data 21 when the total data amount of the input data 21 has increased or decreased drastically.
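The two comparisons above share one calculation: a difference normalized by the newer total. A minimal Python sketch follows; the function names are hypothetical, and taking the absolute value of the difference is an assumption, since the text speaks only of "the difference".

```python
def relative_change(newer_total: float, older_total: float) -> float:
    """Difference between two totals, divided by the newer total."""
    if newer_total == 0:
        raise ValueError("newer_total must be non-zero")
    return abs(newer_total - older_total) / newer_total


def within_threshold(newer_total: float, older_total: float,
                     threshold: float) -> bool:
    """True when the relative change does not exceed the threshold
    (e.g. record number threshold 203 or data amount threshold 204)."""
    return relative_change(newer_total, older_total) <= threshold
```

Because the result is a ratio, the same threshold value applies whether a job handles thousands or millions of records.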

In this embodiment, the values of the record number threshold 203 and the data amount threshold 204 do not depend on the distribution variation type 201. However, the parameter management table 200 may hold a plurality of upper limit record numbers 205, a plurality of lower limit record numbers 206, and a plurality of reconstruction lower limit rates 207, and the key range selection unit 1000 may select the value of the upper limit record number 205, the lower limit record number 206, or the reconstruction lower limit rate 207 according to the value stored in the distribution variation type 201.

　また、本実施形態のパラメータ管理テーブル２００は、後述する処理において用いられる所定の値、又は、所定の範囲を示す数値等を保持してもよい。各処理において用いられる所定の値、又は、所定の範囲を示す数値等は、ユーザ又は管理者等によって、パラメータ管理テーブル２００に格納される。 Further, the parameter management table 200 of the present embodiment may hold a predetermined value used in processing to be described later, or a numerical value indicating a predetermined range. A predetermined value or a numerical value indicating a predetermined range used in each process is stored in the parameter management table 200 by a user or an administrator.

　図７Ａは、本発明の第１の実施形態の入力データ２１におけるレコード数の分布の一つ目の変化パターンを示す説明図である。 FIG. 7A is an explanatory diagram illustrating a first change pattern of the distribution of the number of records in the input data 21 according to the first embodiment of this invention.

The distribution variation type 201 shown in FIG. 6 stores, through processing described later, a value indicating one of distribution variation type A, distribution variation type B, distribution variation type C, or distribution variation type D. FIG. 7A shows the change pattern of distribution variation type A.

The horizontal axis in FIG. 7A, and in FIGS. 7B to 7D described later, represents the key ranges of the records included in the input data 21. The vertical axis in FIG. 7A, and in FIGS. 7B to 7D, represents the number of records in each key range. Each line graph shows the number of records in each key range indicated by a division history 100, that is, the distribution of the number of records at the time the job indicated by that history was executed.

　ジョブの入力データ２１は、ジョブを実行するごとに異なり、データ量又はレコード数も変化する場合が多い。一方で、入力データ２１のレコード数は、一般的に、規則的に変化する。 The job input data 21 differs every time the job is executed, and the data amount or the number of records often changes. On the other hand, the number of records of the input data 21 generally changes regularly.

Specifically, each input data 21 contains records that are generated as a result of continuously executed business operations and are delimited at specific time intervals. Jobs that process this input data 21 are executed periodically. Consequently, the distribution of the data amount of the records in the input data 21, and the distribution of the number of records in each key range, change with the time, day, or month at which the business was executed, and do so with a certain degree of regularity.

For this reason, by referring to the number of records in the history 303 of the key range management table 300, the key range selection unit 1000 can classify the change pattern of the distribution of the number of records in each key range of the input data 21 as one of distribution variation type A, distribution variation type B, distribution variation type C, or distribution variation type D.

Distribution variation type A is the change pattern in which, at periodic timings such as the month, day, or day of the week on which the job is executed, the total data amount falls within a predetermined range and the total number of records falls within a predetermined range. For example, when the input data 21 contains transaction records and the job is executed once every half month for a business in which transactions are concentrated on the month-end closing date, the number of records in the input data 21 changes as in distribution variation type A.

　例えば、図７Ａに示す履歴１の総レコード数と履歴３の総レコード数とは、ほぼ同じ数であり、ユーザ又は管理者によってあらかじめ与えられた所定の範囲に含まれる。しかし、図７Ａに示す履歴２の総レコード数は、履歴１の総レコード数と大きく相違し、所定の範囲に含まれない。 For example, the total number of records in the history 1 and the total number of records in the history 3 shown in FIG. 7A are substantially the same, and are included in a predetermined range given in advance by the user or the administrator. However, the total number of records in history 2 shown in FIG. 7A is significantly different from the total number of records in history 1 and is not included in the predetermined range.

When the total number of records periodically falls within the predetermined range in this way, the key range selection unit 1000 can classify the change pattern of the distribution of the number of records in the input data 21 as distribution variation type A.

When the input data 21 is of distribution variation type A and key ranges whose number of records exceeds a predetermined value account for a predetermined proportion of the input data 21, the key range selection unit 1000 needs to reconstruct the key ranges of the input data 21.

　図７Ｂは、本発明の第１の実施形態の入力データ２１におけるレコード数の分布の二つ目の変化パターンを示す説明図である。 FIG. 7B is an explanatory diagram illustrating a second change pattern of the record number distribution in the input data 21 according to the first embodiment of this invention.

　図７Ｂは、分布変動型Ｂの変化パターンを示す。分布変動型Ｂは、入力データ２１に含まれる総データ量が、ジョブが実行されるごとに緩やかに増加若しくは減少するか、又は、各キーレンジのレコード数の割合が、ジョブが実行されるごとに緩やかに増加若しくは減少するパターンである。 FIG. 7B shows a distribution variation type B change pattern. In the distribution variation type B, the total amount of data included in the input data 21 gradually increases or decreases every time the job is executed, or the ratio of the number of records in each key range is increased every time the job is executed. It is a pattern that gradually increases or decreases.

The number of records in each key range shown in FIG. 7B increases gradually as the execution date becomes more recent.

When the input data 21 is of distribution variation type B and key ranges whose number of records exceeds a predetermined value account for a predetermined proportion of the input data 21, the key range selection unit 1000 needs to reconstruct the key ranges of the input data 21.

　図７Ｃは、本発明の第１の実施形態の入力データ２１におけるレコード数の分布の三つ目の変化パターンを示す説明図である。 FIG. 7C is an explanatory diagram illustrating a third change pattern of the distribution of the number of records in the input data 21 according to the first embodiment of this invention.

　図７Ｃは、分布変動型Ｃの変化パターンを示す。分布変動型Ｃは、最終キーレンジのレコード数の増加率が、他のキーレンジのレコード数の増加率よりも高いパターンである。本実施形態における最終キーレンジとは、キーレンジの最小値が他のどのキーレンジのキー値よりも大きいキーレンジである。 FIG. 7C shows a change pattern of the distribution variation type C. The distribution variation type C is a pattern in which the rate of increase in the number of records in the final key range is higher than the rate of increase in the number of records in other key ranges. The final key range in this embodiment is a key range in which the minimum value of the key range is larger than the key value of any other key range.

For example, when a record indicating a new customer or a new product is appended to the end of the key sequence of the input data 21 each time the business is executed, the number of records in the input data 21 changes as in distribution variation type C.

Specifically, the key range (201-300) shown in FIG. 7C is the final key range. Each time the job is executed, the number of records in the key range (201-300) increases by more than the number of records in the other key ranges (001-100, 101-200).

　入力データ２１が分布変動型Ｃであり、かつ、最終キーレンジにおけるレコード数が所定の値を超えた場合、データ分割部１１００は、最終キーレンジを分割する必要がある。 When the input data 21 is the distribution variation type C and the number of records in the final key range exceeds a predetermined value, the data dividing unit 1100 needs to divide the final key range.

　図７Ｄは、本発明の第１の実施形態の入力データ２１におけるレコード数の分布の四つ目の変化パターンを示す説明図である。 FIG. 7D is an explanatory diagram showing a fourth change pattern of the distribution of the number of records in the input data 21 according to the first embodiment of this invention.

　図７Ｄは、分布変動型Ｄの変化パターンを示す。分布変動型Ｄは、一つ又は複数のキーレンジのレコード数のみが急激に増加又は減少するパターンである。例えば、特定の種類の商品に取引が集中する場合に、特定の商品を示すレコードを含むキーレンジにおけるレコード数は、急激に増加する。 FIG. 7D shows a variation pattern of the distribution variation type D. The distribution variation type D is a pattern in which only the number of records in one or a plurality of key ranges rapidly increases or decreases. For example, when transactions concentrate on a specific type of product, the number of records in the key range including a record indicating the specific product increases rapidly.

　具体的には、図７Ｄに示すキーレンジ（００１－１００）及びキーレンジ（２０１－３００）のレコード数は、ジョブが実行されるごとに増加又は減少する。 Specifically, the number of records in the key range (001-100) and key range (201-300) shown in FIG. 7D increases or decreases each time the job is executed.

When the input data 21 is of distribution variation type D and key ranges whose number of records exceeds a predetermined value account for a predetermined proportion of the input data 21, the key range selection unit 1000 needs to divide or integrate the key ranges whose number of records falls outside the predetermined range.

　図８は、本発明の第１の実施形態のキーレンジ選択部１０００による処理を示すフローチャートである。 FIG. 8 is a flowchart showing processing by the key range selection unit 1000 according to the first embodiment of this invention.

Through the processing shown in FIG. 8, the key range selection unit 1000 can determine whether to reconstruct the key ranges of the input data 21, or whether the input data 21 contains key ranges that need to be divided or integrated.

Once the input data 21 has been input, the key range selection unit 1000 may execute the processing shown in FIG. 8 at any timing. That is, it may execute the processing shown in FIG. 8 using the latest input data 21 at a timing designated by the user or administrator, or it may do so using the latest input data 21 after a job has been executed.

When performing the processing shown in FIG. 8, the key range selection unit 1000 is given in advance, by the user, administrator, or the like, the job name of the input data 21 to be processed. At the start of the processing shown in FIG. 8, the key range selection unit 1000 then reads the parameter management table 200 whose job name 202 indicates the designated job name.

The parameter management table 200 read at the start of the processing shown in FIG. 8 holds no value in the distribution variation type 201.

　図８に示す処理が開始された場合、キーレンジ選択部１０００は、後述する履歴選択処理１０１０を実行する。キーレンジ選択部１０００は、履歴選択処理１０１０によって、レコード数を予測する後述の処理に用いる分割履歴１００を、記憶装置２０から複数選択する。 When the process shown in FIG. 8 is started, the key range selection unit 1000 executes a history selection process 1010 described later. The key range selection unit 1000 selects, from the storage device 20, a plurality of division histories 100 to be used for the later-described process for predicting the number of records by the history selection process 1010.

Specifically, through the history selection process 1010, the key range selection unit 1000 can select a plurality of division histories 100 whose total data amount indicated by the area 102 is within a predetermined value and whose area 102 indicates the job name designated in advance to the key range selection unit 1000.

　また、キーレンジ選択部１０００は、履歴選択処理１０１０において、選択された分割履歴１００に含まれる情報を、キーレンジ管理テーブル３００に格納する。具体的には、キーレンジ選択部１０００は、選択された分割履歴１００の領域１０３が示す各キーレンジにおけるレコード数を、キーレンジ管理テーブル３００の履歴３０３に格納する。 Also, the key range selection unit 1000 stores information included in the selected division history 100 in the key range management table 300 in the history selection processing 1010. Specifically, the key range selection unit 1000 stores the number of records in each key range indicated by the area 103 of the selected division history 100 in the history 303 of the key range management table 300.

After the history selection process 1010, the key range selection unit 1000 determines whether it was able to select division histories 100 in the history selection process 1010 (1001). For example, when the job has not yet been executed and no division history 100 is stored in the storage device 20, so that the key range selection unit 1000 cannot select a division history 100, the key range selection unit 1000 executes the key range reconstruction process 1040 described later.

Likewise, when the total data amounts of all the division histories 100 stored in the storage device 20 differ greatly from the total data amount of the input data 21, the key range selection unit 1000 cannot select a division history 100 in the history selection process 1010. In this case, the total data amount of the input data 21 has increased or decreased sharply, so the key ranges of the input data 21 must be regenerated. The key range selection unit 1000 therefore executes the key range reconstruction process 1040 described later.
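The selection and the fallback check of step 1001 might look like the following Python sketch. The details of the history selection process 1010 are given later in the description, so everything here is an assumption made for illustration: the `DivisionHistory` type, its field names, and the choice to normalize the data amount difference by the input data's total.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class DivisionHistory:
    """Hypothetical model of one division history 100."""
    executed_at: datetime               # execution date and time
    job_name: str                       # job that read the input data
    total_bytes: int                    # total data amount
    records_per_range: Dict[str, int]   # record count per key range


def select_histories(histories: List[DivisionHistory], job_name: str,
                     input_total_bytes: int,
                     data_amount_threshold: float) -> List[DivisionHistory]:
    """Keep histories of the given job whose total data amount is close
    to that of the current input data; newest first."""
    if input_total_bytes <= 0:
        raise ValueError("input_total_bytes must be positive")
    selected = [
        h for h in histories
        if h.job_name == job_name
        and abs(input_total_bytes - h.total_bytes) / input_total_bytes
        <= data_amount_threshold
    ]
    return sorted(selected, key=lambda h: h.executed_at, reverse=True)


def must_rebuild(selected: List[DivisionHistory]) -> bool:
    """Step 1001 as read here: with no usable history, rebuild key ranges."""
    return len(selected) == 0
```

An empty result covers both cases in the text: no history exists yet, or every stored history differs too much in total data amount from the current input.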

　分割履歴１００が選択できたとステップ１００１において判定された場合、キーレンジ選択部１０００は、選択された各分割履歴１００が示すジョブの実行日と、選択された各分割履歴１００が示す総レコード数とを参照し、各分割履歴１００の総レコード数が周期的に所定の範囲に含まれるか否かを判定する。 When it is determined in step 1001 that the division history 100 has been selected, the key range selection unit 1000 displays the job execution date indicated by each selected division history 100 and the total number of records indicated by each selected division history 100. , It is determined whether or not the total number of records of each division history 100 is periodically included in a predetermined range.

When the key range selection unit 1000 determines that the total number of records of the division histories 100 periodically falls within the predetermined range, it determines the distribution variation type of the selected division histories 100 to be distribution variation type A, and stores a value indicating distribution variation type A in the distribution variation type 201 of the parameter management table 200 (1002).

In step 1002, the key range selection unit 1000 identifies the total number of records of each division history 100 selected by the history selection process 1010 and calculates the difference in the total number of records between division histories 100. When the value obtained by dividing the calculated difference by, for example, the total number of records of the division history 100 whose area 101 indicates the newer execution date and time is within a predetermined change rate, the key range selection unit 1000 determines that the total number of records in each division history 100 falls within the predetermined range.

　ここで、所定の変化率は、パラメータ管理テーブル２００のレコード数閾値２０３でもよいし、ユーザ又は管理者等によってキーレンジ選択部１０００にあらかじめ指定された値でもよい。 Here, the predetermined change rate may be the record number threshold 203 of the parameter management table 200, or may be a value designated in advance in the key range selection unit 1000 by a user or an administrator.

When the total number of records in each division history 100 is determined to fall within the predetermined range, and the execution dates and times indicated by the area 101 of the division histories 100 are periodic, the key range selection unit 1000 determines that the total number of records of the division histories 100 periodically falls within the predetermined range.

The key range selection unit 1000 holds in advance the condition for the determination in step 1002, namely the cycle of execution dates and times. For example, by holding calendar dates and days of the week in advance, the key range selection unit 1000 determines in step 1002 whether the execution dates and times indicated by the area 101 of the division histories 100 fall at the end of every month or on every Monday. This allows the key range selection unit 1000 to determine whether the total number of records periodically falls within the predetermined range.

　ステップ１００２において処理される分割履歴１００は、総データ量が所定の範囲に含まれる分割履歴１００であるため、ステップ１００２において周期的に総レコード数が所定の範囲に含まれると判定された場合、キーレンジ選択部１０００は、各分割履歴１００の分布変動型を、図７Ａに示す分布変動型Ａと決定することができる。 Since the division history 100 processed in step 1002 is the division history 100 in which the total data amount is included in the predetermined range, when it is determined in step 1002 that the total number of records is periodically included in the predetermined range, The key range selection unit 1000 can determine the distribution variation type of each division history 100 as the distribution variation type A shown in FIG. 7A.
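One possible reading of the step 1002 test is sketched below in Python. The function names are hypothetical, and the periodicity rule is reduced to the two examples the text gives (every month end, or the same weekday each time); an actual system would hold whatever cycle conditions were configured in advance.

```python
import calendar
from datetime import date
from typing import List, Tuple


def is_month_end(d: date) -> bool:
    """True when d is the last day of its month."""
    return d.day == calendar.monthrange(d.year, d.month)[1]


def is_periodic(dates: List[date]) -> bool:
    """Assumed cycle test: all month ends, or all on the same weekday."""
    same_weekday = len({d.weekday() for d in dates}) == 1
    return same_weekday or all(is_month_end(d) for d in dates)


def is_type_a(histories: List[Tuple[date, int]], max_rate: float) -> bool:
    """histories: (execution date, total records), newest first.

    Type A as read here: every pair of successive histories changes by no
    more than max_rate (relative to the newer total), and the execution
    dates are periodic.
    """
    if len(histories) < 2:
        return False
    for (_, newer), (_, older) in zip(histories, histories[1:]):
        if newer == 0 or abs(newer - older) / newer > max_rate:
            return False
    return is_periodic([d for d, _ in histories])
```

Here `max_rate` plays the role of the predetermined change rate, which the text says may be the record number threshold 203 or a separately designated value.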

　ステップ１００２の後、キーレンジ選択部１０００は、キーレンジ管理テーブル３００に基づいて、後述するキーレンジレコード数予測処理１０２０を実行する。キーレンジレコード数予測処理１０２０によって、キーレンジ選択部１０００は、キーレンジ管理テーブル３００の予測数３０５に値を格納する。 After step 1002, the key range selection unit 1000 executes a key range record number prediction process 1020 described later based on the key range management table 300. By the key range record number prediction process 1020, the key range selection unit 1000 stores a value in the prediction number 305 of the key range management table 300.

After the key range record number prediction process 1020, the key range selection unit 1000 determines whether the distribution variation type of the division histories 100 selected by the history selection process 1010 is distribution variation type B, distribution variation type C, or distribution variation type D (1003). In step 1003, the key range selection unit 1000 stores a value indicating the determined distribution variation type in the distribution variation type 201 of the parameter management table 200.

In step 1003, the key range selection unit 1000 calculates the difference in the number of records in the final key range between division histories 100. When the difference in the number of records in the final key range is larger than the difference in the number of records in every other key range, between every pair of division histories 100, the key range selection unit 1000 determines the distribution variation type of the division histories 100 selected by the history selection process 1010 to be distribution variation type C.

For example, when the difference d12 in the final key range shown in FIG. 7C is larger than the difference d12 in the key ranges (001-100, 101-200) shown in FIG. 7C, and the difference d23 in the final key range shown in FIG. 7C is larger than the difference d23 in the key ranges (001-100, 101-200) shown in FIG. 7C, the key range selection unit 1000 determines the distribution variation type of the division histories 100 selected by the history selection process 1010 to be distribution variation type C.

In step 1003, the key range selection unit 1000 further determines the distribution variation type of the division histories 100 selected by the history selection process 1010 to be distribution variation type B when the values obtained by dividing the difference in the number of records in each key range between division histories 100 by the total number of records of the division history 100 whose area 101 indicates the newer execution date and time are all within the value indicated by the record number threshold 203 of the parameter management table 200.

Finally, in step 1003, when the distribution variation type was not determined to be distribution variation type A in step 1002 and was determined to be neither distribution variation type B nor distribution variation type C by the determinations above, the key range selection unit 1000 determines the distribution variation type of the division histories 100 selected by the history selection process 1010 to be distribution variation type D.
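The three-way decision of step 1003 can be illustrated with the following Python sketch. It is a simplified reading with hypothetical names: each history is a list of record counts ordered by key range, with the final key range last. Testing type C before type B is an assumption of this sketch, since the text does not say which determination takes precedence when both conditions hold.

```python
from typing import List


def classify_bcd(histories: List[List[int]],
                 record_count_threshold: float) -> str:
    """Return "B", "C" or "D".

    histories: record counts per key range for each selected history,
    newest first; the last element of each list is the final key range.
    """
    if len(histories) < 2 or len(histories[0]) < 2:
        raise ValueError("need at least two histories and two key ranges")

    # (newer, older) pairs of successive histories.
    pairs = list(zip(histories, histories[1:]))
    diffs = [[n - o for n, o in zip(newer, older)] for newer, older in pairs]

    # Type C: between every pair, the final key range grows by more
    # than every other key range.
    if all(d[-1] > max(d[:-1]) for d in diffs):
        return "C"

    # Type B: every per-range change, relative to the newer history's
    # total record count, stays within the threshold.
    def gradual(newer: List[int], d: List[int]) -> bool:
        total = sum(newer)
        return total > 0 and all(
            abs(x) / total <= record_count_threshold for x in d)

    if all(gradual(newer, d) for (newer, _), d in zip(pairs, diffs)):
        return "B"

    # Type D: neither of the above.
    return "D"
```

This sketch presumes that type A has already been ruled out in step 1002, as in the flow described above.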

　ステップ１００３の後、キーレンジ選択部１０００は、パラメータ管理テーブル２００の分布変動型２０１が分布変動型Ａ又は分布変動型Ｂを示すか否かを判定する（１００４）。分布変動型２０１が分布変動型Ａ又は分布変動型Ｂを示す場合、キーレンジ選択部１０００は、ステップ１００５を実行する。 After step 1003, the key range selection unit 1000 determines whether or not the distribution variation type 201 of the parameter management table 200 indicates the distribution variation type A or the distribution variation type B (1004). When the distribution variation type 201 indicates the distribution variation type A or the distribution variation type B, the key range selection unit 1000 executes Step 1005.

　キーレンジ選択部１０００は、ステップ１００５において、パラメータ管理テーブル２００の、上限レコード数２０５と下限レコード数２０６と再構成下限率２０７とを取得する。 In step 1005, the key range selection unit 1000 acquires the upper limit record number 205, the lower limit record number 206, and the reconstruction lower limit rate 207 of the parameter management table 200.

Then, in step 1005, the key range selection unit 1000 calculates the proportion of all entries of the key range management table 300 that is accounted for by entries whose predicted number 305 exceeds the acquired upper limit record number 205 or falls below the acquired lower limit record number 206. The key range selection unit 1000 then determines whether the calculated proportion exceeds the reconstruction lower limit rate 207.

　ステップ１００５において、算出された割合が再構成下限率２０７を上回ると判定された場合、すべてのキーレンジに含まれるキー値を変更する必要があるため、キーレンジ選択部１０００は、キーレンジ再構成処理１０４０を実行する。 If it is determined in step 1005 that the calculated ratio exceeds the reconstruction lower limit rate 207, it is necessary to change the key values included in all the key ranges. Processing 1040 is executed.
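The proportion check of step 1005 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the parameter names, and the handling of an empty table are hypothetical and do not come from the specification.

```python
def exceeds_reconstruction_rate(predicted_counts, upper_limit, lower_limit, min_rate):
    """Return True when the share of out-of-bounds key ranges exceeds min_rate.

    predicted_counts: predicted number 305 of each key range entry
    upper_limit / lower_limit: upper limit record number 205 / lower limit record number 206
    min_rate: reconstruction lower limit rate 207, as a fraction between 0 and 1
    """
    if not predicted_counts:
        # Assumption: an empty table never triggers reconstruction here.
        return False
    out_of_bounds = sum(
        1 for count in predicted_counts
        if count > upper_limit or count < lower_limit
    )
    # Strict comparison: the text says the proportion must exceed the rate.
    return out_of_bounds / len(predicted_counts) > min_rate
```

With predicted counts of 50, 150, 500, and 10, limits of 200 and 40, and a rate of 0.4, two of the four entries are out of bounds, so the proportion 0.5 exceeds the rate and reconstruction would be requested.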

If it is determined in step 1004 that the distribution variation type 201 indicates neither distribution variation type A nor distribution variation type B, or if it is determined in step 1005 that the calculated proportion is less than or equal to the reconstruction lower limit rate 207, the key range selection unit 1000 determines whether the distribution variation type 201 indicates distribution variation type C (1006).

If it is determined in step 1006 that the distribution variation type 201 indicates distribution variation type C, the key range selection unit 1000 stores "ON" in the division point investigation flag 307 of the entry corresponding to the final key range in the key range management table 300 (1007). As a result, when the job specified at the start of the process shown in FIG. 8 is executed after that process ends, the data dividing unit 1100, described later, divides the final key range and outputs a new division history 100.

If it is determined in step 1006 that the distribution variation type 201 does not indicate distribution variation type C, or after step 1007, the key range selection unit 1000 determines whether the distribution variation type 201 indicates distribution variation type D (1008).

If it is determined in step 1008 that the distribution variation type 201 indicates distribution variation type D, the key range selection unit 1000 executes the key range selection process 1030, described later. Through the key range selection process 1030, the key range selection unit 1000 can decide which key ranges are to be divided.

After the key range selection process 1030, the key range selection unit 1000 divides the number of entries in the key range management table 300 whose division point investigation flag 307 indicates "ON" by the total number of entries in the key range management table 300. It then determines whether the resulting value exceeds the value indicated by the reconstruction lower limit rate 207 (1009).

If the value calculated in step 1009 exceeds the value indicated by the reconstruction lower limit rate 207, the key range selection unit 1000 executes the key range reconstruction process 1040.

If it is determined in step 1008 that the distribution variation type 201 does not indicate distribution variation type D, if the value calculated in step 1009 is less than or equal to the value indicated by the reconstruction lower limit rate 207, or after the key range reconstruction process 1040, the key range selection unit 1000 ends the process shown in FIG. 8. At this point, the key range selection unit 1000 stores in the storage device 20 the parameter management table 200 whose distribution variation type 201 was set by the process shown in FIG. 8.

Note that the parameter management table 200 may hold one reconstruction lower limit rate 207 for step 1005 and another for step 1009, and the key range selection unit 1000 may use different values of the reconstruction lower limit rate 207 in step 1005 and step 1009.

Alternatively, the parameter management table 200 may hold a plurality of different values of the reconstruction lower limit rate 207. When a specific condition is input by the user, the key range selection unit 1000 may select from the parameter management table 200, according to the input condition, the value of the reconstruction lower limit rate 207 to be used in step 1005 and step 1009.

For example, if the user has input to the computer system 1 in advance a condition such as there being little time between the start of the process shown in FIG. 8 and the scheduled end time of the job, the key range selection unit 1000 may select a large value of the reconstruction lower limit rate 207 in order to reduce the time spent on the key range reconstruction process 1040.

FIG. 9 is a flowchart showing the history selection process 1010 performed by the key range selection unit 1000 according to the first embodiment of the present invention.

In the history selection process 1010, which follows the start of the process shown in FIG. 8, the key range selection unit 1000 acquires the current date and time and the total data amount of the input data 21 (1011).

After step 1011, the key range selection unit 1000 extracts from the storage device 20 the division histories 100 whose area 102 indicates the job name specified to the key range selection unit 1000 at the start of the process shown in FIG. 8 or in advance. It then refers to area 101 and area 103 of each extracted division history 100 and acquires the execution date and time and the total data amount of each division history 100 (1012).

After step 1012, the key range selection unit 1000 identifies the division histories 100 for which the difference between the total data amount of the input data 21 and the total data amount of the division history 100, divided by the total data amount of the input data 21, is within the data amount threshold 204 of the parameter management table 200. From the identified division histories 100, the key range selection unit 1000 then selects a predetermined number of division histories 100, starting with the one whose job execution date and time is the most recent (that is, the one with the smallest difference between its execution date and time and the current date and time) (1013).

Step 1013 allows the key range selection unit 1000 to avoid selecting division histories 100 that are unsuitable for capturing how the number of records is changing, such as a division history 100 whose execution date and time is too far from the current date and time, or one whose total data amount differs too much.

The number of division histories 100 selected in step 1013 (the predetermined number mentioned above) may be specified in advance by a user, an administrator, or the like. The more division histories 100 are selected in step 1013, the more accurately the key range selection unit 1000 can obtain the predicted number 305 in the key range record number prediction process 1020 shown in FIG. 8.
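As a rough illustration of step 1013, the following sketch filters histories by relative data amount and then keeps the most recent ones. The `DivisionHistory` class, its field names, and the use of the absolute value of the difference are assumptions made for this example; the specification only speaks of "the difference" being within the data amount threshold 204.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DivisionHistory:
    executed_at: datetime      # execution date and time (area 101)
    total_data_amount: int     # total data amount (area 103)


def select_histories(histories, input_data_amount, data_amount_threshold, count):
    """Keep histories whose size is close to the input, newest first, up to count."""
    eligible = [
        h for h in histories
        # Assumption: the difference is compared as an absolute value.
        if abs(input_data_amount - h.total_data_amount) / input_data_amount
        <= data_amount_threshold
    ]
    # The most recent execution has the smallest gap to the current date and time.
    eligible.sort(key=lambda h: h.executed_at, reverse=True)
    return eligible[:count]
```

For an input of 1000 units and a threshold of 0.1, a history of 2000 units is rejected regardless of how recent it is, while histories of 950 and 1050 units remain candidates and are ranked by execution date.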

After step 1013, the key range selection unit 1000 extracts, from all the division histories 100 selected in step 1013, the boundary values of each key range (the minimum and maximum key values) and the number of records in each key range. It stores the extracted information in the minimum value 301, the maximum value 302, and the history 303 of the entries of the key range management table 300 corresponding to each division history 100 (1014).

In step 1014, the key range selection unit 1000 stores the number of records of each division history 100 in the histories 303 of the key range management table 300 in order of execution date and time, either oldest first or newest first.

After step 1014, the key range selection unit 1000 ends the history selection process 1010.

As described above, the process shown in FIG. 9 allows the key range selection unit 1000 to avoid selecting, as a division history 100 for capturing how the number of records is changing, a division history 100 whose total data amount differs too much from the total data amount of the input data 21.

However, if the key range selection unit 1000 has been able to establish in advance, from the results of a job executed multiple times in the past, that the total data amount of the input data 21 does not fluctuate greatly, it need not compare the total data amount of the input data 21 with the total data amount of the division histories 100 in step 1013. In that case, the key range selection unit 1000 may select the division histories 100 in step 1013 based only on the execution date and time.

FIG. 10 is a flowchart showing the key range record number prediction process 1020 performed by the key range selection unit 1000 according to the first embodiment of the present invention.

After step 1002, the key range selection unit 1000 starts the key range record number prediction process 1020 in order to store a value in the predicted number 305 of the key range management table 300. After the key range record number prediction process 1020 starts, the key range selection unit 1000 calculates, for each entry of the key range management table 300, the difference in the number of records between the histories 303 (1021).

For example, in step 1021, the key range selection unit 1000 calculates the difference d12 = r1 - r2 between the record number r1 of history 1 (303a) and the record number r2 of history 2 (303b) shown in FIG. 4. The key range selection unit 1000 also calculates the difference d23 = r2 - r3 between the record number r2 of history 2 (303b) and the record number r3 of history 3 (303c).

After step 1021, the key range selection unit 1000 compares, within each key range, the differences in the number of records between histories calculated in step 1021, and determines the increase/decrease type 304 of each entry (1022).

Specifically, in step 1022, the key range selection unit 1000 compares the difference d12 with the difference d23 within each key range, and determines the increase/decrease type 304 according to the result of the comparison.

Any method may be used to determine the increase/decrease type 304, provided that it can distinguish among a pattern in which the number of records increases or decreases approximately linearly, a pattern in which it increases or decreases sharply (for example, exponentially), and a pattern in which it hardly changes. An example of the method used in this embodiment to determine the increase/decrease type 304 is given below.

In step 1022, if the sign of the difference d12 and the sign of the difference d23 in a key range are both positive and the difference d12 equals the difference d23, the key range selection unit 1000 sets the increase/decrease type 304 of the entry for that key range to increase/decrease type +1.

Likewise, if the sign of the difference d12 and the sign of the difference d23 in a key range are both positive and the difference between d12 and d23 is within a first predetermined value, the key range selection unit 1000 sets the increase/decrease type 304 of the entry for that key range to increase/decrease type +1.

In step 1022, if the sign of the difference d12 and the sign of the difference d23 in a key range are both negative and the difference d12 equals the difference d23, the key range selection unit 1000 sets the increase/decrease type 304 of that entry to increase/decrease type -1.

Likewise, if the sign of the difference d12 and the sign of the difference d23 in a key range are both negative and the difference between d12 and d23 is within the first predetermined value, the key range selection unit 1000 sets the increase/decrease type 304 of that entry to increase/decrease type -1.

In step 1022, if the difference d12 and the difference d23 of an entry are both 0, or if the difference d12 and the difference d23 are both within a second predetermined value, the key range selection unit 1000 sets the increase/decrease type 304 of that entry to increase/decrease type 0.

In step 1022, if the difference d12 and the difference d23 have different signs, the key range selection unit 1000 sets the increase/decrease type 304 of that entry to increase/decrease type 0.

In step 1022, if the sign of the difference d12 and the sign of the difference d23 of an entry are both positive and the difference between d12 and d23 exceeds the first predetermined value, the key range selection unit 1000 sets the increase/decrease type 304 of that entry to increase/decrease type +2.

In step 1022, if the sign of the difference d12 and the sign of the difference d23 of an entry are both negative and the difference between d12 and d23 exceeds the first predetermined value, the key range selection unit 1000 sets the increase/decrease type 304 of that entry to increase/decrease type -2.
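The rules of step 1022 can be condensed into a small classifier. This is a sketch under assumptions rather than the embodiment's implementation: the function and parameter names are hypothetical, the "within the second predetermined value" test is applied first, and any case the rules above do not cover (for example, exactly one difference being zero) falls back to type 0.

```python
def classify_increase_decrease(r1, r2, r3, first_value, second_value):
    """Return the increase/decrease type (+2, +1, 0, -1, or -2) for one key range.

    r1, r2, r3: record numbers of history 1, history 2, and history 3
    first_value: first predetermined value (linear versus sharp change)
    second_value: second predetermined value (practically no change)
    """
    d12 = r1 - r2
    d23 = r2 - r3
    # Both differences are zero or negligibly small: type 0.
    if abs(d12) <= second_value and abs(d23) <= second_value:
        return 0
    # Both positive: roughly equal differences give +1, otherwise +2.
    if d12 > 0 and d23 > 0:
        return 1 if abs(d12 - d23) <= first_value else 2
    # Both negative: roughly equal differences give -1, otherwise -2.
    if d12 < 0 and d23 < 0:
        return -1 if abs(d12 - d23) <= first_value else -2
    # Different signs (or a case the text leaves open): type 0.
    return 0
```

For record numbers 130, 120, and 110, both differences are 10, so the range is classified as type +1. For 200, 120, and 100, the differences 80 and 20 are both positive but far apart, so the range is classified as type +2.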

Note that in step 1022 the key range selection unit 1000 may determine the increase/decrease pattern by a method other than the one described above. For example, it may obtain a curve approximating the distribution of the number of records in each key range indicated by the plurality of division histories 100, and determine the increase/decrease pattern from the shape of the curve.

The first threshold and the second threshold used to determine the increase/decrease pattern in step 1022 (that is, the first and second predetermined values described above) may also take different values depending on the increase/decrease pattern being determined. Furthermore, the first threshold and the second threshold may be the same value, and may be obtained by multiplying the number of records contained in the input data 21 by the record number threshold 203 of the parameter management table 200.

After step 1022, the key range selection unit 1000 calculates a predicted value of the number of records according to the increase/decrease pattern determined in step 1022, and stores the calculated predicted value in the predicted number 305 of the key range management table 300 (1023).

The key range selection unit 1000 holds in advance the formulas, rules, or the like used to calculate the predicted value of the number of records according to the increase/decrease pattern. Any formula or rule may be used to calculate the predicted value as long as it corresponds to the increase/decrease pattern.

An example of the process for calculating the predicted value is given below.

For example, in step 1023, the key range selection unit 1000 sets the predicted number 305 of an entry whose increase/decrease type 304 indicates increase/decrease type 0 to the record number r1 of history 1.

In step 1023, the key range selection unit 1000 calculates r1 + (d12 + d23) / 2 as the predicted number 305 of an entry whose increase/decrease type 304 indicates increase/decrease type +1. It calculates r1 - (d12 + d23) / 2 as the predicted number 305 of an entry whose increase/decrease type 304 indicates increase/decrease type -1.

In step 1023, the key range selection unit 1000 calculates r1 + d12 * (d12 / d23) as the predicted number 305 of an entry whose increase/decrease type 304 indicates increase/decrease type +2. It calculates r1 - d12 * (d12 / d23) as the predicted number 305 of an entry whose increase/decrease type 304 indicates increase/decrease type -2.

The key range selection unit 1000 then stores the predicted value of each entry calculated in step 1023 in the predicted number 305. After step 1023, it ends the process shown in FIG. 10.
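The example formulas of step 1023 can be written out as follows. The function name is hypothetical, and one point is an explicit assumption: the text does not say whether d12 and d23 are signed or taken as magnitudes in the decreasing cases, so this sketch uses magnitudes, which makes a decreasing key range yield a prediction smaller than r1.

```python
def predict_record_count(increase_decrease_type, r1, r2, r3):
    """Predict the next record count of one key range from three histories."""
    # Assumption: magnitudes are used so that types -1 and -2 predict a decrease.
    d12 = abs(r1 - r2)
    d23 = abs(r2 - r3)
    if increase_decrease_type == 0:
        return r1
    if increase_decrease_type == 1:
        return r1 + (d12 + d23) / 2
    if increase_decrease_type == -1:
        return r1 - (d12 + d23) / 2
    # Types +2 and -2 divide by d23, which is non-zero whenever both
    # differences share a strict sign as step 1022 requires.
    if increase_decrease_type == 2:
        return r1 + d12 * (d12 / d23)
    if increase_decrease_type == -2:
        return r1 - d12 * (d12 / d23)
    raise ValueError(f"unknown increase/decrease type: {increase_decrease_type}")
```

For histories of 130, 120, and 110 records with type +1, the prediction is 130 + (10 + 10) / 2 = 140. For 200, 120, and 100 records with type +2, it is 200 + 80 * (80 / 20) = 520.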

In the process shown in FIG. 10, the increase/decrease type 304 takes five values: increase/decrease type +2, increase/decrease type +1, increase/decrease type 0, increase/decrease type -1, and increase/decrease type -2. However, the key range selection unit 1000 may determine only three of them: increase/decrease type +1, increase/decrease type 0, and increase/decrease type -1. This lowers the accuracy of the calculated predicted number 305, but it allows the key range selection unit 1000 to determine the increase/decrease pattern even when, for example, only two division histories 100 have been selected.

FIG. 11 is a flowchart showing the key range selection process 1030 performed by the key range selection unit 1000 according to the first embodiment of the present invention.

Through the process shown in FIG. 11, the key range selection unit 1000 can determine which key ranges should be divided or integrated when the distribution variation type 201 indicates distribution variation type D.

If it is determined in step 1008 that the distribution variation type 201 is distribution variation type D, or after step 1033, the key range selection unit 1000 extracts, from the entries of the key range management table 300, an entry on which step 1031 has not yet been executed. It then determines whether the predicted number 305 of the extracted entry exceeds the upper limit record number 205 of the parameter management table 200 for distribution variation type D (1031).

If the predicted number 305 of an entry in the key range management table 300 exceeds the upper limit record number 205 of the parameter management table 200 for distribution variation type D, the key range selection unit 1000 stores ON in the division point investigation flag 307 of that entry (1032).

If the predicted number 305 of an entry in the key range management table 300 is less than or equal to the upper limit record number 205 of the parameter management table 200 for distribution variation type D, the key range selection unit 1000 stores OFF in the division point investigation flag 307 of that entry (1034).

After step 1032 or step 1034, the key range selection unit 1000 determines whether step 1031 has been performed on all entries of the key range management table 300 (1033).

If step 1031 has been performed on all entries of the key range management table 300, the key range selection unit 1000 ends the process shown in FIG. 11. If the key range management table 300 contains an entry on which step 1031 has not been performed, the key range selection unit 1000 performs step 1031.
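The loop of steps 1031 to 1034, followed by the ratio that step 1009 evaluates afterwards, might look like the sketch below. Function and parameter names are hypothetical, and the flags are modeled as plain booleans (True for ON, False for OFF) rather than as fields of the key range management table 300.

```python
def flag_ranges_for_division(predicted_counts, upper_limit):
    """Steps 1031 to 1034: True (ON) when a predicted count exceeds the upper limit."""
    return [count > upper_limit for count in predicted_counts]


def flagged_ratio(flags):
    """Step 1009: share of entries whose division point investigation flag is ON."""
    # Assumption: an empty table yields a ratio of 0.
    return sum(flags) / len(flags) if flags else 0.0
```

With predicted counts of 100, 250, 200, and 300 and an upper limit of 200, the second and fourth ranges are flagged, giving a ratio of 0.5 to compare against the reconstruction lower limit rate 207.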

FIG. 12 is a flowchart showing the key range reconstruction process 1040 performed by the key range selection unit 1000 according to the first embodiment of the present invention.

The key range selection unit 1000 reads the key value of each record from the input data 21 into the record buffer 14 (1041). After step 1041, the key range selection unit 1000 sorts the key values read into the record buffer 14, for example in ascending order (1042).

After step 1042, the key range selection unit 1000 reads the upper limit record number 205 and the lower limit record number 206 from the parameter management table 200. It then generates a plurality of key ranges by splitting the key values sorted in step 1042 into groups of (upper limit record number 205 + lower limit record number 206) / 2 key values each (1043).

The upper limit record number 205 and the lower limit record number 206 read in step 1042 may be values that correspond to the distribution variation type determined in step 1002 and step 1003, or may be values defined in advance for performing the key range reconstruction process when no division history 100 exists in step 1001.

After step 1043, the key range selection unit 1000 deletes the values in the key range management table 300 by updating all values stored in the key range management table 300 to null values or the like (1044).

After step 1044, the key range selection unit 1000 stores the minimum value and maximum value of each key range generated in step 1043 in the minimum value 301 and the maximum value 302 of the key range management table 300 (1045).
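Steps 1042, 1043, and 1045 amount to sorting the key values and cutting the sorted sequence into equally sized groups. The sketch below illustrates this under assumptions: the function name is hypothetical, the group size is rounded down with integer division, and the result is returned as a list of (minimum, maximum) pairs instead of being written to the key range management table 300.

```python
def rebuild_key_ranges(key_values, upper_limit, lower_limit):
    """Return (minimum, maximum) pairs for the regenerated key ranges."""
    # Group size is the midpoint of the limits; rounding down is an assumption.
    group_size = max(1, (upper_limit + lower_limit) // 2)
    ordered = sorted(key_values)                      # step 1042
    ranges = []
    for start in range(0, len(ordered), group_size):  # step 1043
        group = ordered[start:start + group_size]
        ranges.append((group[0], group[-1]))          # values stored in step 1045
    return ranges
```

For key values 5, 1, 9, 3, 7, and 2 with limits of 4 and 2, the group size is 3, and the sorted keys 1, 2, 3, 5, 7, 9 produce the ranges (1, 3) and (5, 9).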

FIG. 13 is a flowchart showing the process by which the data dividing unit 1100 according to the first embodiment of the present invention divides the input data 21.

The process shown in FIG. 13 is executed when the input data 21 is input to a job. The computer system 1 executes the job after executing the process shown in FIG. 13. The data dividing unit 1100 acquires the input data 21 to be input to the job and performs the process shown in FIG. 13.

At the start of the process shown in FIG. 13, the data dividing unit 1100 acquires the job name indicating the job to which the acquired input data 21 is to be input. The data dividing unit 1100 then reads the parameter management table 200 whose job name 202 contains the acquired job name.

First, the data dividing unit 1100 reads one record from the acquired input data 21. It then compares the key value of the read record with the minimum value 301 and the maximum value 302 of each entry of the key range management table 300, and identifies the entry for which the key value of the read record is greater than or equal to the minimum value 301 and less than or equal to the maximum value 302 (1101).

After step 1101, the data dividing unit 1100 determines whether the division point investigation flag 307 of the entry identified in step 1101 is ON (1102).

If the division point investigation flag 307 of the identified entry is not ON, the key range corresponding to the identified entry does not need to be divided or integrated. The data dividing unit 1100 therefore outputs the record read in step 1101 to the divided data 22 corresponding to the key range of the identified entry (1104).

If it is determined in step 1102 that the division point investigation flag 307 of the identified entry is ON, the key range corresponding to the identified entry needs to be divided or integrated. The data dividing unit 1100 therefore stores the record read in step 1101 in the record buffer 14 (1103).

The records stored in the record buffer 14 in step 1103 are output to the divided data 22 after the key range division or integration described later, while the records that are not subject to division or integration are output to the divided data 22 in step 1104. This allows the data dividing unit 1100 to reduce the overhead incurred by the process shown in FIG. 13.

In other words, thanks to step 1103 and step 1104, the data dividing unit 1100 does not need to read the input data 21 separately for each of the two processes, namely the key range division or integration process and the process of outputting records to the divided data 22. The data dividing unit 1100 can therefore perform both the output of the divided data 22 and the key range division or integration quickly.

　ステップ１１０３又はステップ１１０４の後、データ分割部１１００は、ステップ１１０１において特定されたエントリのレコード実数３０６に１を加える（１１０５）。これによって、データ分割部１１００は、入力データ２１に含まれ、かつ、ステップ１１０３又はステップ１１０４の処理が行われたレコードの数を、キーレンジごとに算出することができる。 After step 1103 or step 1104, the data dividing unit 1100 adds 1 to the record real number 306 of the entry specified in step 1101 (1105). As a result, the data dividing unit 1100 can calculate the number of records included in the input data 21 and subjected to the processing of step 1103 or step 1104 for each key range.

After step 1105, the data dividing unit 1100 determines whether every record in the input data 21 has been either stored in the record buffer 14 or output to the divided data 22 (1106). If any record in the input data 21 has been neither stored in the record buffer 14 nor output to the divided data 22, the data dividing unit 1100 returns to step 1101.

If it is determined that every record in the input data 21 has been either stored in the record buffer 14 or output to the divided data 22, the data dividing unit 1100 executes step 1107.

In step 1107, the data dividing unit 1100 sorts the records stored in the record buffer 14 in order of their key values (1107).
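The single pass over the input data 21 in steps 1101 to 1107 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the class `KeyRangeEntry`, the functions `find_entry` and `first_pass`, and the representation of a record as a `(key, payload)` tuple are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class KeyRangeEntry:
    """Hypothetical stand-in for one entry of the key range management table 300."""
    minimum: int            # minimum value 301
    maximum: int            # maximum value 302
    investigate: bool       # division point investigation flag 307 (True = ON)
    actual_count: int = 0   # record real number 306


def find_entry(table, key):
    """Return the index of the entry whose key range contains the key."""
    for index, entry in enumerate(table):
        if entry.minimum <= key <= entry.maximum:
            return index
    raise KeyError(key)


def first_pass(records, table):
    """Read each record once; buffer flagged ranges, output the others."""
    buffer = []                                   # stands in for record buffer 14
    outputs = {i: [] for i in range(len(table))}  # stands in for divided data 22
    for record in records:
        index = find_entry(table, record[0])      # step 1101
        if table[index].investigate:              # step 1102
            buffer.append(record)                 # step 1103
        else:
            outputs[index].append(record)         # step 1104
        table[index].actual_count += 1            # step 1105
    # The loop ending corresponds to step 1106; step 1107 sorts by key value.
    buffer.sort(key=lambda r: r[0])
    return buffer, outputs
```

Because every record is either buffered or written out during the same loop, the input is read only once, which is the overhead reduction described above.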

After step 1107, the data dividing unit 1100 determines whether any entry of the key range management table 300 (each entry corresponds to a key range) has a record real number 306 that exceeds the upper limit record number 205 (1108). A key range indicated by such an entry is hereinafter referred to as a first key range.

In the process shown in FIG. 8, the key range selection unit 1000 of this embodiment extracts candidate key ranges for division by updating the division point investigation flag 307 based on the predicted number 305. Then, in step 1108 shown in FIG. 13, the data dividing unit 1100 identifies, based on the record real number 306, the key ranges among the extracted candidates that actually need to be divided.

This prevents the process shown in FIG. 13 from dividing key ranges that do not need to be divided or integrated. In addition, because the process shown in FIG. 8 extracts the candidates beforehand, step 1108 does not have to examine every key range of the input data 21, which shortens the processing time of step 1108.

However, when the predicted number 305 has been obtained accurately, for example when many division histories 100 were selected in the history selection process 1010 shown in FIG. 8, the key range selection unit 1000 may omit step 1108 shown in FIG. 13. In that case, the key range selection unit 1000 may divide, in step 1109, every key range whose division point investigation flag 307 is ON.

If it is determined in step 1108 that no entry of the key range management table 300 has a record real number 306 exceeding the upper limit record number 205, the data dividing unit 1100 outputs the records stored in the record buffer 14 to the divided data 22 according to the key range management table 300, and then executes step 1110. This is because none of the entries whose division point investigation flag 307 is ON indicates a key range that actually needs to be divided.

If it is determined in step 1108 that some entry of the key range management table 300 has a record real number 306 exceeding the upper limit record number 205, the data dividing unit 1100 determines the maximum and minimum values of new key ranges based on the first key range and updates the key range management table 300 with the new key ranges. The data dividing unit 1100 then outputs the records of the first key range held in the record buffer 14 to a plurality of divided data 22 according to the newly determined maximum and minimum values (1109).

Step 1109 divides the first key range so that the number of records in every divided data 22 becomes equal to or less than the upper limit record number 205. Examples of the division in step 1109 are given below.

In step 1109, the data dividing unit 1100 divides the first key range so that the span of key values within it whose record count is expected to grow is separated out as a single key range, which makes further division less likely in the future.

For example, when records sharing a single key value account for 1/2 or more of the records in the first key range, the data dividing unit 1100 divides the first key range in step 1109 so that the records are split between that key value and the other key values.

As another example, when the distribution variation type 201 indicates distribution variation type C and the first key range is the final key range, the data dividing unit 1100 divides the final key range in step 1109 into a key range from the minimum value to the maximum value of the final key range and a key range whose minimum value is the maximum value of the final key range plus 1.

In cases other than the examples above, the data dividing unit 1100 divides the first key range in step 1109 into a range from the minimum value to the median (the key value that bisects the number of records in the first key range) and a range from the median to the maximum value.
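The three division examples for step 1109 can be sketched as a single function. This is a sketch under stated assumptions, not the method of the embodiment. The function name `split_key_range` and its parameters are hypothetical. The text does not say how the "other key values" are grouped, what the maximum of the newly added final range is, or which example takes priority, so the sketch assumes: up to three ranges around a dominant key value, `None` as an unspecified upper bound, the dominant-key check first, and a second half that starts at the median plus 1 so that the ranges do not overlap.

```python
from collections import Counter


def split_key_range(sorted_keys, minimum, maximum, type_c_final=False):
    """Return new (minimum, maximum) key ranges for an oversized first key range.

    sorted_keys: key values of the buffered records, already sorted (step 1107).
    type_c_final: True when the distribution variation type is C and this is
                  the final key range.
    """
    if not sorted_keys:
        raise ValueError("a first key range always contains records")

    # Example 1: one key value holds at least half of the records.
    key, count = Counter(sorted_keys).most_common(1)[0]
    if 2 * count >= len(sorted_keys):
        candidates = [(minimum, key - 1), (key, key), (key + 1, maximum)]
        return [(lo, hi) for lo, hi in candidates if lo <= hi]

    # Example 2: type C and final range; add a range starting at maximum + 1.
    if type_c_final:
        return [(minimum, maximum), (maximum + 1, None)]  # None: bound not given

    # Otherwise: split at the key value that bisects the record count.
    median = sorted_keys[(len(sorted_keys) - 1) // 2]
    return [(minimum, median), (median + 1, maximum)]
```

For instance, under these assumptions `split_key_range([1, 2, 3, 4, 5, 6], 0, 10)` yields the two ranges `(0, 3)` and `(4, 10)`, each holding three of the six records.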

Note that if it is determined in step 1109 that the value of the predicted number 305 is three or more times the upper limit record number 205, the data dividing unit 1100 may divide the first key range into three or more key ranges.

The upper limit record number 205 used for the determination in step 1108 may differ from the upper limit record number 205 used for the determination in step 1031.

In addition, when the parameter management table 200 contains a plurality of upper limit record numbers 205 and the data dividing unit 1100 determines that little time remains until the time specified in advance as the job end time, the data dividing unit 1100 may select a larger upper limit record number 205 from the parameter management table 200 in order to reduce the processing time of step 1109. The data dividing unit 1100 may then use the selected upper limit record number 205 for the determination in step 1108.

If it is determined in step 1108 that no entry of the key range management table 300 has a record real number 306 exceeding the upper limit record number 205, or after step 1109, the data dividing unit 1100 determines whether any entry of the key range management table 300 has a record real number 306 below the lower limit record number 206 (1110). A key range indicated by such an entry is hereinafter referred to as a second key range.

If some entry of the key range management table 300 has a record real number 306 below the lower limit record number 206, the data dividing unit 1100 integrates the second key range with one of the key ranges adjacent to it (1111). The data dividing unit 1100 then updates the key range management table 300 according to the result of the integration.

Specifically, in step 1111 the data dividing unit 1100 extracts, as the key ranges adjacent to the second key range, the key range whose maximum value 302 equals the minimum value 301 of the second key range minus 1 and the key range whose minimum value 301 equals the maximum value 302 of the second key range plus 1. The data dividing unit 1100 then integrates the second key range with an adjacent key range whose record real number 306 does not exceed the upper limit record number 205 even when the record real number 306 of the second key range is added to it.

For example, among the entries of the key range management table 300 that indicate the key ranges adjacent to the second key range, let the entry with the smaller minimum value 301 be entry A and the entry with the larger minimum value 301 be entry B. If the record real number 306 of entry A is smaller than that of entry B, and the sum of the record real number 306 of entry A and the record real number 306 of the second key range is equal to or less than the upper limit record number 205, the data dividing unit 1100 integrates entry A with the second key range.

More specifically, to integrate the key ranges in step 1111, the data dividing unit 1100 stores the maximum value 302 of the second key range in the maximum value 302 of entry A, and stores the sum of the record real number 306 of entry A and the record real number 306 of the entry indicating the second key range in the record real number 306 of entry A. The data dividing unit 1100 then deletes the entry indicating the second key range.

If none of the key ranges adjacent to the second key range has a record real number 306 that stays at or below the upper limit record number 205 when the record real number 306 of the second key range is added, the data dividing unit 1100 of this embodiment does not integrate the second key range with any other key range.
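The integration in step 1111 can be sketched as follows. This is an illustrative sketch only: the class `Entry` and the function `merge_small_range` are hypothetical names, and the table is assumed to be a list sorted by minimum value. The text describes the update only for integration with entry A; the handling of integration with entry B, and the choice of the neighbor with the smaller record count when both neighbors fit, are assumptions extrapolated from the entry A example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one entry of the key range management table 300."""
    minimum: int        # minimum value 301
    maximum: int        # maximum value 302
    actual_count: int   # record real number 306


def merge_small_range(table, index, upper_limit):
    """Try to integrate the second key range at table[index] with a neighbor.

    Returns True if the table was updated, False if no neighbor qualifies.
    """
    target = table[index]
    neighbors = []
    # Entry A: its maximum value equals the target's minimum value minus 1.
    if index > 0 and table[index - 1].maximum == target.minimum - 1:
        neighbors.append(index - 1)
    # Entry B: its minimum value equals the target's maximum value plus 1.
    if index + 1 < len(table) and table[index + 1].minimum == target.maximum + 1:
        neighbors.append(index + 1)

    # Keep only neighbors that stay within the upper limit record number 205.
    fitting = [i for i in neighbors
               if table[i].actual_count + target.actual_count <= upper_limit]
    if not fitting:
        return False  # the second key range is left as it is

    chosen = min(fitting, key=lambda i: table[i].actual_count)
    low, high = min(chosen, index), max(chosen, index)
    # The lower entry absorbs the upper one, then the upper entry is deleted.
    table[low].maximum = table[high].maximum
    table[low].actual_count += table[high].actual_count
    del table[high]
    return True
```

Only the table is changed here; as the following paragraph of the description explains, the records themselves are written according to the integrated range on a later run.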

The processing in step 1111 only updates the key range management table 300; the second key range is actually integrated with another key range and output to the divided data 22 the next time the process shown in FIG. 13 is executed. The computer system 1 may therefore execute the job after step 1109, which shortens the processing time of the job.

In step 1110 shown in FIG. 13, the data dividing unit 1100 identifies the second key range, based on the record real number 306, from among all the key ranges of the input data 21. Alternatively, the data dividing unit 1100 may extract candidate key ranges for integration based on the predicted number 305 in the process shown in FIG. 8, and identify the second key range from among those candidates in step 1110.

If it is determined in step 1110 that no entry of the key range management table 300 has a record real number 306 below the lower limit record number 206, or after step 1111, the data dividing unit 1100 outputs to the division history 100 the job name acquired at the start of the process of FIG. 13, the current date and time, and the number of bytes of the input data 21, together with the minimum value 301, the maximum value 302, and the record real number 306 of every entry of the key range management table 300 (1112).

Also in step 1112, the data dividing unit 1100 sets every division point investigation flag 307 of the key range management table 300 to OFF.

After step 1112, the data dividing unit 1100 ends the process shown in FIG. 13.

According to the first embodiment, the computer system 1 obtains the distribution variation type of the input data 21 from the division history 100, and further obtains the predicted number of records in each key range, to determine the key ranges to be divided or integrated. The divided data 22 input to the jobs can thereby be kept at an appropriate size.

Furthermore, immediately before executing the job, the computer system 1 of the first embodiment outputs the divided data 22 according to key ranges that were determined without reading the entire input data 21. The processing time needed at job execution to keep each divided data 22 at an appropriate size can therefore be shortened.

More specifically, obtaining the distribution variation type and the predicted number makes it possible to identify the candidate key ranges for division or integration. Because only the candidate key ranges are examined at job execution to decide whether to divide or integrate them, needless verification of every key range in the input data 21 is avoided and the processing time is shortened.

(Second embodiment)

At job execution, the data dividing unit 1100 of the second embodiment does not output the divided data 22 after dividing or integrating key ranges; instead, it outputs the divided data 22 according to the result of the division or integration when the job is next executed. The data dividing unit 1100 of the second embodiment can thereby shorten the processing time required to output the divided data 22.

FIG. 14 is a flowchart showing the key range division process performed by the data dividing unit 1100 according to the second embodiment of the present invention.

Like the process shown in FIG. 13, the process shown in FIG. 14 is executed when the input data 21 is input to the job to be executed.

Steps 1101 and 1102 in FIG. 14 are the same as those in FIG. 13.

If it is determined in step 1102 that the division point investigation flag 307 of the identified entry is ON, the data dividing unit 1100 of the second embodiment stores the key value of the record read in step 1101 in the record buffer 14 (1103b).

If it is determined in step 1102 that the division point investigation flag 307 of the identified entry is not ON, or after step 1103b, the data dividing unit 1100 outputs the record to the divided data 22 according to the key range indicated by the key range management table 300 (1104).

In step 1104, the data dividing unit 1100 of the second embodiment outputs every record in the input data 21 to the divided data 22. This is because the process shown in FIG. 14 in the second embodiment regenerates the entries of the key range management table 300; it is not a process that outputs the input data 21 to the divided data 22 according to the result of the division or integration.

Because the job may therefore be executed after step 1104 shown in FIG. 14, using the process shown in FIG. 14 lets the job start sooner than using the process shown in FIG. 13.

Steps 1105 and 1106 in FIG. 14 are the same as steps 1105 and 1106 in FIG. 13.

If it is determined in step 1106 that every record in the input data 21 has been output to the divided data 22, the data dividing unit 1100 sorts the key values of the records stored in the record buffer 14, for example in ascending order (1107b).

Steps 1108 and 1110 in FIG. 14 are the same as steps 1108 and 1110 in FIG. 13.

Step 1109b in FIG. 14 is similar to step 1109 in FIG. 13; it differs from steps 1109 and 1111 in FIG. 13 only in that it does not output records to the divided data 22.

Steps 1111 and 1112 in FIG. 14 are the same as steps 1111 and 1112 in FIG. 13.

The process shown in FIG. 14 updates the key range management table 300. When the job is next executed, step 1104 shown in FIG. 14 is executed using the updated key range management table 300, and the divided data 22 is generated according to the updated key range management table 300.
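The deferred behavior of the second embodiment can be sketched as follows. This is a simplified sketch, not the process of FIG. 14 itself: the function `deferred_run` is a hypothetical name, key ranges are plain `(minimum, maximum)` tuples assumed to cover every key, flagged ranges are given as a set of indices, and only a median split is shown for step 1109b (integration and the other division examples are omitted).

```python
def deferred_run(records, table, flagged, upper_limit):
    """One job run: output with the current table, prepare the next table.

    records: iterable of (key, payload) tuples.
    table: list of (minimum, maximum) key ranges in ascending order.
    flagged: indices of ranges whose division point investigation flag is ON.
    Returns (outputs, next_table).
    """
    outputs = [[] for _ in table]            # stands in for divided data 22
    buffered_keys = {i: [] for i in flagged}
    for key, payload in records:
        i = next(n for n, (lo, hi) in enumerate(table) if lo <= key <= hi)
        if i in flagged:
            buffered_keys[i].append(key)     # step 1103b: key value only
        outputs[i].append((key, payload))    # step 1104: every record is output

    next_table = []
    for i, (lo, hi) in enumerate(table):
        keys = sorted(buffered_keys.get(i, []))           # step 1107b
        if len(keys) > upper_limit and keys[(len(keys) - 1) // 2] < hi:
            median = keys[(len(keys) - 1) // 2]
            next_table += [(lo, median), (median + 1, hi)]  # step 1109b
        else:
            next_table.append((lo, hi))
    return outputs, next_table
```

A usage sketch: calling `deferred_run` once returns the output for the current run under the old ranges together with `next_table`; passing `next_table` to the following call is what makes the new ranges take effect one run later.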

According to the second embodiment, when a job is executed, the key range management table 300 indicating the key ranges after division or integration is generated; when the job is next executed, the records of the input data 21 are output to the divided data according to the key range management table 300 generated during the previous execution. The processing for dividing or integrating key ranges that is performed at job execution can therefore be shortened.

According to this embodiment, the computer system 1 obtains the distribution variation type of the input data 21 from the division history 100, and further obtains the predicted number of records in each key range, to determine the key ranges to be divided or integrated. The divided data 22 input to the jobs can thereby be kept at an appropriate size.

In addition, immediately before executing the job, the computer system 1 of the first embodiment outputs the divided data 22 according to key ranges that were determined without reading the entire input data 21. The processing time needed at job execution to keep each divided data 22 at an appropriate size can therefore be shortened.

The present invention has been described above in detail with reference to the accompanying drawings. However, the present invention is not limited to these specific configurations and includes various modifications and equivalent configurations within the spirit of the appended claims.

The present invention can be used in a computer system that processes distributed jobs.

Claims

A data processing method in a computer system for dividing input data including a plurality of records and processing each of the divided input data by each of a plurality of jobs,
The computer system includes a processor and a memory,
A key value is assigned to each of the plurality of records,
The key value is classified into one of key ranges defined by a minimum value and a maximum value,
The method
The processor acquires histories each indicating, for records that were divided according to the key ranges and processed by the plurality of jobs, the number of the processed records and the key range into which the key values assigned to the processed records are classified,
The processor determines a pattern of change in the number of records included in the input data based on the acquired plurality of histories,
The processor determines, according to the determined change pattern, the key range for which at least one of the minimum value and the maximum value is to be changed.

A data processing method according to claim 1, comprising:
The method
If each of the histories indicates that a predetermined number of records are processed periodically by the job, the processor determines that the pattern of change in the number of records is a first pattern;
When each history indicates that the amount of change in the number of records classified into each key range is equal to or less than a predetermined value in all the key ranges, the processor determines that the pattern of change in the number of records included in the input data is the second pattern,
When each of the histories indicates that the amount of change in the number of records classified into the first key range, into which the key value having the largest value is classified, is greater than the amount of change in the number of records classified into key ranges other than the first key range, the processor determines that the pattern of change in the number of records included in the input data is the third pattern,
If it is not determined that the pattern of change in the number of records included in the input data is the first pattern, the second pattern, or the third pattern, the processor determines that the pattern of change in the number of records included in the input data is the fourth pattern,
When the determined change pattern is the third pattern, the processor determines to change at least one of the minimum value and the maximum value of the first key range.

A data processing method according to claim 2, comprising:
The computer system holds an upper limit value of the number of records classified into each key range,
The method
The processor predicts the number of records classified into each key range when the input data is processed by the job based on the number of records classified into each key range indicated by each history,
The processor calculates a ratio of the number of second key ranges, in which the predicted number of records exceeds the upper limit value, to the number of all the key ranges;
The processor determines whether the calculated percentage exceeds a predetermined percentage;
When, as a result of the determination, the calculated ratio exceeds the predetermined ratio and the determined change pattern is one of the first pattern, the second pattern, and the fourth pattern, the processor determines to change at least one of the minimum value and the maximum value of all the key ranges.

A data processing method according to claim 3, comprising:
The computer system holds the minimum value and the maximum value of the key range in the memory,
The method
When the input data is input to the computer system,
The processor divides a record classified into a key range other than the second key range included in the input data according to a minimum value and a maximum value of the key range;
The processor stores in the memory a record classified into a second key range of the input data;
The processor divides the records classified into the second key range and stored in the memory according to at least one of the changed minimum value and maximum value of the second key range.

A data processing method according to claim 3, comprising:
In the method, the processor changes at least one of the minimum value and the maximum value of the second key range so that the number of records classified into the second key range becomes equal to or less than the predetermined upper limit value, whereby the key values classified into the second key range are classified into the second key range and a third key range.

A data processing method according to claim 3, comprising:
The computer system holds a lower limit value of the number of records classified into the key ranges,
The method
The processor identifies a fourth key range in which the number of records is below the lower limit;
Identifying a fifth key range adjacent to the fourth key range;
If the sum of the number of records classified into the fourth key range and the number of records classified into the fifth key range does not exceed the upper limit value, the processor changes at least one of the minimum value and the maximum value of the fourth key range so that the key values classified into the fifth key range are classified into the fourth key range.

A data processing method according to claim 3, comprising:
The computer system maintains a rule for predicting the number of records according to an increase / decrease pattern of the number of records classified into each key range,
The method
The processor specifies an increase / decrease pattern of the number of records classified into each key range indicated by each history,
The processor predicts the number of records classified into each key range when the input data is processed by the job, using the specified increase/decrease pattern and the retained rule.

A data processing program in a computer system that divides input data including a plurality of records and processes each of the divided input data by each of a plurality of jobs,
The computer system includes a processor and a memory,
A key value is assigned to each of the plurality of records,
The key value is classified into one of key ranges defined by a minimum value and a maximum value,
The data processing program causes the computer system to execute:
A procedure for acquiring histories each indicating, for records that were divided according to the key ranges and processed by the plurality of jobs, the number of the processed records and the key range into which the key values assigned to the processed records are classified;
A procedure for determining a pattern of change in the number of records included in the input data based on the plurality of acquired histories;
A data processing program for executing a procedure for determining the key range for changing at least one of the minimum value and the maximum value in accordance with the determined change pattern.

A data processing program according to claim 8, wherein
The data processing program is
A procedure for determining that the pattern of change in the number of records is a first pattern when each of the histories indicates that a predetermined number of records are periodically processed by the job;
When each history indicates that the amount of change in the number of records classified into each key range is equal to or less than a predetermined value in all the key ranges, the change in the number of records included in the input data A procedure for determining that the pattern is the second pattern;
A procedure for determining that the pattern of change in the number of records included in the input data is a third pattern when each of the histories indicates that the amount of change in the number of records classified into the first key range, into which the key value having the largest value is classified, is greater than the amount of change in the number of records classified into key ranges other than the first key range;
A procedure for determining that the pattern of change in the number of records included in the input data is the fourth pattern when the pattern is not determined to be the first pattern, the second pattern, or the third pattern;
A data processing program for executing a procedure for determining that at least one of the minimum value and the maximum value of the first key range is changed when the determined change pattern is the third pattern.

A data processing program according to claim 9, wherein
The computer system holds an upper limit value of the number of records classified into each key range,
The data processing program is stored in the computer system.
A procedure for predicting the number of records classified into each key range when the input data is processed by the job based on the number of records classified into each key range indicated by each history;
A procedure for calculating a ratio of the number of second key ranges, in which the predicted number of records exceeds the upper limit value, to the number of all the key ranges;
A procedure for determining whether the calculated ratio exceeds a predetermined ratio;
A procedure for determining to change at least one of the minimum value and the maximum value of all the key ranges when, as a result of the determination, the calculated ratio exceeds the predetermined ratio and the determined change pattern is one of the first pattern, the second pattern, and the fourth pattern.

A data processing program according to claim 10, wherein
The computer system holds the minimum value and the maximum value of each key range in the memory,
The data processing program causes the computer system to execute, when the input data is input to the computer system:
A procedure for dividing the records included in the input data that are classified into key ranges other than the second key range according to the minimum value and the maximum value of each key range;
A procedure for storing the records of the input data that are classified into the second key range in the memory; and
A procedure for dividing the records classified into the second key range and stored in the memory according to at least one of the changed minimum value and maximum value of the second key range.
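
The deferred division recited above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: records are modelled as `(key, payload)` tuples, key ranges as a mapping from a hypothetical range name to inclusive `(minimum, maximum)` bounds, and the helper names are invented for the example.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[int, str]                # (key value, payload)
KeyRanges = Dict[str, Tuple[int, int]]  # name -> inclusive (minimum, maximum)


def find_range(key: int, ranges: KeyRanges) -> str:
    """Return the name of the key range that contains the key value."""
    for name, (lo, hi) in ranges.items():
        if lo <= key <= hi:
            return name
    raise ValueError(f"key {key} is outside every key range")


def split_with_deferral(records: List[Record], ranges: KeyRanges,
                        second_ranges: Set[str]):
    """Divide records of unchanged ranges now; hold second-range records in memory."""
    files: Dict[str, List[Record]] = {
        name: [] for name in ranges if name not in second_ranges
    }
    held: List[Record] = []
    for rec in records:
        name = find_range(rec[0], ranges)
        if name in second_ranges:
            held.append(rec)
        else:
            files[name].append(rec)
    return files, held


def split_held(held: List[Record], changed_ranges: KeyRanges):
    """Divide the held records according to the changed minimum and maximum values."""
    files: Dict[str, List[Record]] = {name: [] for name in changed_ranges}
    for rec in held:
        files[find_range(rec[0], changed_ranges)].append(rec)
    return files
```

Holding only the second-range records means the other key ranges can be divided immediately with the existing bounds.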

A computer system that divides input data including a plurality of records and processes each of the divided input data by each of a plurality of jobs,
The computer system includes a processor and a memory,
A key value is assigned to each of the plurality of records,
The key value is classified into one of key ranges defined by a minimum value and a maximum value,
The computer system:
Acquires, when the records divided according to each key range are processed by the plurality of jobs, a history indicating the number of the processed records and the key range into which the key values assigned to the processed records are classified,
Determines, based on the plurality of acquired histories, a pattern of change in the number of records included in the input data, and
Determines, according to the determined change pattern, the key range for which at least one of the minimum value and the maximum value is to be changed.
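
As an illustration of the history acquisition recited above, the sketch below counts processed records per key range for one job run. The data model (integer key values, named key ranges with inclusive bounds) and the function names are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

KeyRanges = Dict[str, Tuple[int, int]]  # name -> inclusive (minimum, maximum)


def classify_key(key: int, ranges: KeyRanges) -> str:
    """Return the name of the key range into which the key value is classified."""
    for name, (lo, hi) in ranges.items():
        if lo <= key <= hi:
            return name
    raise ValueError(f"key {key} is outside every key range")


def acquire_history(processed_keys: Iterable[int],
                    ranges: KeyRanges) -> Dict[str, int]:
    """Build one history: number of processed records per key range."""
    counts = Counter({name: 0 for name in ranges})  # keep empty ranges visible
    for key in processed_keys:
        counts[classify_key(key, ranges)] += 1
    return dict(counts)
```

Collecting one such mapping per run yields the plurality of histories from which the change pattern is determined.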

A computer system according to claim 12, wherein
The computer system:
Determines that the pattern of change in the number of records included in the input data is a first pattern when each of the histories indicates that a predetermined number of records are periodically processed by the job,
Determines that the pattern of change in the number of records included in the input data is a second pattern when each of the histories indicates that the amount of change in the number of records classified into each key range is equal to or less than a predetermined value in all the key ranges,
Determines that the pattern of change in the number of records included in the input data is a third pattern when each of the histories indicates that the amount of change in the number of records classified into the first key range, into which the key value having the largest value is classified, is greater than the amount of change in the number of records classified into the key ranges other than the first key range,
Determines that the pattern of change in the number of records included in the input data is a fourth pattern when that pattern is not determined to be the first pattern, the second pattern, or the third pattern, and
Determines to change at least one of the minimum value and the maximum value of the first key range when the determined change pattern is the third pattern.
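
The four-way determination recited above can be sketched as below. Several points are assumptions not fixed by the claim: the periodicity test (the total record count repeats every `period` runs within `tolerance`), the order in which the first three patterns are checked, and all function and parameter names.

```python
from typing import Dict, List

History = Dict[str, int]  # key-range name -> number of records in one run


def classify_pattern(histories: List[History], first_range: str,
                     period: int, tolerance: int, stable_delta: int) -> int:
    """Return 1, 2, 3, or 4 for the pattern of change in record counts."""
    if len(histories) < 2 or period < 1:
        raise ValueError("need at least two histories and a period of 1 or more")
    totals = [sum(h.values()) for h in histories]
    # First pattern (assumed test): the total repeats every `period` runs.
    if len(totals) > period and all(
        abs(totals[i] - totals[i - period]) <= tolerance
        for i in range(period, len(totals))
    ):
        return 1
    # Change per key range between consecutive runs.
    steps = [
        {name: cur.get(name, 0) - prev.get(name, 0) for name in cur}
        for prev, cur in zip(histories, histories[1:])
    ]
    # Second pattern: every range changes by at most the predetermined value.
    if all(abs(d) <= stable_delta for step in steps for d in step.values()):
        return 2
    # Third pattern: the range holding the largest keys grows more than any other.
    if all(
        step.get(first_range, 0) > d
        for step in steps
        for name, d in step.items() if name != first_range
    ):
        return 3
    return 4


def ranges_to_change(pattern: int, first_range: str) -> List[str]:
    """For the third pattern only the first key range is adjusted."""
    return [first_range] if pattern == 3 else []
```

Because a perfectly constant series also satisfies the assumed periodicity test, checking the second pattern before the first would be an equally defensible ordering.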

A computer system according to claim 13, wherein
The computer system:
Holds an upper limit value of the number of records classified into each of the key ranges,
Predicts, based on the number of records classified into each key range indicated by each history, the number of records classified into each key range when the input data is processed by the job,
Calculates the ratio of the number of second key ranges, which are key ranges in which the predicted number of records exceeds the upper limit value, to the number of all the key ranges,
Determines whether the calculated ratio exceeds a predetermined ratio, and
Determines to change at least one of the minimum value and the maximum value of all the key ranges when, as a result of the determination, the calculated ratio exceeds the predetermined ratio and the determined change pattern is one of the first pattern, the second pattern, and the fourth pattern.

A computer system according to claim 14, wherein
The computer system:
Holds the minimum value and the maximum value of each key range in the memory,
When the input data is input to the computer system, divides the records included in the input data that are classified into key ranges other than the second key range according to the minimum value and the maximum value of each key range,
Stores the records of the input data that are classified into the second key range in the memory, and
Divides the records classified into the second key range and stored in the memory according to at least one of the changed minimum value and maximum value of the second key range.