JP2000322334A

JP2000322334A - Automatic monitoring system for input and output performance

Info

Publication number: JP2000322334A
Application number: JP11132408A
Authority: JP
Inventors: Masahiro Shirasaka; 雅浩白坂
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 1999-05-13
Filing date: 1999-05-13
Publication date: 2000-11-24

Abstract

PROBLEM TO BE SOLVED: To provide an automatic input/output performance monitoring system that reliably detects degradation of the input/output performance of an input/output device at an early stage. SOLUTION: In this system, a system-specific information input means 11 stores the expected maximum number of input/output contentions, the maximum transfer data length, and a performance monitoring interval value in a main storage device 20. An input/output time sampling means 12 performs input/output to the real device at the maximum input/output data length and calculates an expected upper limit of input/output performance. An input/output load distribution control means 13 evens out the load across the input/output paths. An input/output statistical information acquisition means 14 stores in the main storage device 20 the cumulative input/output time and the number of executed input/output operations for each input/output path. An input/output performance monitoring means 15 identifies an abnormal location using the input/output performance upper limit value. When a path is judged unusable, a device reconfiguration means 16 blocks the path at the abnormal location, and a fault event notification means 17 reports the fault content and fault location.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an automatic input/output performance monitoring system for a computer system that performs load distribution control over a plurality of input/output paths to its peripheral devices. The system detects performance degradation by constantly monitoring, per access path, the time required for each input/output. Working together with the device configuration control function, it localizes the effect on system operation, acquires information on the cause, and notifies the maintenance and operations departments that preventive maintenance is called for, thereby improving the availability of stable system operation.

[0002]

2. Description of the Related Art

Conventionally, input/output performance with respect to a peripheral control device is strongly affected by device-specific functions (static information) such as the presence or absence of a cache or a RAID (Redundant Array Of Inexpensive Discs) configuration, so it has been difficult to judge dynamically whether an individual input/output has suffered degraded performance. In addition, performance degradation is in many cases recognized only through delays in higher-level applications or through their timeouts, and in either case it could only be detected and judged after the fact.

[0003]

These problems are as follows. First, performance degradation can have many causes, such as frequent thrashing or a job stuck in a CPU loop, so it cannot simply be attributed to a failure. Determining that a failure has actually occurred requires analyzing trace data and investigating the cause, and even when the degradation really is caused by a failure, it may go unnoticed for a long time. As a result, the opportunity for preventive maintenance is lost, and there is concern about the effect of a subsequent device failure on the system. Second, when performance degrades, the work of investigating the cause, identifying the failure, and executing a workaround must be done manually.

SUMMARY OF THE INVENTION

[0004] The present invention has been made to solve the conventional problems described above. Its object is to provide an automatic input/output performance monitoring system that can monitor degradation of input/output performance with respect to an input/output device more reliably, earlier, and more easily, and that, by automatically placing a failed path in an unusable state, can automatically avoid performance degradation caused by an input/output path failure without depending on the device type.

[0005]

To achieve the above object, the automatic input/output performance monitoring system of the present invention comprises: a system-specific information input means that acquires, as a performance-influencing factor, the maximum number of input/output contentions assumed in advance, acquires the maximum transfer data length that can be the target of a single logical input/output and a performance monitoring interval value appropriate to the operation of the individual system, and stores the acquired information in a main storage device; an input/output time sampling means that refers to the system-specific information stored in the main storage device, performs input/output to the real device at the maximum input/output data length, calculates an expected upper limit of input/output performance from the performance result, the maximum number of input/output contentions, and the hardware configuration information stored in the main storage device, and stores the upper limit in the main storage device; an input/output load distribution control means that acquires usable path information from the main storage device, determines the path to use for each input/output request by a round-robin method, and evens out the load on each input/output path regardless of host connection state and input/output type; an input/output statistical information acquisition means that stores in the main storage device the cumulative input/output time required on each input/output path and the number of input/output operations executed; an input/output performance monitoring means that periodically refers to the input/output statistical information stored in the main storage device and identifies an abnormal location using the input/output performance upper limit value stored in the main storage device; a device reconfiguration control means that, when the input/output performance monitoring means has identified an abnormal location on an input/output path and judged it unusable, blocks the path at the abnormal location and changes the hardware configuration information so that the path is marked as unusable for input/output; and a fault event notification means that notifies a remote maintenance terminal and the operations department, through a remote monitoring terminal, of the predicted fault content and fault location.

[0006] With this configuration, the system-specific information input means acquires the maximum number of input/output contentions assumed in advance, the maximum transfer data length that can be the target of a single logical input/output, and a performance monitoring interval value appropriate to the operation of the individual system, and stores them in the main storage device. The input/output time sampling means performs input/output to the real device at the maximum input/output data length, calculates the expected upper limit of input/output performance from the performance result, the maximum number of input/output contentions, and the hardware configuration information stored in the main storage device, and stores the upper limit in the main storage device. The input/output load distribution control means acquires usable path information from the main storage device, determines the path to use for each input/output request by a round-robin method, and evens out the load on each input/output path regardless of host connection state and input/output type. The input/output statistical information acquisition means stores in the main storage device the cumulative input/output time required on each input/output path and the number of input/output operations executed. The input/output performance monitoring means periodically refers to the input/output statistical information stored in the main storage device and identifies an abnormal location using the input/output performance upper limit value stored in the main storage device. The device reconfiguration control means blocks an abnormal path judged unusable and changes the hardware configuration information so that the path is marked as unusable for input/output. The fault event notification means notifies a remote maintenance terminal and the operations department, through a remote monitoring terminal, of the predicted fault content and fault location. Consequently, degradation of input/output performance with respect to an input/output device can be monitored more reliably, earlier, and more easily, and by automatically placing a failed path in an unusable state, performance degradation caused by an input/output path failure can be avoided automatically without depending on the device type.

[0007]

Next, embodiments of the automatic input/output performance monitoring system according to the present invention are described with reference to the drawings. As shown in FIG. 1, the first embodiment comprises a system-specific information input means 11, an input/output time sampling means 12, an input/output load distribution control means 13 for input/output requests, an input/output statistical information acquisition means 14 for each input/output path, an input/output performance monitoring means 15, a device reconfiguration control means 16, a fault event notification means 17 for external notification, and a main storage device 20. Their details are described later with reference to FIG. 1.

[0008] In general, input/output performance is determined by the processing capability of the input/output device, the input/output load, and abnormal events such as failures. The first embodiment therefore monitors, per input/output path, the performance of an input/output device that can process input/output requests from a plurality of nodes simultaneously. It dynamically detects performance degradation caused by an abnormal event and performance degradation caused by an abnormal increase in input/output request load, and it makes possible device reconfiguration, preventive maintenance work, and warning notifications to the operations function.

[0009] Next, an example configuration of the operating environment to which the first embodiment is applied is described. FIG. 2 is a block diagram showing this example. In FIG. 2, the input/output subsystem 5 has a plurality of input/output devices 6 and is an input/output system capable of simultaneously processing input/output requests from the connected hosts #1 to #3. A plurality of input/output paths 4 are provided between hosts #1 to #3 and the input/output subsystem 5. Hosts #1 to #3 are provided with a monitoring terminal 7 for operation monitoring from a remote location.

[0010] Next, the example configuration of the first embodiment shown in FIG. 1 is described in detail. FIG. 1 is a block diagram. In FIG. 1, the system-specific information input means 11 is a means for taking in, from outside, input/output performance factor information that does not depend on hardware (HW) characteristics. In the environment in which the system of the present invention operates, contention among input/output requests from the hosts #1 to #3 can occur, as shown in FIG. 2, so waiting for input/output inside the input/output subsystem 5 must be taken into account in advance as a factor that degrades input/output performance.

[0011] For this reason, the system-specific information input means 11 acquires the maximum number of input/output contentions assumed in advance as a performance-influencing factor. It also acquires the maximum transfer data length that can be the target of a single logical input/output, and a performance monitoring interval value appropriate to the operation of the individual system. These are performance factors arising from how the system is operated, separate from the HW performance factors of the input/output subsystem. The information acquired by the system-specific information input means 11 is stored in the system-specific information area 21 and the performance monitoring interval value area 22 on the main storage device 20.
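
The three operator-supplied values and the two storage areas described above can be pictured with a minimal sketch. The class name `SystemSpecificInfo`, the field names, the dictionary keys, and the use of a plain `dict` to stand in for the main storage device 20 are all assumptions made for illustration; the patent does not specify a data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemSpecificInfo:
    """Operator-supplied values (hypothetical field names)."""
    max_io_contention: int      # maximum expected number of competing I/O requests
    max_transfer_bytes: int     # maximum transfer length of one logical I/O
    monitor_interval_s: float   # performance monitoring interval, in seconds

    def __post_init__(self):
        # Reject values that cannot describe a real system.
        if self.max_io_contention < 1:
            raise ValueError("max_io_contention must be at least 1")
        if self.max_transfer_bytes < 1:
            raise ValueError("max_transfer_bytes must be at least 1")
        if self.monitor_interval_s <= 0:
            raise ValueError("monitor_interval_s must be positive")


def input_system_specific_info(storage, info):
    """Store the values in two separate areas, mirroring areas 21 and 22."""
    storage["system_specific_info"] = {          # stands in for area 21
        "max_io_contention": info.max_io_contention,
        "max_transfer_bytes": info.max_transfer_bytes,
    }
    storage["monitor_interval_s"] = info.monitor_interval_s  # stands in for area 22
```

A caller would build one `SystemSpecificInfo` at start-up and pass it, together with the shared storage dictionary, to `input_system_specific_info`.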

[0012] The input/output time sampling means 12 refers to the system-specific information stored in the system-specific information area 21 and performs input/output to the real device at the maximum input/output data length. From this performance result, the maximum number of input/output contentions, and the HW configuration information stored in the HW configuration information area 23 in the main storage device 20, it calculates the upper limit of input/output performance expected for the system and stores it in the input/output performance upper limit value area 24 in the main storage device 20. The HW configuration information stored in the HW configuration information area 23 is fixed information arising from the HW configuration of the system (the input/output paths to the subsystem 5, the number of input/output devices, and so on) and is assumed to be stored in the main storage device 20 in advance as static information.
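
The patent names the inputs to the upper-limit calculation but does not disclose a formula. The sketch below is therefore only one possible model, chosen for illustration: it assumes that at most `min(n_paths, n_devices)` requests proceed in parallel, so the worst-case request waits through `ceil(max_contention / parallelism)` rounds of the measured maximum-length transfer time. The function names and the injectable `clock` parameter are hypothetical.

```python
import math
import time


def sample_io_time(do_io, max_transfer_bytes, clock=time.perf_counter):
    """Time one I/O of the maximum transfer length against the real device.

    `do_io` is a caller-supplied function that performs the transfer.
    """
    start = clock()
    do_io(max_transfer_bytes)
    return clock() - start


def estimate_upper_limit(measured_s, max_contention, n_paths, n_devices):
    """Expected worst-case time for a single I/O, in seconds.

    Assumed model (not from the patent): requests are served in rounds of
    `parallelism` concurrent transfers, each taking `measured_s`.
    """
    parallelism = max(1, min(n_paths, n_devices))
    rounds = math.ceil(max_contention / parallelism)
    return measured_s * rounds
```

Under this model, a measured time of 10 ms with six competing requests, two paths, and four devices gives three rounds, that is, an upper limit of about 30 ms. A real implementation would substitute whatever model fits the actual subsystem.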

[0013] The input/output load distribution control means 13 acquires usable path information from the HW configuration information stored in the HW configuration information area 23 and determines the path to use for each input/output request by a round-robin method. As a result, the load on each input/output path is evened out regardless of host connection state and input/output type.
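
Round-robin path selection over the currently usable paths can be sketched as follows. The class name `RoundRobinSelector` and the representation of the HW configuration information as a mapping from path identifier to a usable flag are assumptions for illustration only.

```python
class RoundRobinSelector:
    """Pick the next usable path in turn for each I/O request."""

    def __init__(self, hw_config):
        # hw_config: dict mapping path id -> True (usable) / False (blocked).
        # The selector keeps a reference, so later blocking is seen at once.
        self._hw_config = hw_config
        self._next = 0

    def select(self):
        usable = [p for p, ok in self._hw_config.items() if ok]
        if not usable:
            raise RuntimeError("no usable input/output path")
        path = usable[self._next % len(usable)]
        self._next += 1
        return path
```

Because the usable list is rebuilt on every call, a path that has been marked unusable drops out of the rotation on the next request without any extra bookkeeping.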

[0014] The input/output statistical information acquisition means 14 stores, in the input/output statistical information area 25 in the main storage device 20, the cumulative input/output time required on each input/output path and the number of input/output operations executed. The input/output performance monitoring means 15 periodically refers to the input/output statistical information stored in the input/output statistical information area 25, calculates for each input/output path the average time required per single input/output, and identifies an abnormal location from the calculated result and the input/output performance upper limit value stored in the input/output performance upper limit value area 24. When an abnormal location is identified, the abnormal path is blocked through the device reconfiguration control means 16, and the fault event notification means 17 reports the fault content.

[0015] When a path is judged unusable because of a failure or the like, the device reconfiguration control means 16 changes the HW configuration information in the HW configuration information area 23 so that the path is marked as unusable for input/output. The fault event notification means 17 notifies the remote maintenance terminal and the operations department, through the remote monitoring terminal 7 of FIG. 2, of the predicted fault content and fault location. This enables prompt maintenance work.
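
As a rough illustration of blocking a path and reporting the fault, the sketch below marks a path unusable in a configuration mapping and hands a message to a caller-supplied `send` function. The function names, the message fields, and the mapping from path identifier to a usable flag are hypothetical; how the real system reaches the remote monitoring terminal 7 is not described at this level of detail.

```python
def block_path(hw_config, path_id):
    """Mark a path as unusable for input/output in the HW configuration."""
    if path_id not in hw_config:
        raise KeyError(path_id)
    hw_config[path_id] = False


def notify_fault(send, path_id, description):
    """Report the predicted fault content and location via `send`."""
    message = {"path": path_id, "fault": description}
    send(message)
    return message


def handle_abnormal_path(hw_config, path_id, description, send):
    """Block the abnormal path first, then notify the monitoring side."""
    block_path(hw_config, path_id)
    return notify_fault(send, path_id, description)
```

Here `send` could be anything from a logging call to a network client; the sketch only fixes the order of operations, blocking before notification.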

[0016] In the first embodiment, the input/output load distribution control means 13 removes the load imbalance that arises from differences in input/output type, and the average time required per single input/output is monitored dynamically for each input/output path. Judging input/output performance degradation with respect to a peripheral control device is generally considered difficult because of the influence of device-specific functions such as cache mechanisms and RAID configurations. This approach makes that judgment easier and allows performance degradation events to be monitored independently of the device type.

[0017] Next, the operation of the first embodiment configured as described above is explained with reference to the flowchart of FIG. 3. In the input of system-specific information by the system-specific information input means 11 (step 31), the number of input/output contentions, the maximum input/output transfer data length, and the performance monitoring interval value are acquired as predictable performance-influencing factor information arising from the system operation method, and are stored in the system-specific information area 41 and the performance monitoring interval value area 42 in the main storage device 20.

[0018] In the input/output time sampling performed by the input/output time sampling means 12 (step 32), the actual input/output performance at the maximum input/output data length is measured on the basis of this information. The upper limit of the expected input/output performance is then calculated, taking into account the number of input/output paths and the number of input/output devices 6 in the subsystem 5, both contained in the HW configuration information 43 in the HW configuration information area 23 of the main storage device 20. The calculated upper limit is stored as the input/output performance upper limit value 44 in the input/output performance upper limit value area 24 on the main storage device 20, for use as the performance monitoring threshold.

[0019] As its specific processing, the input/output load distribution control means 13 selects among the usable input/output paths in round-robin fashion on the basis of the input/output path information in the HW configuration information 43 in the HW configuration information area 23 of the main storage device 20 (step 331), determines the input/output path (step 332), and thereby distributes the load across the input/output paths.

[0020] Further, the input/output statistical information acquisition means 14 obtains the time at which the input/output starts and stores this start time in the main storage device 20 (step 333). The input/output to the input/output device is then executed (step 334). After the actual input/output completes, the input/output time is calculated as the difference between the start time and the time at completion and is added to the total input/output time 45 in the main storage device 20, and the total input/output count 46 in the main storage device 20 is also updated (step 335). The total input/output time 45 and the total input/output count 46 are managed separately for each input/output path.
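
Steps 333 to 335 amount to timing each input/output and accumulating the result per path. A minimal sketch follows; the class name `IoStatistics`, the `timed_io` method, and the injectable `clock` argument are illustrative assumptions, with two dictionaries standing in for the total input/output time 45 and the total input/output count 46.

```python
import time
from collections import defaultdict


class IoStatistics:
    """Per-path running totals of I/O time and I/O count."""

    def __init__(self):
        self.total_time = defaultdict(float)   # stands in for total time 45
        self.total_count = defaultdict(int)    # stands in for total count 46

    def timed_io(self, path_id, do_io, clock=time.perf_counter):
        start = clock()                  # step 333: record the start time
        result = do_io()                 # step 334: execute the I/O
        elapsed = clock() - start        # step 335: elapsed = end - start
        self.total_time[path_id] += elapsed
        self.total_count[path_id] += 1
        return result
```

If `do_io` raises, neither total is updated in this sketch; whether failed operations should count toward the statistics is a design decision the text leaves open.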

[0021] The input/output performance monitoring process (step 34) performed by the input/output performance monitoring means 15 waits for the time given by the performance monitoring interval value 42 on the main storage device 20 and runs each time that wait ends (step 340). As shown in FIG. 3, the process first calculates, for each input/output path, the average input/output time per single logical input/output (step 341) from the per-path total input/output time 45 and total input/output count 46 recorded on the main storage device 20 by the input/output information acquisition process (step 33).
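
The per-path average of step 341 is simply the accumulated time divided by the accumulated count. The helper below is a sketch with a hypothetical name; skipping paths that have not yet carried any input/output is an assumption made here to avoid division by zero.

```python
def average_io_times(total_time, total_count):
    """Return {path: average seconds per I/O} for paths with at least one I/O."""
    return {
        path: total_time[path] / count
        for path, count in total_count.items()
        if count > 0
    }
```

For example, totals of 1.0 s over 4 operations on one path and 3.0 s over 2 operations on another yield averages of 0.25 s and 1.5 s respectively.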

[0022] The calculated per-path average input/output times are then compared (step 342). If there is a clear difference between input/output paths (step 348), it is determined whether the paths with skewed input/output times are only those connected to a specific device in the subsystem 5 (step 343), and the fault location is thereby identified. The device reconfiguration control means 16 of FIG. 1 then blocks the faulty path (step 344), and the fault event notification means 17 notifies the remote monitoring terminal 7 of FIG. 2 of the identified fault information, the blocked path information, and so on (step 345).

[0023] If the comparison of per-path average input/output times in step 342 finds no skew toward a specific path (step 348), the averages are compared with the input/output performance upper limit value 44 on the main storage device 20 (step 346). If the average input/output time of a path exceeds the input/output performance upper limit value 44, a warning is sent to the remote monitoring terminal 7, treating the event as performance degradation caused by something other than a failure of a specific path (step 347).
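
The two-stage decision of steps 342 to 347 can be sketched as below. The patent does not say what counts as a "clear difference" between paths, so the rule used here is an assumption: a path is treated as skewed when its average exceeds `skew_ratio` times the median of the other paths' averages. The function name, the `skew_ratio` default, and the returned tuples are all hypothetical, and the check in step 343 (whether the skewed paths all lead to one device) is omitted.

```python
from statistics import median


def evaluate_paths(averages, upper_limit, skew_ratio=2.0):
    """Classify one monitoring cycle.

    Returns ("block", paths) when specific paths stand out from the rest,
    ("warn", paths) when no path stands out but some exceed the upper
    limit, and ("ok", []) otherwise.
    """
    skewed = []
    for path, avg in averages.items():
        others = [v for p, v in averages.items() if p != path]
        # Assumed skew rule: compare against the median of the other paths.
        if others and avg > skew_ratio * median(others):
            skewed.append(path)
    if skewed:
        return ("block", sorted(skewed))      # steps 343 to 345
    over = sorted(p for p, avg in averages.items() if avg > upper_limit)
    if over:
        return ("warn", over)                 # steps 346 and 347
    return ("ok", [])
```

With averages of 10 ms, 11 ms, and 50 ms, the third path is flagged for blocking; with all paths uniformly slow and above the limit, the result is a warning, since no single path can be blamed.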

[0024] In this way, the first embodiment monitors, per input/output path, the performance of an input/output device that can process input/output requests from a plurality of nodes simultaneously. It dynamically detects performance degradation caused by an abnormal event and performance degradation caused by an abnormal increase in input/output request load, and it makes possible device reconfiguration, preventive maintenance work, and warning notifications to the operations function.

[0025] Next, a second embodiment of the present invention is described. Depending on how a system is operated, an operation mode can be envisaged in which, even when input/output performance degrades for a reason other than a specific path failure, the suspect device can be promptly disconnected from the system and operation continued on another input/output device or another host. Accordingly, at the input of system-specific information in step 31 of FIG. 3, in addition to acquiring the specific information determined by the system's operation mode, the device reconfiguration control action to take when the relevant threshold is exceeded is also registered in advance. This allows the input/output performance monitoring process in step 34 to perform device reconfiguration suited to the operation of the individual system.

[0026] FIG. 4 shows a flowchart for explaining the operation in that case, namely the second embodiment. The process of FIG. 4 differs from that of FIG. 3 in the following points. In FIG. 4, processing steps that are the same as in the flowchart of FIG. 3 are simply given the same step numbers. The input of system-specific information (step 51) acquires, in addition to the predictable performance-influencing factor information needed for performance monitoring, the action to take for device reconfiguration when input/output performance exceeds the input/output performance upper limit value. This device reconfiguration information is stored as device reconfiguration information 52 in a device reconfiguration information area on the main storage device 20 shown in FIG. 1.

[0027] In the input/output performance monitoring process (step 53), as in FIG. 3, the time required per single input/output operation on each path is first calculated from the per-path total input/output time and total input/output count on the main storage device 20, and the values are compared to identify the fault. If the degradation is recognized as arising from a cause other than a specific input/output path failure or a specific device failure, the device reconfiguration process (step 54) refers to the device reconfiguration information 52 and carries out the specified reconfiguration, such as device blocking or a forced system stop. In this way, even when the cause cannot be identified and the degradation cannot be avoided manually, defining in advance a workaround appropriate to the system's operation provides a means of automatically monitoring performance degradation in line with the system operation mode.
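
The idea of registering a reconfiguration action ahead of time and invoking it when the upper limit is exceeded can be sketched as a small dispatch table. The function name `make_reconfiguration_handler`, the action names, and the callables are hypothetical placeholders; the actual actions (device blocking, forced system stop) would be supplied by the operating system or platform.

```python
def make_reconfiguration_handler(actions, registered_action):
    """Build a handler for the pre-registered reconfiguration action.

    actions: dict mapping an action name to a zero-argument callable.
    registered_action: the name chosen at system-specific information input.
    """
    if registered_action not in actions:
        raise KeyError(registered_action)

    def on_monitor(average_s, upper_limit_s):
        # Run the registered action only when the upper limit is exceeded.
        if average_s > upper_limit_s:
            return actions[registered_action]()
        return None

    return on_monitor
```

Validating the registered name when the handler is built means a misconfigured action is caught at start-up rather than at the moment performance degrades.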

[0028]

As described above, the present invention determines whether an input/output performance abnormality is due to a failure of a specific path or a failure of a specific device. The suspect path can therefore be blocked promptly by device reconfiguration control in association with the HW configuration information, and the effect on system operation when a performance abnormality occurs can be localized. In addition, response times for single input/output requests are compared statistically in a state where the load distribution means for input/output requests has reduced misjudgments caused by performance differences between input/output types, so performance abnormalities can be detected without depending on the connection topology of the input/output subsystem, such as a cluster configuration, or on the device type. Furthermore, no special hardware is needed to judge input/output performance degradation, and the performance comparison is based on data actually measured on the operational system, so performance abnormalities can be detected by entering only operation-related information, without regard to HW device characteristics. Finally, the number of points of dependence on the normal input/output system is kept small, so the performance monitoring function can be implemented as an independent function.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a first embodiment of the automatic input/output performance monitoring system according to the present invention.

FIG. 2 is a block diagram showing a configuration example of an operating environment to which the automatic input/output performance monitoring system according to the present invention is applied.

FIG. 3 is a flowchart explaining the operation of the first embodiment of the automatic input/output performance monitoring system according to the present invention.

FIG. 4 is a flowchart explaining the operation of a second embodiment of the automatic input/output performance monitoring system according to the present invention.

[Explanation of symbols]

11: system-specific information input means
12: input/output time sampling means
13: input/output load distribution control means
14: input/output statistical information acquisition means
15: input/output performance monitoring means
16: device reconfiguration control means
17: failure event notification means
20: main storage device
21: system-specific information area
22: performance monitoring interval area
23: HW configuration information area
24: input/output performance upper limit value
25: input/output statistical information

Claims

[Claims]

1. An automatic input/output performance monitoring system comprising: system-specific information input means for acquiring, as performance-influencing factors, a maximum number of input/output conflicts assumed in advance, a maximum transfer data length that can be the target of a logical single input/output, and a performance monitoring interval value appropriate to each system operation, and for storing the acquired information in a main storage device; input/output time sampling means for referring to the system-specific information stored in the main storage device, executing input/output of the maximum input/output data length to a real device, calculating an expected input/output performance upper limit value from the execution result, the maximum number of input/output conflicts, and hardware configuration information stored in the main storage device, and storing that value; input/output load distribution control means for obtaining available path information from the main storage device, determining the path to be used for each input/output request by a round robin method, and averaging the load on each input/output path regardless of device connection status and input/output type; input/output statistical information acquisition means for storing in the main storage device the cumulative required input/output time for each input/output path and the number of input/output operations executed; input/output performance monitoring means for periodically referring to the input/output statistical information stored in the main storage device and identifying an abnormal location from the input/output performance upper limit value stored in the main storage device; device reconfiguration control means for, when the input/output performance monitoring means identifies an abnormal location on an input/output path and determines that the path cannot be used, closing the path of the abnormal location and changing the hardware configuration information so that the target path becomes unavailable for input/output; and failure event notification means for notifying an operations department, through a maintenance terminal at a remote location and a remote monitoring terminal, of the failure content and the predicted failure location.

2. The automatic input/output performance monitoring system according to claim 1, wherein the system-specific information input means inputs performance factors arising from the way the system is operated, as distinct from the hardware performance factors of the input/output subsystem.

3. The automatic input/output performance monitoring system according to claim 1, wherein the hardware configuration information used by the input/output time sampling means is fixed information determined by the hardware configuration of the system and is stored in the main storage device in advance.

4. The automatic input/output performance monitoring system according to claim 1, wherein, when an abnormal location is identified, the input/output performance monitoring means closes the abnormal path through the device reconfiguration control means and notifies the failure event notification means of the failure content.

5. The automatic input/output performance monitoring system according to claim 1, wherein the input/output statistical information acquisition means records the time at which an input/output is started, calculates the input/output time from the difference between that time and the time at which the actual input/output completes, adds the result to the total required input/output time stored in the main storage device, and updates the total number of input/output operations stored in the main storage device.

6. The automatic input/output performance monitoring system according to claim 5, wherein the total required input/output time is managed separately for each input/output path.

7. The automatic input/output performance monitoring system according to claim 5, wherein the total number of input/output operations is managed separately for each input/output path.

8. The automatic input/output performance monitoring system according to claim 1, wherein the input/output performance monitoring means calculates, from the total required input/output time and the total number of input/output operations recorded for each input/output path on the main storage device by the input/output information acquisition processing, the average input/output time per logical single input/output operation for each input/output path, compares the averages across paths, and, when there is a clear difference between input/output paths, determines whether only one of them is connected to the specific device and whether the failure location can be identified.

9. The automatic input/output performance monitoring system according to claim 1, wherein the input/output performance monitoring means calculates, from the total required input/output time and the total number of input/output operations recorded for each input/output path on the main storage device by the input/output information acquisition processing, the average input/output time per logical single input/output operation for each input/output path, compares the averages across paths, and, based on the comparison, notifies the remote monitoring terminal of a warning that performance is degraded due to an event other than a failure.

10. The automatic input/output performance monitoring system according to claim 1, wherein, when the system-specific information is input through the system-specific information input means, the device reconfiguration control to be applied when a threshold value is violated is registered in advance as information, in addition to the specific information determined by the operating mode of the system, thereby enabling device reconfiguration adapted to each system operation.

11. The automatic input/output performance monitoring system according to claim 1, wherein, in the input/output performance monitoring process, the input/output performance monitoring means calculates and compares the time required per single input/output operation for each input/output path from the total required input/output time and the number of input/output operations for each path on the main storage device, and thereby identifies the abnormal location.

12. The automatic input/output performance monitoring system according to claim 1, wherein, in the input/output performance monitoring process, the input/output performance monitoring means calculates and compares the time required per single input/output operation for each input/output path from the total required input/output time and the number of input/output operations for each path on the main storage device to identify the abnormal location, and, when the performance degradation is recognized as arising from a cause other than a specific input/output path failure or a specific device failure, the device reconfiguration process refers to the device reconfiguration information and carries out the specified device reconfiguration.
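As an illustration of how the claimed means fit together, the sketch below combines round-robin path selection (claim 1), per-path accumulation of input/output time and count (claims 5 to 7), and the per-path average used for comparison (claims 8 and 11). It is a hedged sketch only: `PathMonitor`, its method names, and the use of `time.perf_counter` as the clock are assumptions introduced here, and the callable passed to `execute` stands in for a real device input/output.

```python
import itertools
import time
from typing import Callable, Dict, List


class PathMonitor:
    """Hypothetical helper: round-robin dispatch plus per-path statistics."""

    def __init__(self, paths: List[str], clock: Callable[[], float] = time.perf_counter):
        self._paths = list(paths)
        self._closed = set()
        self._cycle = itertools.cycle(self._paths)
        self._clock = clock
        self.total_time: Dict[str, float] = {p: 0.0 for p in self._paths}
        self.count: Dict[str, int] = {p: 0 for p in self._paths}

    def close(self, path: str) -> None:
        # Mark a path unusable, as device reconfiguration would.
        self._closed.add(path)

    def next_path(self) -> str:
        # Round robin over the configured paths, skipping closed ones.
        for _ in range(len(self._paths)):
            path = next(self._cycle)
            if path not in self._closed:
                return path
        raise RuntimeError("no usable input/output path")

    def execute(self, io: Callable[[str], None]) -> str:
        path = self.next_path()
        start = self._clock()              # time at I/O start
        io(path)
        elapsed = self._clock() - start    # completion time minus start time
        self.total_time[path] += elapsed   # per-path total required time
        self.count[path] += 1              # per-path total I/O count
        return path

    def average(self, path: str) -> float:
        # Average time per single I/O on this path; 0.0 if none recorded.
        n = self.count[path]
        return self.total_time[path] / n if n else 0.0
```

Passing the clock as a parameter is a design choice made for this sketch so that the timing logic can be exercised deterministically; a monitoring loop would call `average` for every path once per monitoring interval and compare the results against the stored upper limit.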