JP2002358170A

JP2002358170A - Disk storage device, computer system provided with the disk storage device, and error notification method at retry processing in the computer system

Info

Publication number: JP2002358170A
Application number: JP2001164699A
Authority: JP
Inventors: Hidetoshi Makino; 秀敏牧野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2001-05-31
Filing date: 2001-05-31
Publication date: 2002-12-13

Abstract

(57) [Abstract]
[Problem] To suppress the occurrence of timeout errors in a host system.
[Solution] A microprocessor 221 provided in a control unit 22 of an HDD 20 starts a retry process consisting of a group of retry steps when execution of an access command given by a host system 10 results in an error. After sending an access command, the host system 10 monitors for a response from the HDD 20 up to a set monitoring time. If a timeout occurs, or if an abnormal end is reported within the monitoring time, the host system 10 first sends a reset request and then sends the same command as before to the HDD 20. When the microprocessor 221 receives a reset request during the retry process, it turns ON a retry-in-progress reset flag in a retry management area 224; when it then receives the same command as before in this state, it notifies the host system 10 of a specific error code. In response to this notification, the host system 10 extends the monitoring time.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a computer system having a disk storage device. More particularly, it relates to a disk storage device suited to the case where an internal error occurs during execution of an access request, issued by a host system to the disk storage device for data in that device, and retry processing is executed; to a computer system including the disk storage device; and to an error notification method at the time of retry processing in the computer system.

[0002]

[Description of the Related Art] In a computer system having a disk storage device, for example a magnetic disk device (hereinafter referred to as an HDD), the HDD and the host system that uses it are generally interconnected by a host interface, for example a SCSI (Small Computer System Interface) bus.

[0003] In such a computer system, when the HDD is given an access request by the host system, it executes the requested access processing. If the requested access processing does not complete normally and results in an error (internal error), an error retry is performed inside the HDD according to a predetermined procedure. This error retry performed inside the HDD is called an HDD retry.

[0004] In recent years, the error retry (HDD retry) function of HDDs has been greatly enhanced. That is, recent HDDs are configured to execute a very large number of retries in a predetermined order. Consequently, even if an error occurs inside the HDD during access processing requested by the host system, the enhanced error retry makes it more likely that the error will be recovered.

[0005] Meanwhile, when the host system issues an access request to the HDD, it monitors the time until the HDD returns a response to that request, for example in the status phase. In other words, the host system monitors the processing time of the HDD. If no response is returned from the HDD after a fixed monitoring time has elapsed, the host system detects a timeout. In this case, the host system first sends the HDD a reset request (for example, a SCSI reset) to forcibly reset (initialize) it. The HDD initializes itself in response to the reset request from the host system.

[0006] After resetting the HDD in response to the timeout detection, the host system performs a retry in which it issues to the HDD the same access request as the one that timed out. This retry by the host system is called a host retry. The HDD executes the access request reissued by the host system.

[0007]

[Problems to Be Solved by the Invention] As described above, the error retry function of recent HDDs has been enhanced, so that errors are more likely to be recovered than with earlier HDDs.

[0008] However, because enhancing the error retry function of the HDD greatly increases, for example, the number of retry steps, the processing time required for retries also tends to increase. In the worst case, that is, when the retry (HDD retry) runs close to the final step, the processing time is on the order of ten-odd seconds to several tens of seconds.

[0009] For this reason, in a computer system having an HDD with an enhanced error retry function, there is a risk that the host system will detect a timeout (timeout error) while the HDD is performing retry processing. If a timeout error occurs, the reset request from the host system and the host retry that follows cause the HDD to execute again the same access processing that previously resulted in an error. In this case, an error may occur at the same location (address) as before. If the error occurs again, the HDD starts the retry process from the first retry step, so the processing takes a long time and a timeout error may occur again.

[0010] Thus, although the enhanced error retry function of recent HDDs has improved the probability that errors are recovered, the increased time required for retry processing means that processing can take longer than the monitoring time of the host system, so a timeout error may be detected. Moreover, when a timeout error is detected, the host system performs a retry and issues the same request to the magnetic disk device again, but a retry may occur at the same location and result in another timeout error.

[0011] Accordingly, a technique (hereinafter referred to as the prior art) that shortens the retry processing time has recently been proposed, as described for example in Japanese Patent Application Laid-Open No. H10-161818. In this technique, the HDD stores the access request from the host system, the address specified by that request, and the (number of the) retry step being executed when the host system sent a reset request upon detecting a timeout error. When the host system issues the same access request in a host retry, the HDD performs normal processing up to the address immediately before the one being processed at the time of the previous reset, and at that address executes the HDD retry starting from the retry step that was being executed at the time of the previous reset.

[0012] With the prior art, however, although the time required for the second retry process, performed when the re-executed access request again results in an error, is indeed shortened, the host system may still detect a timeout error during that second retry process, depending on the monitoring time set on the host system side.

[0013] A timeout error may be detected in the host system not only (1) when the HDD retry takes a long time but also (2) when the HDD itself has failed or is locked up. The prior art, however, has the problem that the cause of a detected timeout error cannot be identified.

[0014] To reduce the occurrence of timeout errors, the monitoring time on the host system side could be set sufficiently longer than the time needed for the retry process on the HDD side to run reliably to the final retry step. However, one reason the HDD may fail to respond within the monitoring time is that the HDD itself has failed, as noted above, and for such cases a monitoring time fixed at a long value is wasteful.

[0015] The present invention has been made in view of the above circumstances. Its object is to provide a disk storage device, a computer system including the disk storage device, and an error notification method at the time of retry processing in the computer system, configured as follows. When the host system detects a timeout error during retry processing in the disk storage device and consequently sends a reset request to the disk storage device, the event is recorded. When, in that state, the host system sends the disk storage device the same access request as the one that previously led to a retry, the disk storage device notifies the host system of a specific error code. This causes the host system to extend its monitoring time and suppresses the detection of timeout errors.

[0016]

[Means for Solving the Problems] A disk storage device according to the present invention comprises: means for executing an access request received from a host system, which executes a retry process consisting of a predetermined group of retry steps when execution of the access request results in an error; storage means in which a specific area is reserved; retry information management means which, when the device receives, during execution of the retry process, a reset request that the host system sends to the disk storage device upon detecting a timeout error (that is, when a response from the disk storage device concerning execution of an access request sent by the host system was not returned within the set monitoring time), registers retry-in-progress reset information indicating that fact in the specific area of the storage means; and means for causing the access request execution means to execute an access request sent from the host system, which, when the access request is the same request as the previous one and the retry-in-progress reset information is registered in the specific area of the storage means, notifies the host system of a specific error code that indicates that the reset request was received during execution of the retry process and that requests the host system to extend the monitoring time, and then causes the access request execution means to execute the access request.

[0017] Here, when initialization processing in response to the reset request is performed while the retry-in-progress reset information is registered in the specific area of the storage means, it is preferable that at least the specific area be excluded from initialization so that the retry-in-progress reset information is preserved.
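The reset handling described in this paragraph can be illustrated with a minimal Python sketch. The class `DriveState`, its attribute names, and the function `handle_reset` are hypothetical names chosen for illustration; the patent does not specify an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DriveState:
    # Hypothetical volatile working state that a reset initializes.
    work: dict = field(default_factory=dict)
    # Stand-in for the "specific area" that must survive a reset.
    retry_area: dict = field(default_factory=dict)
    # True while a retry process is running (assumed bookkeeping).
    retry_in_progress: bool = False


def handle_reset(state: DriveState) -> None:
    """Initialize the drive state while leaving the specific area intact."""
    if state.retry_in_progress:
        # Register the retry-in-progress reset information.
        state.retry_area["reset_during_retry"] = True
    state.work.clear()              # ordinary initialization
    state.retry_in_progress = False
    # state.retry_area is deliberately excluded from initialization.
```

Keeping the retry information in a container that the reset path never touches is one simple way to satisfy the "excluded from initialization" requirement.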

[0018] The same request (the same access request as the previous one) is not limited to an access request that is completely identical, including start address and size, to the access request that previously resulted in an error; it refers to any access request that includes at least the address at which the previous retry process was performed.
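The two readings of "same request" can be contrasted in a short Python sketch. The type `AccessRequest`, the function names, and the `strict` switch are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    command_type: str  # e.g. "READ" or "WRITE" (illustrative values)
    start: int         # start address (sector block address)
    size: int          # number of blocks to access


def covers_address(req: AccessRequest, address: int) -> bool:
    """True if the request's address range includes the given address."""
    return req.start <= address < req.start + req.size


def is_same_request(req: AccessRequest, prev: AccessRequest,
                    retry_address: int, strict: bool = False) -> bool:
    """strict=True: identical type, start address, and size.
    strict=False: the request merely includes the previously retried address."""
    if strict:
        return req == prev
    return covers_address(req, retry_address)
```

For example, a request starting at block 96 with size 16 differs from an earlier request starting at block 100 with size 8, yet it still counts as the same request in the broad sense if the retry occurred at block 103.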

[0019] In a disk storage device configured in this way, when an error occurs during execution of an access request given by the host, a retry process consisting of a predetermined group of retry steps is started. As described in the Related Art section, the processing time of recent retry processes has increased because the internal retry of disk storage devices has been enhanced. The time required for the retry process therefore often exceeds the monitoring time set in the host system. In such a case, the host system is likely to detect a timeout error while the retry process is running. When the host system detects a timeout error, or when the disk storage device reports an abnormal end indicating the cause of the error because the error was not resolved even though the retry process was run to the end, that is, to the final retry step, the host system sends a reset request to the disk storage device and then performs a host retry in which it sends the same access request as before to the disk storage device.

[0020] In the disk storage device, when a reset request is received from the host system during execution of the retry process, retry-in-progress reset information is held in the specific area as log information indicating that fact. When the disk storage device then receives the same access request as before from the host system in this state, it notifies the host system of a specific error code, which indicates that a reset request was received during execution of the retry process and requests an extension of the monitoring time. From this notification, the host system determines that the disk storage device was in the middle of the retry process when the reset request was issued, that is, that the timeout error occurred during the retry process because the currently set monitoring time is too short and that an extension of the monitoring time is therefore being requested, and it extends the monitoring time accordingly. Extending the monitoring time makes it possible to reduce the occurrence of timeout errors in the host system. In addition, the specific error code notification that follows the host system's own timeout error detection lets the host system determine that the cause of the timeout error is a short monitoring time and that the disk storage device itself has not failed. Conversely, if no specific error code is notified by the disk storage device after a timeout error is detected, the host system can determine that the disk storage device may have failed (or locked up).
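The host-side decision described above might look like the following Python sketch. The function name, the multiplicative `factor`, and the boolean return convention are assumptions; the patent says only that the monitoring time is extended, not by how much.

```python
def after_host_retry(monitoring_time: float,
                     specific_error_notified: bool,
                     factor: float = 2.0) -> tuple:
    """Decide the host's next state after a timeout, reset, and host retry.

    Returns (new_monitoring_time, drive_fault_suspected).
    The doubling factor is an illustrative assumption.
    """
    if specific_error_notified:
        # The drive was still retrying when it was reset: the monitoring
        # time was too short, and the drive itself is not considered faulty.
        return monitoring_time * factor, False
    # No specific error code: the drive may have failed or locked up.
    return monitoring_time, True
```

The same notification thus serves two purposes: it triggers the extension and it distinguishes a slow but healthy drive from a possibly failed one.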

[0021] Thus, in the present invention, when a reset request is received from the host system during retry processing in the disk storage device, the occurrence of the event is left as a log, and when the same access request as before is received while this log remains, the disk storage device notifies the host system of a specific error code to make the host system extend its monitoring time. This reduces the occurrence of timeout errors in the host system and allows the host system to determine the cause of a timeout error easily.

[0022] Here, the access request and address subjected to the retry process, together with retry step identification information indicating the retry step being executed, may be registered in the specific area, and the retry step identification information in the specific area may be updated each time the retry step is switched during the retry process. Suppose further that, when the same access request as the one subjected to the retry process is received from the host system, the access to the address registered in the specific area as the target of the previous retry process is performed by a retry process starting from the retry step indicated by the retry step identification information in the specific area. In other words, the retry is started partway through the previous retry process that was halted by the reset request from the host system. With this configuration, the processing time can be shortened while the probability of error recovery is increased.

[0023]

[Embodiments of the Invention] An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a computer system according to one embodiment of the present invention.

[0024] The computer system of FIG. 1 comprises a host system (host computer) 10 and a disk storage device, for example an HDD (magnetic disk device) 20, that the host system 10 uses as its external storage device. The host system 10 and the HDD 20 are interconnected by a host interface bus, for example a SCSI bus 30. Fibre Channel or the like may be used in place of the SCSI bus 30.

[0025] The HDD 20 includes a drive unit 21 containing a disk drive and a control unit 22 that controls the drive unit 21. The host interface that is connected to the SCSI bus 30 and interfaces with the host system 10, and similar components, are omitted from the figure.

[0026] The control unit 22 has a microprocessor 221 that forms the core of the control unit 22, a ROM (Read Only Memory) 222 in which a control program is stored in advance, and a RAM (Random Access Memory) 223 that provides a work area and the like for the microprocessor 221.

[0027] The microprocessor 221 controls read/write access to the drive unit 21 specified by a read/write command from the host system 10, in accordance with the control program stored in the ROM 222. When a read/write access results in an error, the microprocessor 221 also executes a retry process consisting of a group of predetermined retry steps. Furthermore, when a SCSI bus reset is given by the host system 10 during the retry process, the microprocessor 221 records that fact in the RAM 223; if the command that caused the retry is then given by the host system 10 in this state, the microprocessor 221 notifies the host system 10 of a specific error code.

[0028] A retry management area 224 is reserved in the RAM 223. As shown in FIG. 2, the retry management area 224 has a field 224a in which the type (command type) of the access command that was retried is set, a field 224b in which the start address specified by that command is set, a field 224c in which the size specified by that command is set, a field 224d in which the address at which the retry (error) occurred (error address) is set, a field 224e in which the retry step number is set, and a field 224f in which the retry-in-progress reset flag is set. The retry-in-progress reset flag indicates whether (ON) or not (OFF) the HDD 20 received a SCSI bus reset, as a reset request from the host system 10, during the retry process.
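As an illustration, the six fields of the retry management area 224 could be modeled as a small Python record. The attribute names and types are assumptions; only the field roles (224a to 224f) come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryManagementArea:
    command_type: Optional[str] = None   # field 224a: type of the retried command
    start_address: Optional[int] = None  # field 224b: start address of the command
    size: Optional[int] = None           # field 224c: size of the command
    error_address: Optional[int] = None  # field 224d: address where the retry occurred
    retry_step: Optional[int] = None     # field 224e: number of the retry step
    reset_during_retry: bool = False     # field 224f: retry-in-progress reset flag

    def clear(self) -> None:
        """Reset every field to its empty state."""
        self.command_type = None
        self.start_address = None
        self.size = None
        self.error_address = None
        self.retry_step = None
        self.reset_during_retry = False
```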

[0029] Next, the operation of the computer system of FIG. 1 will be described with reference, as appropriate, to the flowcharts of FIGS. 3 to 7. First, assume that an access command has been sent from the host system 10 to the HDD 20. This command carries the first address (start address) of the access destination and the size of the access data.

[0030] On receiving the access command from the host system 10, the microprocessor 221 provided in the control unit 22 of the HDD 20 refers to the retry management area 224 reserved in the RAM 223 (step S1). The microprocessor 221 then determines whether the received command is the same as the command registered in the retry management area 224, or more specifically, whether it is the same as the command for which the retry process was performed last time (step S2). Here, commands are regarded as the same when their command type, start address, and size all match.

[0031] If the command is not the same, the microprocessor 221 clears the information in the retry management area 224 (step S3) and then executes the received command. That is, it performs control to read data of the specified size from the area of the drive unit 21 beginning at the start address (start sector block address) specified by the command, or to write data of the specified size to the area of the drive unit 21 beginning at the specified start address (step S4). If the command is executed normally (step S5), the microprocessor 221 notifies the host system 10 of normal completion using the status phase of the SCSI bus 30 (step S6).

[0032] In contrast, if execution of the command results in an error, that is, if an access to some address (sector block address) results in an error (step S5), the microprocessor 221 registers the information of the command being executed and the address at which the error occurred (error address) in the retry management area 224 (step S7). The command information consists of the command type, start address, and size, which are registered in fields 224a, 224b, and 224c of the retry management area 224, respectively. The error address is registered in field 224d of the retry management area 224.

[0033] Next, the microprocessor 221 sets a variable N, which indicates the retry step number, to an initial value, for example 1 (step S8). The microprocessor 221 then updates the content of field 224e of the retry management area 224 (the retry step number) to N (step S9) and executes retry step N, that is, the Nth retry step of the retry process (step S10). As is clear from the above, while the retry process is running, the retry management area 224 holds the information of the command being retried (command type, start address, and size), the error address (the address being retried), and the number N of the retry step being executed.

[0034] If executing retry step N for the access to the error address does not resolve the error (step S11), the microprocessor 221 increments N by 1 (step S13) and executes the processing from step S9 onward again. The microprocessor 221 repeats the processing from step S9 onward until the retry succeeds (step S11). However, if the retry has not succeeded even after the retry process has been executed up to a predetermined retry step (steps S11, S12), the microprocessor 221 notifies (responds to) the host system 10 of an abnormal end using the status phase (step S17). This abnormal end notification carries information on the error location (error address).
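The retry loop of steps S8 to S13 can be sketched in Python as follows. The callable `try_step`, the dictionary standing in for field 224e, and the parameter names are hypothetical; the step labels in the comments follow the description.

```python
from typing import Callable


def retry_process(try_step: Callable[[int], bool], last_step: int,
                  area: dict, first_step: int = 1) -> bool:
    """Run retry steps first_step..last_step and report whether one succeeded.

    try_step(n) is a hypothetical callable that performs retry step n and
    returns True if the error was resolved. area stands in for field 224e.
    """
    n = first_step                   # S8: initialize N
    while True:
        area["retry_step"] = n       # S9: record the step being executed
        if try_step(n):              # S10, S11: execute step N and check
            return True
        if n >= last_step:           # S12: predetermined last step reached
            return False             # caller reports an abnormal end (S17)
        n += 1                       # S13: move to the next step
```

Because the step number is written to the area before each step runs, a reset that interrupts the loop leaves behind the number of the step that was in progress.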

[0035] In contrast, if retry step N is executed successfully, that is, if retry step N resolves the error (step S11), the microprocessor 221 clears the contents of the retry management area 224 (step S14). Next, if any addresses in the address range specified by the command requested by the host system 10 have not yet been accessed (step S15), the microprocessor 221 executes the command for the addresses from the one following the successfully retried address onward (step S16). If the command executes normally (step S5), the microprocessor 221 notifies the host system 10 of normal completion and ends the processing (step S6); if an error occurs (step S5), it performs the processing from step S7 onward.
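The overall flow of executing a command across its address range, retrying on error and then continuing with the remaining addresses, can be sketched in Python. The callables `access` and `retry` and the per-address granularity are simplifying assumptions.

```python
from typing import Callable, Optional, Tuple


def execute_command(start: int, size: int,
                    access: Callable[[int], bool],
                    retry: Callable[[int], bool]) -> Tuple[str, Optional[int]]:
    """Access each address in [start, start + size).

    access(addr) and retry(addr) are hypothetical callables returning True
    on success. Returns ("ok", None) or ("abnormal_end", error_address).
    """
    addr = start
    end = start + size
    while addr < end:                          # S15: unaccessed addresses remain
        if not access(addr):                   # S4/S16 with the S5 check
            if not retry(addr):                # S7 to S13: retry process
                return ("abnormal_end", addr)  # S17: report the error address
            # S14: on success the retry management area would be cleared here
        addr += 1                              # continue after the retried address
    return ("ok", None)                        # S6: normal completion
```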

[0036] On the other hand, if the access command from the host system 10 is the same as the command for which the retry process was performed last time (step S2), the microprocessor 221 determines whether the retry-in-progress reset flag set in field 224f of the retry management area 224 is ON (step S21).

[0037] If the reset-during-retry flag is ON (step S21), the microprocessor 221 determines that a SCSI bus reset was received from the host system 10 during the previous retry processing. In this case, the microprocessor 221 turns the reset-during-retry flag OFF (step S22), notifies the host system 10 of a predetermined specific error code (step S23), and then proceeds to step S24. In contrast, if the reset-during-retry flag is not ON (step S21), the microprocessor 221 determines that no SCSI bus reset was received from the host system 10 during the previous retry processing, that is, that the processing ran through the last retry step within the monitoring time of the host system 10. It then returns to step S3 and performs the same processing as for a command that is not the same as the previous one.
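
The branch taken when an access command arrives can be sketched as below. The class `RetryArea`, the function `on_access_command`, and the numeric value of `SPECIFIC_ERROR_CODE` are hypothetical; the text only speaks of "a predetermined specific error code" and names fields 224d, 224e, and 224f. Routing a different command to step S3 is inferred from the statement that the flag-OFF case is handled like a command that is not the same.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical value; the source does not give the actual code.
SPECIFIC_ERROR_CODE = 0xE0

@dataclass
class RetryArea:
    """Stand-in for retry management area 224 (layout is assumed)."""
    command: Optional[Tuple[str, int, int]] = None  # (type, start address, size)
    error_address: Optional[int] = None             # field 224d
    retry_step: Optional[int] = None                # field 224e
    reset_during_retry: bool = False                # field 224f

def on_access_command(area, command):
    """Return (next_step, error_code_to_report) for a received command."""
    if command != area.command:            # step S2: not the retried command
        return "S3", None
    if area.reset_during_retry:            # step S21: flag is ON
        area.reset_during_retry = False    # step S22: turn the flag OFF
        return "S24", SPECIFIC_ERROR_CODE  # step S23, then go to step S24
    return "S3", None                      # flag OFF: handle as a normal command
```

Because the flag is cleared in step S22, sending the same command a second time without an intervening reset follows the normal path.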

[0038] In step S24, the microprocessor 221 refers to the retry management area 224 and reads the address at which the previous retry occurred (error address) A, set in field 224d, and the number N of the retry step that was being executed when the SCSI bus reset was received, set in field 224e. The microprocessor 221 then executes the normal access processing specified by the newly received command, which is the same as the command that was previously retried, from the start address up to address A-1, immediately before the detected address A (step S25). Next, the microprocessor 221 accesses address A, where the previous error occurred, not by normal processing but by retry processing, starting from the retry step N indicated by the detected retry step number N (step S26). In other words, suppose that an access to address A, among the addresses specified by the previous command from the host system 10, caused an error and triggered retry processing, and that a SCSI bus reset was received while retry step N of that retry processing was being executed. The next time the same command is executed, the microprocessor 221 performs the access to address A by retry processing that starts from retry step N.

[0039] For example, suppose that the previous command was a read command specifying addresses #100 to #110, that a retry occurred when reading from address #105, and that a SCSI bus reset was received while retry step #15 of retry steps #1 to #30 (the retry step sequence) was being executed. When the same command is next received from the host system 10, it is executed as follows. First, addresses #100 to #104 are accessed normally. Next, the access to address #105 is executed starting from retry step #15.
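
The example above can be reproduced with a small planning function. The name `resume_plan` and the dictionary keys are illustrative assumptions; the addresses and step numbers are those of the example. The last entry reflects steps S15 and S16, under which the remaining addresses are accessed only after the retry at the error address succeeds.

```python
def resume_plan(start, end, error_address, retry_step, last_step):
    """Plan the re-execution of a command after a reset during retry.

    start..end is the inclusive address range of the command.
    """
    return {
        # Step S25: normal access from the start address up to A-1.
        "normal_first": list(range(start, error_address)),
        # Step S26: address A is accessed by retry processing...
        "retry_address": error_address,
        # ...resuming at retry step N instead of retry step 1.
        "retry_steps": list(range(retry_step, last_step + 1)),
        # Steps S15/S16: remaining addresses, once the retry succeeds.
        "normal_after": list(range(error_address + 1, end + 1)),
    }

plan = resume_plan(100, 110, 105, 15, 30)
print(plan["normal_first"])    # [100, 101, 102, 103, 104]
print(plan["retry_steps"][0])  # 15
```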

[0040] If, unlike in this embodiment, address #105 were accessed normally, a retry would be highly likely to occur. Moreover, even if retry processing were then performed, retry processing that starts from retry step #1 would have little chance of recovering from the error before retry step #14. In contrast, this embodiment starts retry processing from retry step #15, which was being executed the previous time, so the probability of error recovery increases and the processing time is shortened.

[0041] After step S26, the processing from step S11 onward described above is performed.

[0042] Next, the operation performed when the HDD 20 receives a SCSI bus reset from the host system 10 is described. First, the microprocessor 221 determines whether retry processing is in progress (step S31). If so, the microprocessor 221 stops the retry processing and turns ON the reset-during-retry flag in field 224f of the retry management area 224 (step S32). As a result, a log indicating that a SCSI bus reset was received from the host system 10 during retry processing, that is, a log indicating that the host system 10 detected a timeout error, is recorded in field 224f.

[0043] After executing step S32, the microprocessor 221 performs initialization processing for the HDD 20 in response to the SCSI bus reset from the host system 10, excluding the retry management area 224 (step S33). As a result, the information in the retry management area 224 is left intact rather than reset: the information on the command that was subject to retry processing when the SCSI bus reset was received, the error address, and the number of the retry step that was being executed at that time. In contrast, if a SCSI bus reset is received from the host system 10 while retry processing is not in progress, the microprocessor 221 performs normal initialization processing for the HDD 20 in response to the SCSI bus reset (step S34). In this case, the information in the retry management area 224 is also initialized (cleared). After the initialization processing of step S33 or S34, the microprocessor 221 enters the READY state and waits for a command from the host system 10.
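
The two reset paths can be sketched as follows. The class `Drive` and its attributes are assumptions made for illustration: `retry_area` stands in for the retry management area 224, and `other_state` stands in for everything else that initialization clears.

```python
from dataclasses import dataclass, field

@dataclass
class Drive:
    """Hypothetical model of the HDD 20 state relevant to a bus reset."""
    retrying: bool = False
    retry_area: dict = field(default_factory=dict)   # retry management area 224
    other_state: dict = field(default_factory=dict)  # everything else
    state: str = "BUSY"

def on_scsi_bus_reset(drive):
    if drive.retrying:                                 # step S31
        drive.retrying = False                         # stop the retry processing
        drive.retry_area["reset_during_retry"] = True  # step S32: flag ON
        drive.other_state.clear()                      # step S33: area 224 is kept
    else:
        drive.other_state.clear()                      # step S34: normal initialization
        drive.retry_area.clear()                       # area 224 is cleared as well
    drive.state = "READY"                              # wait for the next command
```

Keeping `retry_area` across the reset in the first branch is what later allows the same command to resume from the recorded error address and retry step.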

[0044] Next, the retry performed in the host system 10, that is, the host retry, is described. When the host system 10 sends an access command to the HDD 20, it monitors the response from the HDD 20 to that command for a predetermined monitoring time. If the HDD 20 reports an abnormal end within the monitoring time, or if the monitoring time elapses with no response from the HDD 20 and a timeout error results, the host system 10 sends a SCSI bus reset to the HDD 20 and then executes the host retry described below.

[0045] First, the host system 10 sends the HDD 20 the same command that ended abnormally or timed out (step S41). The host system 10 then starts monitoring the response from the HDD 20 (step S42).

[0046] As described above, the HDD 20 notifies the host system 10 of the specific error code when it receives from the host system 10 the same access command as the one for which retry processing was previously performed while the reset-during-retry flag is ON, that is, while the flag indicates that a SCSI bus reset was received from the host system 10 during retry processing.

[0047] When the HDD 20 notifies the host system 10 of the specific error code in response to an access command sent to the HDD 20 (step S43), the host system 10 extends the monitoring time by a predetermined amount (step S44).

[0048] The host system 10 monitors the response from the HDD 20 with the set monitoring time as the upper limit (steps S45 and S46). If the monitoring time elapses with no response from the HDD 20, the host system 10 determines that a timeout error has occurred in the host retry (step S46).

[0049] On determining a timeout error, the host system 10 checks whether timeout errors have occurred M consecutive times (M is a predetermined integer) with no error code notified (steps S47 and S48). If so, the host system 10 judges that the HDD 20 has failed or is locked up (step S49). In this case, the host system 10 notifies the user (system administrator) of the failure or lock-up of the HDD 20 through a display screen or the like and requests action such as replacing the HDD 20.

[0050] In contrast, if the number of consecutive timeout errors is less than M, or if there have been M consecutive timeout errors but they occurred with the error code notified, the host system 10 judges that the timeout errors result from the monitoring time being short relative to the time the HDD 20 needs for retry processing, not from a failure or lock-up of the HDD 20. In this case, the host system 10 sends a SCSI bus reset to the HDD 20 again (step S50) and then executes the host retry again.

[0051] On the other hand, if the HDD 20 returns a response in the status phase to the access command from the host system 10 before a timeout occurs, that is, within the monitoring time (step S45), the host system 10 determines from the response whether it indicates a normal end or an abnormal end (step S51). If it is an abnormal end response, the host system 10 determines whether abnormal end responses have been returned L consecutive times (L is a predetermined integer) (step S52). If the number of consecutive abnormal end responses is less than L, the host system 10 assumes that another host retry may end normally, sends a SCSI bus reset to the HDD 20 again (step S50), and then executes the host retry again. In contrast, if abnormal end responses have been returned L consecutive times, the host system 10 treats further host retries as futile and performs error processing based on the error location information given in the abnormal end response (step S53). This error processing is not directly related to the present invention, so its description is omitted. If the HDD 20 returns a normal end response, the host system 10 ends the host retry.
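
The host retry flow of steps S41 to S53 can be sketched as a loop over simulated attempts. Everything here is an illustrative assumption rather than the embodiment's implementation: the function `host_retry`, the tuple format of an attempt, the outcome strings, the choice to let each notified error code add `extension` to the monitoring time cumulatively, and the choice to reset each consecutive counter whenever a different kind of result is seen.

```python
from typing import Iterable, Optional, Tuple

# One entry per host retry: (specific_code_notified, response_time, ok).
# response_time is None when the drive never answers.
Attempt = Tuple[bool, Optional[float], bool]

def host_retry(attempts: Iterable[Attempt], monitor_time: float,
               extension: float, M: int, L: int) -> Tuple[str, float]:
    timeouts = 0  # consecutive timeouts with no specific error code
    abnormal = 0  # consecutive abnormal end responses
    for code_notified, response_time, ok in attempts:  # S41, S42
        if code_notified:                              # S43
            monitor_time += extension                  # S44
        if response_time is None or response_time > monitor_time:  # S45, S46
            abnormal = 0
            timeouts = 0 if code_notified else timeouts + 1
            if timeouts >= M:                          # S47, S48
                return "drive failed or locked", monitor_time  # S49
        elif ok:                                       # S51: normal end
            return "normal end", monitor_time
        else:                                          # S51: abnormal end
            timeouts = 0
            abnormal += 1
            if abnormal >= L:                          # S52
                return "error handling", monitor_time  # S53
        # S50: send a SCSI bus reset, then loop for another host retry.
    return "attempts exhausted", monitor_time
```

For instance, `host_retry([(True, 8.0, True)], 5.0, 5.0, 3, 3)` models a drive that notifies the specific error code and then needs 8 time units: the monitoring time grows from 5 to 10, so the response arrives in time and the call returns `("normal end", 10.0)`.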

[0052] In the embodiment described above, a received command is treated as the same command as the one for which retry processing was previously performed when its command type, start address, and size match those of that command, but this is not a limitation. For example, any access command that requests access to an address range including at least the address where the previous error occurred may be treated as the same command as the previous one.
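
The two definitions of "the same command" can be compared side by side. The type `Command` and the function names are hypothetical; the first predicate follows the embodiment (type, start address, and size all match), and the second follows the relaxed alternative (the requested range merely contains the previous error address).

```python
from typing import NamedTuple

class Command(NamedTuple):
    kind: str   # command type, e.g. "READ"
    start: int  # start address
    size: int   # number of addresses

def same_exact(new: Command, old: Command) -> bool:
    """Embodiment: type, start address, and size must all match."""
    return new == old

def covers_error_address(new: Command, error_address: int) -> bool:
    """Alternative: the requested range includes the previous error address."""
    return new.start <= error_address < new.start + new.size
```

With the previous command `Command("READ", 100, 11)` and error address 105, a new `Command("READ", 104, 4)` fails the exact test but passes the relaxed one, since its range #104 to #107 contains #105.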

[0053] The present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. Furthermore, the above embodiment includes inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, even if some constituent elements are removed from all those shown in the embodiment, the resulting configuration can be extracted as an invention, provided that the problem described in the section on the problem to be solved by the invention can be solved and the effects described in the section on the effects of the invention are obtained.

[0054]

[Effects of the Invention] As described in detail above, according to the present invention, when the disk storage device receives, during retry processing, a reset request issued because the host system detected a timeout error, it records that event. If the host system then gives the disk storage device the same access request as the one that was previously retried, the disk storage device notifies the host system of a specific error code so that the host system extends its monitoring time. This suppresses the detection of timeout errors in the host system. In addition, according to the present invention, the host system can easily determine the cause of a timeout error from whether the disk storage device notifies it of the specific error code after the reset request that accompanies timeout error detection.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a computer system according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of the data structure of the retry management area 224 in FIG. 1.

FIG. 3 is a diagram showing part of a flowchart for explaining the processing procedure when the HDD 20 receives an access command in the embodiment.

FIG. 4 is a diagram showing another part of the flowchart for explaining the processing procedure when the access command is received.

FIG. 5 is a diagram showing the rest of the flowchart for explaining the processing procedure when the access command is received.

FIG. 6 is a flowchart for explaining the processing procedure when the HDD 20 receives a SCSI bus reset in the embodiment.

FIG. 7 is a flowchart for explaining the procedure of the host retry processing of the host system 10 in the embodiment.

[Explanation of symbols]

10 ... Host system
20 ... HDD (magnetic disk device, disk storage device)
21 ... Drive unit
22 ... Control unit
221 ... Microprocessor
222 ... ROM
223 ... RAM
224 ... Retry management area (specific area)

Claims

[Claims]

1. A disk storage device used by a host system as an external storage device of the host system, comprising: access request execution means for executing an access request received from the host system and, when execution of the access request results in an error, executing retry processing consisting of a predetermined group of retry steps; storage means in which a specific area is secured; retry information management means for registering, in the specific area of the storage means, reset-during-retry information indicating that a reset request for initializing the disk storage device has been received during execution of the retry processing, the reset request being sent from the host system when the host system detects a timeout error in which a response from the disk storage device regarding execution of the access request sent by the host system is not returned within a set monitoring time; and means for causing the access request execution means to execute an access request sent from the host system, the means, when the access request is the same as the previous request and the reset-during-retry information is registered in the specific area of the storage means, notifying the host system of a specific error code that indicates that the reset request was received during execution of the retry processing and that requests the host system to extend the monitoring time, and causing the access request execution means to execute the access request.

2. The disk storage device according to claim 1, wherein the retry information management means registers, in the specific area, the access request and the address subjected to the retry processing by the access request execution means, together with retry step identification information indicating the retry step being executed, and updates the retry step identification information in the specific area each time the retry step is switched, and wherein the access request execution means, upon receiving from the host system an access request that is the same as the one previously subjected to the retry processing, executes the access to the address registered in the specific area as previously subjected to the retry processing by retry processing that starts from the retry step indicated by the retry step identification information in the specific area.

3. The disk storage device according to claim 1, further comprising means for performing initialization processing of the disk storage device in response to the reset request from the host system, the means performing, when the reset request is received during execution of the retry processing, initialization processing that excludes at least the specific area.

4. A computer system comprising a host system and a disk storage device used by the host system as an external storage device of the host system, wherein the disk storage device comprises: access request execution means for executing an access request received from the host system and, when execution of the access request results in an error, executing retry processing consisting of a predetermined group of retry steps; storage means in which a specific area is secured; retry information management means for registering, in the specific area of the storage means, reset-during-retry information indicating that a reset request for initializing the disk storage device has been received from the host system during execution of the retry processing; and means for causing the access request execution means to execute an access request sent from the host system, the means, when the access request is the same as the previous request and the reset-during-retry information is registered in the specific area of the storage means, notifying the host system of a specific error code indicating that the reset request was received during execution of the retry processing and causing the access request execution means to execute the access request, and wherein the host system comprises monitoring means for detecting a timeout error by monitoring, when the host system sends an access request to the disk storage device, a response from the disk storage device regarding execution of the access request, with the currently set monitoring time as an upper limit, the monitoring means extending the monitoring time when the error code is notified from the disk storage device.

5. The computer system according to claim 4, wherein the host system further comprises means for determining that the disk storage device has failed when the timeout error is detected a predetermined number of consecutive times for the same access request while the error code is not notified.

6. The computer system according to claim 4, wherein the retry information management means registers, in the specific area, the access request and the address subjected to the retry processing by the access request execution means, together with retry step identification information indicating the retry step being executed, and updates the retry step identification information in the specific area each time the retry step is switched, and wherein the access request execution means, upon receiving from the host system an access request that is the same as the one previously subjected to the retry processing, executes the access to the address registered in the specific area as previously subjected to the retry processing by retry processing that starts from the retry step indicated by the retry step identification information in the specific area.

7. An error notification method for retry processing in a computer system comprising a host system and a disk storage device used by the host system as an external storage device of the host system, the method comprising: executing, by the disk storage device, internal retry processing consisting of a predetermined group of retry steps when execution of an access request given from the host system to the disk storage device results in an error; detecting, by the host system, a timeout error by monitoring, when the host system sends an access request to the disk storage device, a response from the disk storage device regarding execution of the access request, with the currently set monitoring time as an upper limit; sending, by the host system, a reset request for initializing the disk storage device to the disk storage device at the time of a host retry, in which the same access request is sent again from the host system to the disk storage device in response to detection of the timeout error or to an abnormal end notification from the disk storage device; holding, by the disk storage device, when the reset request is received during execution of the internal retry processing, reset-during-retry information indicating that fact, and initializing the inside of the disk storage device, except for at least the reset-during-retry information, in response to the reset request; notifying, by the disk storage device, the host system of a specific error code when an access request is received from the host system while the reset-during-retry information is held; and extending, by the host system, the monitoring time in response to the error code.