JP2003157151A

JP2003157151A - File sharing system with mirror configuration storage

Info

Publication number: JP2003157151A
Application number: JP2002200737A
Authority: JP
Inventors: Shoji Kodama; 昇司児玉
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2001-07-13
Filing date: 2002-07-10
Publication date: 2003-05-30

Abstract

(57) [Abstract] (amended) [Problem] To provide a system for optimizing data access. [Solution] The system includes one cluster file system server, several cluster file system clients, and one storage system. The storage system includes a plurality of disk drives organized into pairs, and each pair consists of one master disk drive and one or more mirror disk drives. When a file is needed, a client sends a request to the server. The server identifies the location of the needed file, that is, which pair holds it, and then determines which disk drive within that pair should be accessed. The drive is selected so as to balance the access load across the disk drives in the pair. The system may also have several storage systems, and the disk drives forming a pair may span multiple storage systems.

Description

Detailed Description of the Invention

[0001] [Technical Field of the Invention] The present invention relates generally to file storage systems, and more particularly to systems and methods for optimizing data access when a file storage system with mirroring capability is shared.

[0002] [Prior Art] In general, a file storage system holds a large number of files that many hosts or clients can access simultaneously. Many kinds of applications run on the hosts; some of them share files and cooperate on certain tasks. One example of such an application is a clustered video server system. A typical task of a video server system is to send streams of video data over a computer network to viewing clients. Video data is usually very large, often several gigabytes or more, so sending it is often a very time-consuming process. It is therefore inefficient for a single video server to handle the transmission of video data to multiple hosts.

[0003] A typical way to overcome this limitation is to use multiple video servers that send video data in parallel, that is, simultaneously. As noted above, video data is usually very large. Keeping a copy of the video data on the local disk drive of every video server would therefore require a large amount of disk capacity on each server, and the associated maintenance cost would be considerable. As a result, a common solution is for multiple video servers to share the video data. FIG. 1 shows a typical system configuration in which multiple video servers share a single storage system that holds the video data.

[0004] The single storage system shown in FIG. 1 has at least four problems: (1) a performance bottleneck caused by storing the video data in a single storage system; (2) a performance bottleneck caused by storing the video data on a single disk drive; (3) a single point of failure caused by storing the video data in a single storage system; and (4) a single point of failure caused by storing the video data on a single disk drive.

[0005] From a performance standpoint, it is easy to see that a single storage system is constrained by several bottlenecks. One is the limit on the storage system's I/O throughput, which is finite. As the number of viewing clients keeps growing, the I/O throughput of the storage system eventually reaches its maximum, and some client requests are left unserved.

[0006] Another performance bottleneck lies inside the storage system. A storage system contains many physical disk drives for storing data, and the I/O throughput of a single disk drive is small. Therefore, when many viewing clients simultaneously request the same video data stored on one disk drive, that disk drive can become the performance bottleneck of the video server system.

[0007] From a system availability standpoint, a single storage system is clearly a single point of failure. If a failure makes the single storage system inoperable, no video data can be delivered at all. The same applies when video data is stored on a single disk drive: if that disk drive fails, none of the video data on it can be delivered.

[0008] Efforts have been made to overcome these problems of the single storage system. One example is RAID (Redundant Array of Inexpensive Disks), a method that stores data across a group of disk drives in order to make the data redundant. The basic principle of RAID is to store the same data on different disk drives within a disk drive group. Because the redundant data is stored on different disk drives, the data remains available even if one disk drive in the group fails. RAID thus helps solve the single point of failure that arises from storing data on a single disk drive.

[0009] RAID also helps improve data access performance. RAID 1 is a method used to keep multiple copies on different disk drives; in other words, every disk drive in the disk drive group holds the same data. This scheme is called data mirroring. When a host computer reads data from disk drives in a RAID 1 configuration, it can read from one or more disk drives in the group in parallel.

[0010] Processing read requests from host computers in parallel increases the total throughput, which relieves the performance bottleneck caused by storing data on a single disk drive. Two ways of implementing RAID are commonly known.

[0011] In the first, RAID is implemented in the storage system. The storage system has a disk controller that manages the disk drive groups and determines their configuration. The disk controller receives read requests from host computers and decides how to serve them in a balanced way using the disk group. In other words, the disk controller performs a load-balancing function so that the disk group is used efficiently.

[0012] In the second, RAID is implemented in the host computer. A host computer generally has an LVMS (Logical Volume Management System) within its operating system. The LVMS performs the same functions that the storage system performs in the first approach.

[0013] Both storage-system RAID and LVMS RAID solve the performance bottleneck and the single point of failure that arise from storing data on a single disk drive. Storage-system RAID, however, does not solve the performance bottleneck and the single point of failure that arise from storing data in a single storage system, because such RAID is by nature confined to a single storage system. LVMS, by contrast, is not confined to a single storage system and can be applied across multiple storage systems. This means that LVMS can configure RAID 1 using disk drives in different storage systems as one group.

[0014] FIG. 2 shows an example of a disk mirroring configuration that uses multiple storage systems managed by LVMS. A group of two or more disk drives is defined as a pair. In this example there are three pairs: pair 1, pair 2, and pair 3. Pair 1 has three disk drives, each located in a separate storage system. LVMS writes a copy of the data (file A in this example) to each disk drive in pair 1. To create the copies, LVMS issues the same data write command to each storage system at the same time.

[0015] Similarly, pair 2 has two disk drives in two different storage systems. Pair 3 is a different case: its two disk drives are in the same storage system. This arrangement is used mainly to address problems (2) and (4) above, namely the performance bottleneck and the single point of failure that arise from storing data on a single disk drive. Whether a system designer uses a configuration like pair 3 depends on several factors, such as the availability of the storage system and the cost of building it.

[0016] [Problems to Be Solved by the Invention] From the above, LVMS appears able to solve problems (1) through (4). However, the LVMS approach has two major problems. The first is that issuing multiple write requests to the storage systems places an excessive CPU load on the host computer. For example, if ten disk drives are mirrored, the host computer must issue ten write requests at the same time. With storage-system RAID, by contrast, mirroring is carried out by the disk controller in the storage system. Because the disk controller manages the write requests, the CPU load on the host computer is reduced.

[0017] The other problem with the LVMS approach is that it has no mechanism for distributing I/O requests across multiple storage systems. Data mirroring is managed independently by each host computer, and a host computer has no way of knowing which disk drive is being used by which host computer at any given time. In the worst case, even though the data is mirrored across multiple storage systems, all host computers may end up using one storage system at the same time. It is therefore desirable to develop a file sharing system that optimizes data access and provides other benefits as well.

[0018] [Means for Solving the Problems] The present invention provides a system for optimizing data access. In one exemplary embodiment, the system includes a file server, one or more clients that can communicate with the file server, and a plurality of disk drives organized into mirrored pairs to hold a plurality of files. Each pair consists of a master disk drive and one or more mirror disk drives. Each mirror disk drive holds a copy of the data stored on the master disk drive. The file server maintains file information indicating which pair of disk drives holds each file's data and where, as well as access load information for each pair of disk drives. When a client requests file information for a file from the file server, the file server determines which pair holds the requested file and then determines which disk drive in that pair should be accessed. This determination is made so as to balance the load across the disk drives in the pair. Information about the drive to access is then sent to the client, which can read the needed file from the appropriate disk drive.

[0019] The system may also include multiple storage systems. A pair of disk drives may span two or more storage systems; that is, the master disk drive and the mirror disk drives may each reside in different storage systems.

[0020] In one mode of operation, in which the master disk drive and the mirror disk drives reside in different storage systems and the mirror disk drives hold a copy of the latest data on the master disk drive, the cluster file system client obtains the storage location of the needed file from the cluster file system server and then reads the file directly from the least-used mirror disk drive.

[0021] In another mode of operation, in which the master disk drive and the mirror disk drives reside in different storage systems and the mirror disk drives do not hold a copy of the latest data on the master disk drive, the cluster file system client obtains the storage location of the needed file from the cluster file system server, contacts the appropriate storage system, and attempts to read the file from the least-used mirror disk drive. If that mirror disk drive is found not to hold the latest copy of the master disk drive's data, the storage system reads the needed data from the master disk drive and sends a copy of it to the cluster file system client.
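
The fallback described in this mode of operation can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the `stale_offsets` set used to track blocks not yet copied to the mirror, and the `sync` step are all hypothetical and are not taken from the specification, which does not say at this point how staleness is detected.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    """One disk drive, modelled as a mapping from block offset to data."""
    blocks: dict = field(default_factory=dict)


@dataclass
class MirrorPair:
    master: Drive
    mirror: Drive
    # Hypothetical bookkeeping: offsets written to the master but not
    # yet copied to the mirror.
    stale_offsets: set = field(default_factory=set)

    def write(self, offset, data):
        """Writes go to the master; the mirror is brought up to date later."""
        self.master.blocks[offset] = data
        self.stale_offsets.add(offset)

    def sync(self):
        """Copy every pending block from the master to the mirror."""
        for offset in self.stale_offsets:
            self.mirror.blocks[offset] = self.master.blocks[offset]
        self.stale_offsets.clear()

    def read_via_mirror(self, offset):
        """Serve a read aimed at the mirror, falling back to the master
        when the mirror does not yet hold the latest copy."""
        if offset in self.stale_offsets:
            return self.master.blocks[offset], "master"
        return self.mirror.blocks[offset], "mirror"
```

In this sketch the client always addresses the mirror, and the storage side decides whether the mirror's copy is current enough to serve.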

[0022] In another exemplary embodiment, a method for optimizing data access is provided. The method includes: organizing a plurality of disk drives into pairs for storing a plurality of files, each pair consisting of one master disk drive and one or more mirror disk drives, and each mirror disk drive holding a copy of the data stored on the master disk drive; maintaining file information indicating which pair holds each file and where; maintaining access load information for each disk drive of each pair; and, on receiving a file access request, using the file information to determine the pair of disk drives that holds the file and using the access load information to determine which disk drive in that pair should be accessed.

[0023] Other advantages will become apparent from the disclosure that follows. Further features and advantages of the present invention will become apparent by reference to the remainder of this specification, including the drawings and claims. The features and advantages of the invention, together with the structure and operation of various embodiments, are described in detail below with reference to the drawings, in which like reference numbers indicate identical or functionally similar elements.

[0024] [Embodiments of the Invention] This application is a continuation-in-part of U.S. Patent Application Serial No. 09/606,403, filed on June 30, 2002, entitled "Continuous Update of Data in a Data Server System" by Kodama et al., and U.S. Patent Application Serial No. 09/815,494, filed on March 21, 2001, entitled "Multiple Processor Data Processing System With Mirrored Data for Distributed Access" by Kodama et al. These applications are cited here for reference purposes.

[0025] Various embodiments of the present invention are described below. FIGS. 3 and 4 are simplified diagrams of a system 10 according to two exemplary embodiments of the invention. FIG. 3 shows a first exemplary embodiment of system 10, which includes one storage system 12. FIG. 4 shows a second exemplary embodiment of system 10, which includes multiple storage systems 14a, 14b, and 14c. As shown in FIGS. 3 and 4, system 10 further includes multiple host computers 16a, 16b, and 16c.

[0026] As can be seen in FIG. 3, each host computer further includes one or more applications 18 and a cluster file system 20. For example, application 18 may be a video server running on host computer 16, in which case application 18 reads data through cluster file system 20. Cluster file system 20 is typically implemented within the operating system of host computer 16.

[0027] One function of cluster file system 20 is to coordinate the sharing of data among multiple host computers 16. Cluster file system 20 has two kinds of components: one CFS server and one or more CFS clients. The CFS server is a dedicated server that manages and controls the metadata of the file system. In addition, as explained later, the CFS server performs load balancing of data input/output (I/O). A CFS client executes I/O requests from application 18. The CFS client communicates with the CFS server to obtain a file allocation list, which indicates where the requested file resides on the disk drives in storage system 12. As shown in FIG. 3, host computer 16a includes the CFS server, and host computers 16b and 16c each include a CFS client.

[0028] As further shown in FIG. 3, the single storage system 12 includes one disk controller and multiple physical disk drives. Note, however, that each storage system 12 or 14 may have more than one disk controller. Note also that although storage systems 12 and 14 are described as containing disk drives, each storage system may contain other types of storage devices, such as CD-ROM, DVD, or other devices.

[0029] The disk controller in turn has two components: an I/O processor and a synchronization daemon. The I/O processor handles I/O requests from host computers 16, and the synchronization daemon performs the functions related to data mirroring.

[0030] A pair is defined as a group of disk drives that together perform data mirroring. One drive in the pair is called the master disk drive, and the others are called mirror disk drives. Host computers 16 and storage system 12 share information about the pairs and their constituent disk drives. A disk drive is specified by the ID of the storage system 12 it belongs to and the ID of the disk drive within that storage system; these are called the SS ID and the Vol ID, respectively. A pair configuration table is used to maintain the ID information for each pair. FIG. 5 shows an example of a pair configuration table.

[0031] As can be seen in FIG. 5, each pair in the pair configuration table is identified by a name, such as pair 1 or pair 2. Each pair consists of one master disk drive and one or more mirror disk drives. As described above, each disk drive is identified by its SS ID and Vol ID.

[0032] FIG. 6 shows examples of pairs. In FIG. 6, pair 1 consists of three disk drives. Its master disk drive has Vol ID 8 and resides in the storage system with SS ID = 1. Its two mirror disk drives reside in different storage systems, with SS ID = 2 and SS ID = 3. Similarly, pair 2 consists of two disk drives. As pair 2 in FIG. 6 shows, the master disk drive and the mirror disk drive may also reside in the same storage system, here the one with SS ID = 1.

[0033] Mirror disk drives are in turn identified by mirror numbers. If host computer 16 knows the pair name and the mirror number, it can identify the mirror disk drive by using this information together with the pair configuration table. For example, the disk drive with mirror number 2 in pair 1 is the disk drive with Vol ID 5 in the storage system with SS ID = 3. Note that the disk drive with mirror number 0 is the master disk drive.
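
As a rough illustration of this lookup, the pair configuration table can be modelled as a mapping from pair name to a list of drive identifiers indexed by mirror number. The Python names below are hypothetical, and only the entries marked "from the text" come from the description; the remaining Vol ID values are placeholders, since the text does not give them.

```python
from typing import NamedTuple


class DriveId(NamedTuple):
    ss_id: int   # storage system ID (SS ID)
    vol_id: int  # disk drive ID within that storage system (Vol ID)


# The index in each list is the mirror number; entry 0 is the master drive.
PAIR_CONFIG = {
    "pair1": [
        DriveId(ss_id=1, vol_id=8),  # master (from the text)
        DriveId(ss_id=2, vol_id=4),  # mirror 1; Vol ID is a placeholder
        DriveId(ss_id=3, vol_id=5),  # mirror 2 (from the text)
    ],
    "pair2": [
        DriveId(ss_id=1, vol_id=1),  # master; Vol ID is a placeholder
        DriveId(ss_id=1, vol_id=2),  # mirror 1; Vol ID is a placeholder
    ],
}


def resolve_drive(pair_name: str, mirror_num: int) -> DriveId:
    """Map (pair name, mirror number) to a physical drive identifier."""
    drives = PAIR_CONFIG[pair_name]
    if not 0 <= mirror_num < len(drives):
        raise ValueError(f"{pair_name} has no mirror number {mirror_num}")
    return drives[mirror_num]
```

With this layout, `resolve_drive("pair1", 2)` yields SS ID 3 and Vol ID 5, matching the example in the text, and mirror number 0 always resolves to the master.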

[0034] Cluster file system 20 is described next. Cluster file system 20 consists of one CFS server and one or more CFS clients. A host computer 16 generally has either the CFS server function or the CFS client function. Note, however, that a single host computer 16 can also provide both the CFS server function and the CFS client function at the same time.

[0035] The CFS server manages and controls the metadata of the file system. A typical file system keeps a bitmap table and inode information as metadata. The bitmap table indicates which clusters on a disk drive have already been allocated to store files and which clusters remain available for future allocation. The inode information indicates the location of each file on the disk drive and its attributes.

[0036] The CFS server maintains a file allocation table. FIG. 7 shows an example of a file allocation table in the file system. A file is identified by its file name. A file is a set of data stored on disk drives. A unit of that data is called a block: a block is a set of data allocated to one disk drive. A file consists of one or more blocks. FIG. 8 shows how a file and its constituent blocks are laid out on a disk drive. Blocks are identified by numbers, such as block #1 and block #2.

[0037] Each block of data is stored on one disk drive, as shown in FIG. 8. The address space of a drive starts at 0 and ends at the size of the disk drive. When data is stored on a disk drive, its position is specified by an offset from the starting point (address 0) and by the data size. For example, block #1 of file 1 in FIG. 8 is stored starting at offset 100 on the disk drive. The block size is fixed, although it may vary depending on the file system configuration.

[0038] The file allocation table holds each file name and the file's storage location. The storage location of a file is identified by a list of blocks, and each block is identified by a disk drive and an offset. Referring to FIG. 7, block #1 of file 1 is stored starting at offset 100 on the disk drive with Vol ID 8 in the storage system with SS ID = 1. When a file is needed, the CFS client asks the CFS server to return the file allocation list for that file from the file allocation table. The CFS client then uses the returned file allocation list to perform read and write operations on the file.
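
A minimal sketch of such a table, assuming a simple in-memory dictionary, might look like the following. The names `Block`, `FILE_ALLOCATION_TABLE`, and `get_allocation_list` are hypothetical, and the second block is a placeholder added only so the list has more than one entry; only block #1 of file 1 comes from the text.

```python
from typing import NamedTuple


class Block(NamedTuple):
    block_no: int  # block number within the file
    ss_id: int     # storage system holding the block
    vol_id: int    # disk drive within that storage system
    offset: int    # start position on the drive, counted from address 0


FILE_ALLOCATION_TABLE = {
    "file1": [
        Block(block_no=1, ss_id=1, vol_id=8, offset=100),  # from the text
        Block(block_no=2, ss_id=1, vol_id=8, offset=300),  # placeholder
    ],
}


def get_allocation_list(file_name):
    """Return a copy of the file's block list, as a server might on request."""
    try:
        return list(FILE_ALLOCATION_TABLE[file_name])
    except KeyError:
        raise FileNotFoundError(file_name) from None
```

Returning a copy rather than the stored list keeps a caller from modifying the server's table by accident; that choice is part of the sketch, not something the description specifies.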

[0039] In addition, the CFS server maintains information indicating whether each disk drive is in use or not in use. Before a CFS client starts using a file, it first asks the CFS server to send the needed file allocation list. The CFS server is therefore well placed to gather information about CFS client activity.

[0040] As a result, the CFS server can balance the load properly across the various disk drives. For example, when the CFS server receives a file open request from a CFS client, it selects the least-used disk drive among the mirror disk drives that store the requested file. The CFS server then increments the file open count of the selected mirror disk drive.

[0041] The CFS server collects CFS client activity information and stores it in a pair usage table. FIG. 9 shows an example of a pair usage table. In the pair 1 example shown in FIG. 9, 100 files are open on the master disk drive, 200 files on mirror disk drive #1, and 50 files on mirror disk drive #2. These numbers show how heavily each disk drive in the group is being used. In this example, mirror disk drive #2 is the least-used disk drive in pair 1.
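
The selection in this example can be sketched as below, assuming the pair usage table is kept as a list of open-file counts indexed by mirror number (0 being the master). The names are hypothetical, and breaking ties in favour of the lowest mirror number is an assumption of this sketch rather than something the description states.

```python
# Open-file counts per drive; the list index is the mirror number
# (0 = master). The values are those of the pair 1 example.
PAIR_USAGE = {"pair1": [100, 200, 50]}


def least_used_mirror(pair_name):
    """Return the mirror number of the drive with the fewest open files."""
    counts = PAIR_USAGE[pair_name]
    # min() returns the first minimal index, so ties go to the lowest
    # mirror number (an assumption of this sketch).
    return min(range(len(counts)), key=counts.__getitem__)
```

For the counts above, the function selects mirror number 2, in agreement with the text.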

[0042] The processing sequence of the CFS server is described next. FIG. 10 shows how the CFS server operates. After starting, the CFS server waits for requests from CFS clients. There are two main kinds of request: file open and file close.

【００４３】ファイルオープンの場合は、CFSサーバは
ファイルアロケーションテーブルを用いて、要求ファイ
ルが保存されているペアのペア名を判定する。ペア使用
テーブルを用いて、CFSサーバはペア内の最も使用率の
低いディスクドライブを判定する。その後、CFSサーバ
は最も使用率の低いディスクドライブのミラー番号を添
えて、要求ファイルのファイルアロケーションリストを
要求元CFSクライアントに送信する。最後に、CFSサーバ
はペア使用テーブルの当該使用率の最も低いディスクド
ライブのファイルオープン数を加算し、ペア使用テーブ
ルの更新を行う。In the case of file open, the CFS server uses the file allocation table to determine the name of the pair in which the requested file is stored. Using the pair usage table, the CFS server then determines the least-used disk drive in that pair. Next, the CFS server sends the requesting CFS client the file allocation list of the requested file, together with the mirror number of the least-used disk drive. Finally, the CFS server updates the pair usage table by incrementing the file open count of that disk drive.

【００４４】図１１はCFSクライアントからのファイル
オープン要求の後、CFSサーバからCFSクライアントに返
送されるファイルアロケーションリストの典型的な構成
の一例である。図１１に示される通り、ファイルアロケ
ーションリストは、mirror_num、block_num、及びblock
_listを含む。mirror_numはCFSクライアントが使用すべ
きミラーディスクドライブを示す。そして、block_num
は要求ファイルを構成するブロック数を、block_listは
要求ファイルを構成するブロックのリストを示す。FIG. 11 shows an example of a typical structure of the file allocation list that the CFS server returns to the CFS client after a file open request from that client. As shown in FIG. 11, the file allocation list includes mirror_num, block_num, and block_list. mirror_num indicates the mirror disk drive that the CFS client should use, block_num indicates the number of blocks that make up the requested file, and block_list is the list of those blocks.
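The three fields of the file allocation list map naturally onto a small record type. The sketch below assumes Python types for each field and adds a consistency check that the description does not mention; only the field names mirror_num, block_num, and block_list come from the text.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FileAllocationList:
    mirror_num: int        # mirror disk drive the client should use
    block_num: int         # number of blocks that make up the file
    block_list: List[int]  # block numbers, assumed to be in file order

    def __post_init__(self):
        # Assumed sanity check: the count must match the list length.
        if self.block_num != len(self.block_list):
            raise ValueError("block_num must equal len(block_list)")

# Example values are invented for illustration.
fal = FileAllocationList(mirror_num=2, block_num=3, block_list=[40, 41, 97])
```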

【００４５】ファイルクローズの場合は、CFSサーバはC
FSクライアントからのファイルクローズ要求で指定され
るミラー番号のミラーディスクドライブのファイルオー
プン数を単純に減算する。次ぎに、CFSサーバはCFSクラ
イアントにファイルクローズ処理が完了したことを返信
する。In the case of file close, the CFS server simply decrements the file open count of the mirror disk drive whose mirror number is specified in the file close request from the CFS client. Next, the CFS server replies to the CFS client that the file close processing has been completed.
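The open and close handling on the CFS server can be put together in a short sketch. The class name, the dictionary-based tables, and the inclusion of the pair name in the reply and in the close call are assumptions made so that the example is self-contained; the description only states that the close request carries the mirror number.

```python
class CFSServer:
    """Hypothetical sketch of the server-side open/close bookkeeping."""

    def __init__(self, file_to_pair, file_blocks, pair_usage):
        self.file_to_pair = file_to_pair  # file name -> pair name
        self.file_blocks = file_blocks    # file name -> list of block numbers
        self.pair_usage = pair_usage      # pair name -> {mirror number: open count}

    def file_open(self, file_name):
        pair = self.file_to_pair[file_name]
        counts = self.pair_usage[pair]
        # Pick the least-used drive; ties go to the lowest mirror number.
        mirror = min(sorted(counts), key=lambda m: counts[m])
        counts[mirror] += 1  # update the pair usage table
        blocks = self.file_blocks[file_name]
        return {"pair": pair, "mirror_num": mirror,
                "block_num": len(blocks), "block_list": list(blocks)}

    def file_close(self, pair, mirror_num):
        self.pair_usage[pair][mirror_num] -= 1
        return "closed"

server = CFSServer({"a.dat": "pair1"}, {"a.dat": [10, 11]},
                   {"pair1": {0: 100, 1: 200, 2: 50}})
reply = server.file_open("a.dat")
```

After the open, the count for mirror 2 is 51; a matching `file_close("pair1", 2)` brings it back to 50.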

【００４６】CFSクライアントの処理シーケンスを以降
記述する。アプリケーションはファイルI/O要求を用い
て、CFSクライアントと交信する。アプリケーションが
データを必要とするとき、アプリケーションは、適切な
るファイルI/O要求をCFSクライアントに向けて発行す
る。図１２は本ファイルI/O要求の典型的な構成の一例
である。アプリケーションが発行するファイルI/O要求
には４種類のタイプがある。即ち、ファイルオープン,
ファイル書き込み,ファイル読み出し、そしてファイル
クローズである。図１３に、この４種類のファイルI/O
要求、File_Open, File_Write, File_Read 及びFile_Cl
oseを示す。タイプにより、図１２に示されるファイルI
/O要求の構成は変わる可能性がある。The processing sequence of the CFS client is described below. An application communicates with the CFS client by means of file I/O requests. When an application needs data, it issues an appropriate file I/O request to the CFS client. FIG. 12 shows an example of a typical structure of this file I/O request. There are four types of file I/O request issued by applications: file open, file write, file read, and file close. FIG. 13 shows these four types of file I/O request: File_Open, File_Write, File_Read, and File_Close. Depending on the type, the structure of the file I/O request shown in FIG. 12 may change.

【００４７】ファイルオープンの場合、ファイルI/O要
求はオープンされるべきファイル名とモードのような追
加情報を含む。このモードは読み取り専用か、読み書き
両用か等を区別する。ファイルオープン処理完了後、CF
Sクライアントはオープンされたファイルのfile IDを要
求アプリケーションに返信する。In the case of file open, the file I/O request contains the name of the file to be opened and additional information such as the mode. The mode distinguishes, for example, between read-only and read-write access. After file open processing is completed, the CFS client returns the file ID of the opened file to the requesting application.

【００４８】ファイル書き込みの場合、ファイルI/O要
求は、file ID、データが保存されているところのオフ
セット、保存されているデータのサイズ、及びデータそ
のものを含む。但し、本ファイルI/O要求に記されるオ
フセットはディスクドライブのアドレス空間でのオフセ
ットではなく、ファイル内でのオフセットである事に注
意が必要である。In the case of file write, the file I/O request includes the file ID, the offset at which the data is to be stored, the size of the data to be stored, and the data itself. Note, however, that the offset given in this file I/O request is an offset within the file, not an offset in the address space of the disk drive.

【００４９】ファイル読み出しの場合、ファイルI/O要
求は、file ID、アプリケーションが必要とするファイ
ルデータのオフセット、データのサイズ、及びデータを
読み込むべきバッファのアドレスを含む。In the case of file read, the file I/O request includes the file ID, the offset of the file data required by the application, the size of the data, and the address of the buffer into which the data is to be read.

【００５０】ファイルクローズの場合、ファイルI/O要
求はfile IDのみを含む。In the case of file close, the file I/O request contains only the file ID.

【００５１】図１４はCFSクライアントの処理操作シー
ケンスを示す。CFSクライアントはファイルI/O要求のタ
イプに従い４つのタスクを実行する。各タイプに対応し
て、CFSクライアントは対応するモジュールを呼び出
す。File_OpenはFile Open Module、File_WriteはFile
Write Module、File_Read はFile Read Module、File_C
loseはFile Close Moduleがそれぞれ対応する。要求の
タイプが４タイプのいずれにも該当しない場合、CFSク
ライアントは当該ファイルI/O要求をエラーとしてアプ
リケーションに返信する。FIG. 14 shows the processing sequence of the CFS client. The CFS client performs one of four tasks according to the type of the file I/O request, calling the module that corresponds to each type: File_Open is handled by the File Open Module, File_Write by the File Write Module, File_Read by the File Read Module, and File_Close by the File Close Module. If the request type matches none of the four types, the CFS client returns the file I/O request to the application as an error.
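The dispatch on request type described above might look like the following. The four module functions are stubs that only return a string, and the request and reply dictionaries are an assumed representation; only the four type names come from the text.

```python
# Stub modules; the real modules talk to the CFS server and storage systems.
def file_open_module(request):
    return "opened " + request["file_name"]

def file_write_module(request):
    return "wrote %d bytes" % request["size"]

def file_read_module(request):
    return "read %d bytes" % request["size"]

def file_close_module(request):
    return "closed file %d" % request["file_id"]

MODULES = {
    "File_Open": file_open_module,
    "File_Write": file_write_module,
    "File_Read": file_read_module,
    "File_Close": file_close_module,
}

def handle_file_io(request):
    """Call the module for the request type, or report an error."""
    module = MODULES.get(request.get("type"))
    if module is None:
        # Unknown type: hand the request back to the application as an error.
        return {"status": "error", "request": request}
    return {"status": "ok", "result": module(request)}
```

A table lookup keeps the error path in one place: any type outside the four known names falls through to the error reply.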

【００５２】図１５はCFSクライアントから呼び出され
たときのFile Open Moduleの処理シーケンスを示す。本
モジュールはファイルアロケーションリストを入手する
為に、ファイルオープン要求をCFSサーバに送信する。
本要求は要求ファイルのファイル名を特定する。CFSク
ライアントはファイルアロケーションリストを入手後、
図１６に示す通り、ファイル管理テーブルに本情報を格
納する。ファイル管理テーブルはファイルアロケーショ
ンリストの一覧（リスト）であり、各要素はfile IDと
呼ばれるインデックスにより識別される。このfile ID
はファイルオープン要求とも関連づけられている。本モ
ジュールは最終的に要求アプリケーションにfile IDを
返信する。FIG. 15 shows the processing sequence of the File Open Module when called by the CFS client. This module sends a file open request to the CFS server to obtain the file allocation list. The request specifies the file name of the requested file. After obtaining the file allocation list, the CFS client stores this information in the file management table, as shown in FIG. 16. The file management table is a list of file allocation lists, and each element is identified by an index called a file ID. This file ID is also associated with the file open request. Finally, the module returns the file ID to the requesting application.

【００５３】図１７はCFSクライアントから呼び出され
たときのFile Write Moduleの処理シーケンスを示す。
アプリケーションからの本ファイルI/O要求は、file I
D、オフセット及びデータサイズを含む。file ID、オフ
セット及びデータサイズは、ファイルアロケーションリ
ストを用いてデータが書き込まれるブロック番号を特定
する為に使用される。データサイズ次第で、データ書き
込みに必要なブロック数は複数にわたることもあり得
る。本モジュールはファイルアロケーションリストより
マスタディスクドライブのSS ID及びVol IDを識別す
る。ここで注意すべきことがある。ペアは常に一つのマ
スタディスクドライブと一つ以上のミラーディスクドラ
イブを含み、書き込みデータは常に、マスタディスクド
ライブに書き込まれる。FIG. 17 shows the processing sequence of the File Write Module when called by the CFS client. The file I/O request from the application includes the file ID, the offset, and the data size. Together with the file allocation list, these are used to identify the block numbers to which the data is written. Depending on the data size, the write may span multiple blocks. The module identifies the SS ID and Vol ID of the master disk drive from the file allocation list. One point deserves attention here: a pair always includes one master disk drive and one or more mirror disk drives, and write data is always written to the master disk drive.

【００５４】次いで、ストレージシステム内のディスク
コントローラにより、本データは全てのミラーディスク
ドライブにコピーされることについてである。この後、
本モジュールは指定のSS IDを持つストレージシステム
に対し発行するデータI/O要求を作成する。データI/O要
求の典型的なフォーマットの一例を図１８に示す。これ
は、データI/O要求の1タイプである。ストレージシステ
ム内のVol ID、当該Vol IDを持つディスクドライブの先
頭からのオフセット、書き込みデータのサイズ及び書き
込みデータそのものを含む。データI/O要求には、図１
９に示す通り、Data_Read、Data_Writeの二種類が存在
する。オフセットはブロック番号とファイルアロケーシ
ョンリストより求められる。本モジュールは次ぎに、デ
ータI/O要求をストレージシステムに送信し、書き込み
結果を受領する。最後に本モジュールは、結果を要求ア
プリケーションに返信する。The disk controller in the storage system then copies this data to all mirror disk drives. After this, the module creates a data I/O request to issue to the storage system with the identified SS ID. FIG. 18 shows an example of a typical format of the data I/O request. This is one type of data I/O request. It includes the Vol ID within the storage system, the offset from the beginning of the disk drive that has that Vol ID, the size of the write data, and the write data itself. As shown in FIG. 19, there are two types of data I/O request: Data_Read and Data_Write. The offset is derived from the block number and the file allocation list. The module then sends the data I/O request to the storage system and receives the write result. Finally, the module returns the result to the requesting application.
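One way to turn a file offset into per-block Data_Write requests is sketched below. The fixed `BLOCK_SIZE`, the rule that a block's disk offset is its block number times the block size, and the request dictionary layout are all assumptions; the description says only that the disk offset is derived from the block number and the file allocation list.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes

def plan_data_writes(block_list, file_offset, data, vol_id):
    """Split one file write into Data_Write requests, one per touched block.

    Raises IndexError if the write runs past the blocks in block_list.
    """
    requests = []
    pos = 0
    while pos < len(data):
        # Which block of the file, and where inside that block?
        index, within = divmod(file_offset + pos, BLOCK_SIZE)
        chunk = data[pos:pos + BLOCK_SIZE - within]
        requests.append({
            "type": "Data_Write",
            "vol_id": vol_id,
            # Assumed mapping from block number to disk offset.
            "offset": block_list[index] * BLOCK_SIZE + within,
            "size": len(chunk),
            "data": chunk,
        })
        pos += len(chunk)
    return requests

# A 200-byte write starting 96 bytes before the end of the first block.
reqs = plan_data_writes([7, 3], 4000, b"x" * 200, vol_id=1)
```

Here the write is split into 96 bytes at the end of block 7 and 104 bytes at the start of block 3, which illustrates why the offset in the file I/O request cannot be used directly as a disk offset.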

【００５５】図２０はCFSクライアントから呼び出され
たときのFile Read Moduleの処理シーケンスを示す。ア
プリケーションからの本ファイルI/O要求は、file ID、
オフセット、データサイズ及びバッファアドレスを含
む。これらは、ファイルアロケーションリストを用い
て、データの読み出し位置のブロック番号を特定する為
に使用される。ペア構成テーブルを参照して、本モジュ
ールはファイルアロケーションリストより指示されるミ
ラー番号を持つミラーディスクドライブのSS IDとVol I
Dを識別する。使用するミラーディスクドライブの選択
はCFSサーバが判定する。ミラー番号０はマスタディス
クドライブである事に再度注意されたい。この後、本モ
ジュールはそのSS IDを持つストレージシステムに発行
するデータI/O要求を作成する。本モジュールは次ぎ
に、本データI/O要求をストレージシステムに送信し、
読み出し結果を受領する。本モジュールは最後に受領デ
ータを要求アプリケーションが指定したバッファにコピ
ーし、結果を要求アプリケーションに返信する。FIG. 20 shows the processing sequence of the File Read Module when called by the CFS client. The file I/O request from the application includes the file ID, offset, data size, and buffer address. Together with the file allocation list, these are used to identify the block numbers from which the data is read. Referring to the pair configuration table, the module identifies the SS ID and Vol ID of the mirror disk drive whose mirror number is given in the file allocation list. The CFS server decides which mirror disk drive is used. Note again that mirror number 0 denotes the master disk drive. The module then creates a data I/O request to issue to the storage system with that SS ID, sends the request to the storage system, and receives the read result. Finally, the module copies the received data to the buffer specified by the requesting application and returns the result to the application.

【００５６】図２１はCFSクライアントから呼び出され
たときのFile Close Moduleの処理シーケンスを示す。
本モジュールはファイルアロケーションリストにあるミ
ラー番号をもつCFSサーバにファイルクローズ要求を送
信する。CFSサーバからの返信を受信後、本モジュール
はファイル管理テーブルからファイルアロケーションリ
ストの該当エントリを削除する。最後に本モジュールは
要求アプリケーションに返信する。FIG. 21 shows the processing sequence of the File Close Module when called by the CFS client. This module sends the CFS server a file close request carrying the mirror number from the file allocation list. After receiving the reply from the CFS server, the module deletes the corresponding file allocation list entry from the file management table. Finally, the module replies to the requesting application.

【００５７】ストレージシステムの動作とデータのコピ
ーの方法を以降に説明する。ストレージシステムはデー
タのコピーを作成する。これを実行するのに、ストレー
ジシステムは受領したデータ書き込み要求を同一ペア内
の他のディスクドライブに送信する。この時点で、各ス
トレージシステムは二つの選択肢を持つ。一つは、スト
レージシステムが、データ書き込み要求を実行する前に
ホストコンピュータに本I/O要求の完了を返信する方法
であり、本方法は非同期書き込みと呼ばれる。もう一つ
は、ストレージシステムがデータ書き込み要求を実行し
てから、ホストコンピュータに本入出力要求の完了を返
信する方法で、本方法は同期書き込みと呼ばれる。The operation of the storage system and the method of copying data are described below. The storage system makes copies of the data. To do this, it forwards each data write request it receives to the other disk drives in the same pair. At this point, each storage system has two options. In the first, the storage system reports completion of the I/O request to the host computer before executing the forwarded data write request; this method is called asynchronous writing. In the second, the storage system executes the data write request and only then reports completion of the I/O request to the host computer; this method is called synchronous writing.
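The difference between the two options is only the order of events, which the following sketch records in a log. The function name and the event strings are invented for illustration.

```python
def handle_write(mode, log):
    """Append the events of one write, in order, for the given mode."""
    log.append("write master")
    if mode == "sync":
        # Synchronous: mirrors are updated before the host is told.
        log.append("write mirrors")
        log.append("reply to host")
    elif mode == "async":
        # Asynchronous: the host is told first; mirrors catch up later.
        log.append("reply to host")
        log.append("write mirrors")
    else:
        raise ValueError("mode must be 'sync' or 'async'")
    return log

sync_log = handle_write("sync", [])
async_log = handle_write("async", [])
```

In the asynchronous case there is a window between the reply and the mirror update during which a mirror can hold stale data.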

【００５８】非同期書き込みの場合、CFSサーバが、CFS
クライアントに読み出し対象のミラーディスクドライブ
を指定したとしても、その時点では当該ミラーディスク
ドライブにはマスタディスクドライブからの最新データ
がコピーされていないことがある。この場合は、当該要
求を受信したディスクコントローラはマスタディスクド
ライブより最新データを受け取り、ホストコンピュータ
に転送する。In the case of asynchronous writing, even when the CFS server directs a CFS client to read from a particular mirror disk drive, the latest data from the master disk drive may not yet have been copied to that mirror disk drive. In this case, the disk controller that receives the read request obtains the latest data from the master disk drive and transfers it to the host computer.

【００５９】データの読み書きシーケンスは図２２−図
２９に図式的に説明されている。図２２−図２９は、ホ
ストコンピュータとストレージシステムとの間の交信を
示す。下記の組み合わせにより発生する１０のケースを
考察する。（１）同期書き込みであるか、非同期書き込みであるか（２）読み出し要求がミラーディスクドライブに発行さ
れたとき、ミラーディスクドライブに当該データが在る
か否か（最新データがコピーされているか否か）（３）ペア内の各ディスクドライブが単一ストレージシ
ステム内か複数ストレージシステムにまたがって存在す
るか否か（４）読み出し要求であるか、書き込み要求であるか図３０は図２２−図２９と上述の各状況によりケースの
相互関係を示す。図２２は下記条件の場合を示す。（ａ）同期書き込み（ｂ）単一ストレージシステム内でのペア（ｃ）ホストコンピュータは書き込み要求を発行（ステップ１，２及び３）ホストコンピュータ内のCFS
クライアントはCFSサーバよりファイルアロケーション
リストを入手する。CFSクライアントは要求ファイルを
保存するマスタディスクドライブが存在するストレージ
システムに書き込み要求を発行する。I/Oプロセッサは
当該要求を受領し、マスタディスクドライブにデータを
書き込む。（ステップ４）同時にI/Oプロセッサは同じ
データをペア内の各ミラーディスクドライブに書き込
む。この後、I/Oプロセッサは当該I/Oの終了をホストコ
ンピュータに送信する。The data read and write sequences are illustrated schematically in FIGS. 22 to 29, which show the communication between the host computer and the storage system. Ten cases arising from the following combinations are considered:
(1) whether writing is synchronous or asynchronous;
(2) whether, when a read request is issued to a mirror disk drive, the data is present on that mirror disk drive (that is, whether the latest data has been copied to it);
(3) whether the disk drives in the pair reside in a single storage system or span multiple storage systems;
(4) whether the request is a read request or a write request.
FIG. 30 shows the correspondence between FIGS. 22 to 29 and the cases arising from the situations above.
FIG. 22 shows the case under the following conditions: (a) synchronous writing; (b) the pair is within a single storage system; (c) the host computer issues a write request.
(Steps 1, 2 and 3) The CFS client in the host computer obtains the file allocation list from the CFS server. The CFS client issues a write request to the storage system containing the master disk drive that stores the requested file. The I/O processor receives the request and writes the data to the master disk drive.
(Step 4) At the same time, the I/O processor writes the same data to each mirror disk drive in the pair. After this, the I/O processor notifies the host computer that the I/O has completed.

【００６０】図２３は下記条件でのケースを示す。（ａ）非同期書き込み（ｂ）単一ストレージシステム内でのペア（ｃ）ホストコンピュータは書き込み要求を発行（ステップ１及び２）ホストコンピュータ内のCFSクラ
イアントはCFSサーバよりファイルアロケーションリス
トを入手する。（ステップ３）CFSクライアントは要求
ファイルを保存するマスタディスクドライブが存在する
ストレージシステムに書き込み要求を発行する。I/Oプ
ロセッサは当該要求を受領し、マスタディスクドライブ
にデータを書き込む。この後、I/Oプロセッサは当該I/O
の終了をホストコンピュータに送信する。（ステップ
４）書き込まれたデータは同期デーモンにより、非同期
にペア内のミラーディスクドライブにコピーされる。
図２４は下記条件でのケースを示す。（ａ）同期書き込み（ｂ）単一ストレージシステム内でのペア（ｃ）ホストコンピュータは読み出し要求を発行又は、（ａ）非同期書き込み（ｂ）単一ストレージシステム内でのペア（ｃ）ホストコンピュータは読み出し要求を発行（ｄ）データは指定ミラーディスクドライブに存在（最
新データがミラーディスクドライブにコピーされてい
る）（ステップ１，２及び３）ホストコンピュータ内のCFS
クライアントはCFSサーバよりファイルアロケーション
リストを入手する。CFSクライアントは要求ファイルを
保存するペアのミラーディスクドライブが存在するスト
レージシステムに読み出し要求を発行する。I/Oプロセ
ッサは当該要求を受領し、ミラーディスクドライブより
要求データを読み出す。（ステップ４）この後、I/Oプ
ロセッサは読み出しデータと当該I/Oの終了をホストコ
ンピュータに送信する。FIG. 23 shows the case under the following conditions: (a) asynchronous writing; (b) the pair is within a single storage system; (c) the host computer issues a write request.
(Steps 1 and 2) The CFS client in the host computer obtains the file allocation list from the CFS server.
(Step 3) The CFS client issues a write request to the storage system containing the master disk drive that stores the requested file. The I/O processor receives the request and writes the data to the master disk drive. After this, the I/O processor notifies the host computer that the I/O has completed.
(Step 4) The synchronization daemon asynchronously copies the written data to the mirror disk drives in the pair.
FIG. 24 shows the case under the following conditions: (a) synchronous writing; (b) the pair is within a single storage system; (c) the host computer issues a read request; or alternatively (a) asynchronous writing; (b) the pair is within a single storage system; (c) the host computer issues a read request; (d) the data is present on the designated mirror disk drive (the latest data has been copied to it).
(Steps 1, 2 and 3) The CFS client in the host computer obtains the file allocation list from the CFS server. The CFS client issues a read request to the storage system containing the mirror disk drive of the pair that stores the requested file. The I/O processor receives the request and reads the requested data from the mirror disk drive.
(Step 4) After this, the I/O processor sends the read data and the completion of the I/O to the host computer.

【００６１】図２５は下記条件でのケースを示す。（ａ）非同期書き込み（ｂ）単一ストレージシステム内でのペア（ｃ）ホストコンピュータは読み出し要求を発行（ｄ）データが指定ミラーディスクドライブに“存在し
ない”（最新データがミラーディスクドライブにまだコ
ピーされていない）（ステップ１，２及び３）ホストコンピュータ内のCFS
クライアントはCFSサーバよりファイルアロケーション
リストを入手する。CFSクライアントは要求ファイルを
保存するペアのミラーディスクドライブが存在するスト
レージシステムに読み出し要求を発行する。（ステップ
４）I/Oプロセッサは当該要求を受領し、指定ミラーデ
ィスクドライブは最新データを保有していないことを検
出し、要求データをペアのマスタディスクドライブより
読み出す。（ステップ５）この後、I/Oプロセッサは読
み出しデータと当該I/Oの終了をホストコンピュータに
送信する。FIG. 25 shows the case under the following conditions: (a) asynchronous writing; (b) the pair is within a single storage system; (c) the host computer issues a read request; (d) the data is "not present" on the designated mirror disk drive (the latest data has not yet been copied to it).
(Steps 1, 2 and 3) The CFS client in the host computer obtains the file allocation list from the CFS server. The CFS client issues a read request to the storage system containing the mirror disk drive of the pair that stores the requested file.
(Step 4) The I/O processor receives the request, detects that the designated mirror disk drive does not hold the latest data, and reads the requested data from the master disk drive of the pair.
(Step 5) After this, the I/O processor sends the read data and the completion of the I/O to the host computer.

【００６２】図２６は下記条件でのケースを示す。（ａ）同期書き込み（ｂ）ペアは複数ストレージシステムにまたがって存在（ｃ）ホストコンピュータは書き込み要求を発行（ステップ１、２及び３）ホストコンピュータ内のCFS
クライアントはCFSサーバよりファイルアロケーション
リストを入手する。CFSクライアントは要求ファイルを
保存するマスタディスクドライブが存在するストレージ
システムに書き込み要求を発行する。I/Oプロセッサは
当該要求を受領し、マスタディスクドライブにデータを
書き込む。（ステップ４）同時に、I/Oプロセッサは同
じデータを当該ペアの各ミラーディスクドライブに、異
なるストレージシステムへの通信パスを通して書き込
む。この後、I/Oプロセッサは当該I/Oの終了をホストコ
ンピュータに送信する。FIG. 26 shows the case under the following conditions: (a) synchronous writing; (b) the pair spans multiple storage systems; (c) the host computer issues a write request.
(Steps 1, 2 and 3) The CFS client in the host computer obtains the file allocation list from the CFS server. The CFS client issues a write request to the storage system containing the master disk drive that stores the requested file. The I/O processor receives the request and writes the data to the master disk drive.
(Step 4) At the same time, the I/O processor writes the same data to each mirror disk drive of the pair over the communication path to the other storage system. After this, the I/O processor notifies the host computer that the I/O has completed.

【００６３】図２７は下記条件でのケースを示す。（ａ）非同期書き込み（ｂ）ペアは複数ストレージシステムにまたがって存在（ｃ）ホストコンピュータは書き込み要求を発行（ステップ１及び２）ホストコンピュータ内のCFSクラ
イアントはCFSサーバよりファイルアロケーションリス
トを入手する。（ステップ３）CFSクライアントは要求
ファイルを保存するマスタディスクドライブが存在する
ストレージシステムに書き込み要求を発行する。I/Oプ
ロセッサは当該要求を受領し、マスタディスクドライブ
にデータを書き込む。この後、I/Oプロセッサは当該I/O
の終了をホストコンピュータに送信する。（ステップ
４）書き込まれたデータは同期デーモンにより、異なる
ストレージシステムへの通信パスを通して、非同期にペ
ア内の各ミラーディスクドライブにコピーされる。FIG. 27 shows the case under the following conditions: (a) asynchronous writing; (b) the pair spans multiple storage systems; (c) the host computer issues a write request.
(Steps 1 and 2) The CFS client in the host computer obtains the file allocation list from the CFS server.
(Step 3) The CFS client issues a write request to the storage system containing the master disk drive that stores the requested file. The I/O processor receives the request and writes the data to the master disk drive. After this, the I/O processor notifies the host computer that the I/O has completed.
(Step 4) The synchronization daemon asynchronously copies the written data to each mirror disk drive in the pair over the communication path to the other storage system.

【００６４】図２８は下記条件でのケースを示す。（ａ）同期書き込み（ｂ）ペアは複数ストレージシステムにまたがって存在（ｃ）ホストコンピュータは読み出し要求を発行又は、（ａ）非同期書き込み（ｂ）ペアは複数ストレージシステムにまたがって存在（ｃ）ホストコンピュータは読み出し要求を発行（ｄ）データは指定ミラーディスクドライブに存在（最
新データがミラーディスクドライブにコピーされてい
る）（ステップ１及び２）ホストコンピュータ内のCFSクラ
イアントはCFSサーバよりファイルアロケーションリス
トを入手する。（ステップ３）CFSクライアントは要求
ファイルを保存するペアのミラーディスクドライブが存
在するストレージシステムに読み出し要求を発行する。
I/Oプロセッサは当該要求を受領し、ミラーディスクド
ライブよりデータを読み出す。（ステップ４）この後、
I/Oプロセッサは読み出しデータと当該I/Oの終了をホス
トコンピュータに送信する。FIG. 28 shows the case under the following conditions: (a) synchronous writing; (b) the pair spans multiple storage systems; (c) the host computer issues a read request; or alternatively (a) asynchronous writing; (b) the pair spans multiple storage systems; (c) the host computer issues a read request; (d) the data is present on the designated mirror disk drive (the latest data has been copied to it).
(Steps 1 and 2) The CFS client in the host computer obtains the file allocation list from the CFS server.
(Step 3) The CFS client issues a read request to the storage system containing the mirror disk drive of the pair that stores the requested file. The I/O processor receives the request and reads the data from the mirror disk drive.
(Step 4) After this, the I/O processor sends the read data and the completion of the I/O to the host computer.

【００６５】図２９は下記条件でのケースを示す。（ａ）非同期書き込み（ｂ）ペアは複数ストレージシステムにまたがって存在（ｃ）ホストコンピュータは読み出し要求を発行（ｄ）データは指定ミラーディスクドライブに“存在し
ない”（最新データがミラーディスクドライブにまだコ
ピーされていない）（ステップ１及び２）ホストコンピュータ内のCFSクラ
イアントはCFSサーバよりファイルアロケーションリス
トを入手する。（ステップ３）CFSクライアントは要求
ファイルを保存するペアのミラーディスクドライブが存
在するストレージシステムに読み出し要求を発行する。
（ステップ４）I/Oプロセッサは当該要求を受領し、指
定ミラーディスクドライブは当該データを保有していな
いことを検出し、要求データを異なるストレージシステ
ムへの通信パスを通して、ペアのマスタディスクドライ
ブから読み出す。（ステップ５）この後、I/Oプロセッ
サは読み出しデータと当該I/Oの終了をホストコンピュ
ータに送信する。FIG. 29 shows the case under the following conditions: (a) asynchronous writing; (b) the pair spans multiple storage systems; (c) the host computer issues a read request; (d) the data is "not present" on the designated mirror disk drive (the latest data has not yet been copied to it).
(Steps 1 and 2) The CFS client in the host computer obtains the file allocation list from the CFS server.
(Step 3) The CFS client issues a read request to the storage system containing the mirror disk drive of the pair that stores the requested file.
(Step 4) The I/O processor receives the request, detects that the designated mirror disk drive does not hold the data, and reads the requested data from the master disk drive of the pair over the communication path to the other storage system.
(Step 5) After this, the I/O processor sends the read data and the completion of the I/O to the host computer.

【００６６】ストレージシステム１２の他の構成要素を
以降に述べる。ストレージシステム内の同期デーモン
は、ミラー内の各クラスターが保有するデータの有効、
無効性を示すビットマップテーブルを管理する。無効デ
ータとは、マスタディスクドライブの当該クラスターの
データが未だミラーディスクドライブの対応するクラス
ターにコピーされていないことを意味する。クラスター
は同期デーモンがコピー時に扱うデータの処理単位であ
る。ディスクドライブのアドレス空間はクラスターのサ
イズで分割されている。クラスターのサイズはブロック
と同じか、又はシステム設計により、1クラスターが複
数のブロックで構成される場合もある。クラスターはク
ラスター＃１、クラスター＃２、等の番号で順序づけさ
れる。Other components of the storage system 12 are described below. The synchronization daemon in the storage system manages a bitmap table that indicates whether the data held in each cluster of each mirror is valid or invalid. Invalid data means that the data in the corresponding cluster of the master disk drive has not yet been copied to that cluster of the mirror disk drive. A cluster is the unit of data that the synchronization daemon handles when copying. The address space of a disk drive is divided into units of the cluster size. A cluster may be the same size as a block or, depending on the system design, may consist of multiple blocks. Clusters are numbered in order: cluster #1, cluster #2, and so on.

【００６７】図３１はビットマップテーブルの一例を示
す。本例は二つのペア、ペア１及びペア２を示し、ペア
１はN台のミラーディスクドライブを持つ。既に述べた
様に、ディスクドライブはSS IDとVol IDで識別され
る。ペア１のミラーディスクドライブ１のクラスター＃
１は本テーブルでは有効である。これは、ミラーディス
クドライブ１のクラスター＃１はマスタディスクドライ
ブの対応するクラスターと同じデータを保持しているこ
とを意味する。これに対して、ペア１のミラーディスク
ドライブ１のクラスター＃２は本テーブルでは無効であ
る。これは、ミラーディスクドライブ１のクラスター＃
２はマスタディスクドライブデータのミラーディスクド
ライブ１のクラスター＃２へのコピーがまだ終了してい
ないことを意味する。FIG. 31 shows an example of the bitmap table. This example shows two pairs, pair 1 and pair 2, where pair 1 has N mirror disk drives. As already mentioned, disk drives are identified by SS ID and Vol ID. In this table, cluster #1 of mirror disk drive 1 of pair 1 is valid, which means that it holds the same data as the corresponding cluster of the master disk drive. In contrast, cluster #2 of mirror disk drive 1 of pair 1 is invalid, which means that the master disk drive's data has not yet been copied to cluster #2 of mirror disk drive 1.
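A bitmap table of this kind can be modelled as one valid/invalid flag per cluster per mirror drive. In the sketch below the cluster size, the identifiers `"SS1"` and `"Vol2"`, and the helper names are assumptions; the numbering of clusters from #1 follows the description.

```python
CLUSTER_SIZE = 4096  # assumed cluster size in bytes

# (SS ID, Vol ID) of a mirror drive -> {cluster number: True if valid}.
# The identifiers here are made up for the example.
bitmap_table = {
    ("SS1", "Vol2"): {1: True, 2: False},
}

def cluster_number(offset):
    """Map a disk offset to its cluster number (clusters start at #1)."""
    return offset // CLUSTER_SIZE + 1

def is_valid(table, drive, offset):
    """True if the cluster covering this offset matches the master."""
    return table[drive][cluster_number(offset)]

def stale_clusters(table, drive):
    """Cluster numbers on this mirror that still await a copy."""
    return sorted(c for c, valid in table[drive].items() if not valid)
```

With the values above, cluster #1 is valid and cluster #2 is reported as stale, matching the mirror disk drive 1 example from FIG. 31.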

【００６８】ビットマップテーブルを調べることによ
り、同期デーモンはマスタディスクドライブとミラーデ
ィスクドライブ間のデータの不一致を検出することが可
能であり、それ故、マスタディスクドライブから適切な
ミラーディスクドライブへのデータのコピーを実行する
ことができる。典型的にはストレージシステム内のディ
スクコントローラは、プロセッサとメモリとI/Oコント
ローラで構成されるコンピュータである。ディスクコン
トローラは、他の機能に加えて、二つのタイプのタスク
を実行する。つまり、I/Oプロセッサと同期デーモンの
機能を果たす。これらの機能について以下記述する。By examining the bitmap table, the synchronization daemon can detect data mismatches between the master disk drive and the mirror disk drives, and can therefore copy data from the master disk drive to the appropriate mirror disk drives. The disk controller in the storage system is typically a computer consisting of a processor, memory, and an I/O controller. In addition to other functions, the disk controller performs two types of task: it acts as the I/O processor and as the synchronization daemon. These functions are described below.

【００６９】I/Oプロセッサはホストコンピュータから
のデータI/O要求を処理する。図１８に示されるよう
に、I/OプロセッサがデータI/O要求を受け取ると、デー
タI/O要求のタイプに応じてData Write Module又は Dat
a Read Moduleのモジュールを呼び出す。もしタイプがD
ata_Read、Data_WriteのいずれでもないときはデータI/
Oエラーをホストコンピュータに返信する。図３２はI/O
プロセッサの動作シーケンスを示す。The I/O processor processes data I/O requests from the host computer. When the I/O processor receives a data I/O request of the form shown in FIG. 18, it calls either the Data Write Module or the Data Read Module according to the type of the request. If the type is neither Data_Read nor Data_Write, it returns a data I/O error to the host computer. FIG. 32 shows the operation sequence of the I/O processor.

【００７０】図３３は同期書き込みに対するData Write
Moduleの動作シーケンスを示す。本モジュールはI/Oプ
ロセッサから呼び出される。本データ書き込みモジュー
ルはディスクドライブのオフセットにデータを書き込
む。データ、ディスクドライブのオフセット及びVol ID
はデータI/O要求で特定される。当該ディスクドライブ
が何らかのペアを構成しておれば、本モジュールは同じ
データI/O要求をペア内の各ミラーディスクドライブに
送信する。もしいくつかのミラーディスクドライブが異
なるストレージシステム内に存在する場合は、本モジュ
ールは通信パスを通じ適切なストレージシステムに要求
を送信する。本モジュールは次いで、ホストコンピュー
タに返信する。そのディスクドライブがいかなるペアに
も属していない場合、本モジュールはホストコンピュー
タに返信のみ行う。FIG. 33 shows the operation sequence of the Data Write Module for synchronous writing. This module is called by the I/O processor. It writes the data at the given offset of the disk drive. The data, the disk drive offset, and the Vol ID are specified in the data I/O request. If the disk drive belongs to a pair, the module sends the same data I/O request to each mirror disk drive in the pair. If some of the mirror disk drives reside in different storage systems, the module sends the request to the appropriate storage system over the communication path. The module then replies to the host computer. If the disk drive does not belong to any pair, the module only replies to the host computer.

【００７１】図３４は非同期書き込みに対するData Wri
te Moduleの動作シーケンスを示す。本モジュールはI/O
プロセッサから呼び出される。本モジュールはディスク
ドライブのオフセットにデータを書き込む。データ、オ
フセット及びVol IDはデータI/O要求で特定される。デ
ィスクドライブが何らかのペアを構成しておれば、本モ
ジュールはペア内の全てのミラーディスクドライブの対
応クラスター番号に無効フラグをセットする。本モジュ
ールは次いで、ホストコンピュータに返信する。そのデ
ィスクドライブがいかなるペアにも属していない場合、
本モジュールはホストコンピュータに返信のみ行う。FIG. 34 shows the operation sequence of the Data Write Module for asynchronous writing. This module is called by the I/O processor. It writes the data at the given offset of the disk drive. The data, the offset, and the Vol ID are specified in the data I/O request. If the disk drive belongs to a pair, the module sets the invalid flag for the corresponding cluster number of every mirror disk drive in the pair. The module then replies to the host computer. If the disk drive does not belong to any pair, the module only replies to the host computer.
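The asynchronous write path can be sketched as: write locally, flag the affected clusters on every mirror, reply. The dictionaries standing in for disks and for the pair table, the set-based record of invalid clusters, and the cluster size are assumptions made for this illustration.

```python
CLUSTER_SIZE = 4096  # assumed cluster size in bytes

def data_write_async(disks, mirrors_of, invalid, drive, offset, data):
    """Write to `drive`, then mark the touched clusters invalid on its mirrors.

    disks:      drive id -> {offset: bytes}   (toy disk model)
    mirrors_of: master drive id -> list of mirror drive ids
    invalid:    mirror drive id -> set of invalid cluster numbers
    """
    disks[drive][offset] = data
    mirrors = mirrors_of.get(drive)
    if mirrors is not None:
        # Clusters are numbered from 1; a write may cover more than one.
        first = offset // CLUSTER_SIZE + 1
        last = (offset + max(len(data), 1) - 1) // CLUSTER_SIZE + 1
        for mirror in mirrors:
            invalid.setdefault(mirror, set()).update(range(first, last + 1))
    return "complete"  # reply to the host without touching the mirrors

disks = {"m": {}, "solo": {}}
mirrors_of = {"m": ["r1", "r2"]}
invalid = {}
data_write_async(disks, mirrors_of, invalid, "m", 8192, b"abc")
```

After this call both mirrors have cluster #3 flagged, and nothing has been copied yet; that work is left to the synchronization daemon.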

【００７２】図３５は同期書き込みに対するData Read
Moduleの動作シーケンスを示す。本モジュールはI/Oプ
ロセッサから呼び出される。本モジュールはディスクド
ライブのオフセットからデータを読み込む。ディスクド
ライブのオフセット及びVolIDはデータI/O要求で特定さ
れる。読み込まれたデータは次いでホストコンピュータ
に送信される。同期書き込みの場合は、全てのミラーデ
ィスクドライブは常に最新のデータを保有する。FIG. 35 shows the operation sequence of the Data Read Module for synchronous writing. This module is called by the I/O processor. It reads the data from the given offset of the disk drive. The disk drive offset and the Vol ID are specified in the data I/O request. The data that was read is then sent to the host computer. With synchronous writing, all mirror disk drives always hold the latest data.

【００７３】図３６は非同期書き込みに対するData Rea
d Moduleの動作シーケンスを示す。本モジュールはI/O
プロセッサから呼び出される。本モジュールはデータI/
O要求で特定されたディスクドライブがペアに属するか
否かをチェックする。属する場合は、本モジュールは当
該ペアのマスタディスクドライブを有するストレージシ
ステムに無効チェック要求（無効であるか否か判定する
チェック）を送信する。本送信対象のストレージシステ
ムは同一又は異なるストレージシステムいずれでも良
い。ストレージシステムからの返信が無効でないことを
示した場合(即ち、ミラードライブ内のデータは最新の
ものである)、本モジュールはデータI/O要求で特定され
たディスクドライブのオフセットからデータを読み出す
のみである。そして最終的にその読み込みデータをホス
トコンピュータに送信する。ストレージシステムからの
返信が無効であることを示した場合、本モジュールは、
返信に含まれるデータをホストコンピュータに送信す
る。このとき、マスタディスクドライブを有するストレ
ージシステムはその返信で最新データを送信することに
注意されたい。そのディスクドライブがいかなるペアに
も属さない場合、本モジュールはデータI/O要求で特定
されたディスクドライブのオフセットからデータを読み
出すのみである。そして最終的にその読み込みデータを
ホストコンピュータに送信する。FIG. 36 shows the operation sequence of the Data Read Module for asynchronous writing. This module is called by the I/O processor. It checks whether the disk drive specified in the data I/O request belongs to a pair. If it does, the module sends an invalid check request (a check of whether the data is invalid) to the storage system that holds the master disk drive of the pair. That storage system may be the same storage system or a different one. If the reply from the storage system indicates that the data is not invalid (that is, the data on the mirror drive is the latest), the module simply reads the data from the offset of the disk drive specified in the data I/O request and sends that data to the host computer. If the reply indicates that the data is invalid, the module sends the data contained in the reply to the host computer. Note that in this case the storage system holding the master disk drive sends the latest data in its reply. If the disk drive does not belong to any pair, the module simply reads the data from the offset of the disk drive specified in the data I/O request and sends that data to the host computer.
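The read decision above reduces to a three-way branch. In this sketch the invalid check is a local set lookup rather than a request to another storage system, disks are modelled per cluster, and `master_of` lists only mirror drives; all of these are simplifying assumptions.

```python
def data_read_async(disks, master_of, invalid, drive, cluster):
    """Return the data for `cluster`, falling back to the master if stale.

    disks:     drive id -> {cluster number: bytes}   (toy disk model)
    master_of: mirror drive id -> master drive id
    invalid:   mirror drive id -> set of invalid cluster numbers
    """
    master = master_of.get(drive)
    if master is None:
        # Not a mirror in any pair: just read the drive.
        return disks[drive][cluster]
    if cluster in invalid.get(drive, set()):
        # Stale on the mirror: the master side supplies the latest data.
        return disks[master][cluster]
    # Mirror is up to date for this cluster.
    return disks[drive][cluster]

disks = {
    "master": {1: b"new1", 2: b"new2"},
    "mirror1": {1: b"new1", 2: b"old2"},
    "solo": {1: b"s"},
}
master_of = {"mirror1": "master"}
invalid = {"mirror1": {2}}
```

Reading cluster 2 from `"mirror1"` returns the master's `b"new2"` rather than the stale `b"old2"`.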

【００７４】同期デーモンは非同期書き込みでマスタデ
ィスクドライブの最新データをペア内の全てのミラーデ
ィスクドライブにコピーする処理を行う。図３７は同期
デーモンの処理シーケンスを示す。同期デーモンは起動
されると、I/Oプロセッサからの無効チェック要求があ
るか否かをチェックする。無効チェック要求があった場
合、Check Invalid Moduleを呼び出す。同期デーモン
はビットマップテーブル中に無効フラグを持つクラスタ
ーが存在するか否かをチェックする。そのような無効フ
ラグを持つクラスターが存在する場合、同期デーモンは
マスタディスクドライブの対応するクラスターのデータ
を読み込み、当該ペアの各ミラーディスクドライブにそ
の読み込みデータを書き込む。いくつかのミラーディス
クドライブが異なるストレージシステムに存在する場合
は、同期デーモンは別通信パスを経由し適切なストレー
ジシステムと通信し、データを書き込む。With asynchronous writing, the synchronization daemon copies the latest data on the master disk drive to all mirror disk drives in the pair. FIG. 37 shows the processing sequence of the synchronization daemon. Once started, the synchronization daemon checks whether there is an invalid check request from an I/O processor. If there is, it calls the Check Invalid Module. The synchronization daemon also checks whether any cluster in the bitmap table has an invalid flag. If such a cluster exists, the synchronization daemon reads the data of the corresponding cluster from the master disk drive and writes that data to each mirror disk drive of the pair. If some of the mirror disk drives reside in different storage systems, the synchronization daemon communicates with the appropriate storage system over a separate communication path and writes the data.
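One pass of the daemon's copy step might be sketched as below. Clearing the invalid flag after the copy is an assumption (the description does not say when the flag is reset), as are the toy disk model and the function name; communication with other storage systems is left out.

```python
def sync_pass(disks, mirrors_of, invalid):
    """Copy every flagged cluster from master to mirror; return the count.

    disks:      drive id -> {cluster number: bytes}   (toy disk model)
    mirrors_of: master drive id -> list of mirror drive ids
    invalid:    mirror drive id -> set of invalid cluster numbers
    """
    copied = 0
    for master, mirrors in mirrors_of.items():
        for mirror in mirrors:
            for cluster in sorted(invalid.get(mirror, set())):
                disks[mirror][cluster] = disks[master][cluster]
                copied += 1
            # Assumed: flags are cleared once the copy has been made.
            invalid[mirror] = set()
    return copied

disks = {
    "master": {1: b"new1", 2: b"new2"},
    "mirror1": {1: b"new1", 2: b"old2"},
}
mirrors_of = {"master": ["mirror1"]}
invalid = {"mirror1": {2}}
copied = sync_pass(disks, mirrors_of, invalid)
```

After the pass, `"mirror1"` holds `b"new2"` in cluster 2 and has no invalid clusters left.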

【００７５】[0075] FIG. 38 shows the processing sequence when the Check Invalid Module is called by the synchronization daemon. When called, this module checks whether an invalid flag is set in the bitmap table for the corresponding cluster of the specified mirror disk drive. If no invalid flag is set, the module simply replies to the requesting I/O processor with a valid flag. If the I/O processor resides in a storage system different from that of the synchronization daemon, the module sends the reply to the appropriate storage system via the communication path. If the cluster is invalid, that is, an invalid flag is set, the module reads the latest data of that cluster from the master disk drive and returns it to the I/O processor together with an invalid flag. The I/O processor uses this data to respond to the data I/O request from the host computer.
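The decision made by the module can be condensed into a single lookup, sketched here with hypothetical names. The bitmap table is assumed to map each mirror drive to a list of per-cluster flags, and the reply is modeled as a tuple rather than a message sent over the communication path.

```python
from typing import Dict, List, Optional, Tuple


def check_invalid(bitmap: Dict[str, List[bool]],
                  master: List[bytes],
                  mirror: str,
                  cluster: int) -> Tuple[str, Optional[bytes]]:
    """Build the reply for the I/O processor that asked about one cluster."""
    if bitmap[mirror][cluster]:
        # Stale mirror: attach the master's latest data to the reply.
        return ("invalid", master[cluster])
    # Mirror is current: the flag alone is enough, no data is sent.
    return ("valid", None)


bitmap = {"m1": [True, False]}
master = [b"latest0", b"latest1"]
stale_reply = check_invalid(bitmap, master, "m1", 0)
fresh_reply = check_invalid(bitmap, master, "m1", 1)
```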

【００７６】[0076] As shown in FIG. 39, the storage system 12 can also have a cache memory. When a cache memory is available, data need not be written to a disk drive immediately. Instead, the data is first written to and held in the cache memory, and is then written to the disk drive asynchronously. Because a write to a disk drive usually takes 10 to 20 milliseconds whereas a write to cache memory takes only 10 to 100 microseconds, caching the data improves the write performance of the storage system 12.
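To make the benefit concrete, the sketch below pairs a toy write-back cache with the latency figures quoted above. The class `WriteBackCache` and its methods are hypothetical, and the speedup range is only the ratio of the quoted write times, not a measured result.

```python
class WriteBackCache:
    """Toy write-back cache: writes are held in memory and flushed later."""

    def __init__(self):
        self.disk = {}    # offset -> data already on the disk drive
        self.dirty = {}   # offset -> data held in cache, not yet on disk

    def write(self, offset, data):
        self.dirty[offset] = data          # complete once the data is in cache

    def read(self, offset):
        # Prefer the cached copy, which may be newer than the disk copy.
        return self.dirty.get(offset, self.disk.get(offset))

    def flush(self):
        """Asynchronous destaging, modeled here as an explicit call."""
        flushed = len(self.dirty)
        self.disk.update(self.dirty)
        self.dirty.clear()
        return flushed


# Latency figures from the text, in microseconds.
DISK_WRITE_US = (10_000, 20_000)   # 10 to 20 milliseconds
CACHE_WRITE_US = (10, 100)         # 10 to 100 microseconds

min_speedup = DISK_WRITE_US[0] / CACHE_WRITE_US[1]   # fastest disk vs slowest cache
max_speedup = DISK_WRITE_US[1] / CACHE_WRITE_US[0]   # slowest disk vs fastest cache

cache = WriteBackCache()
cache.write(0, b"v1")
before_flush = dict(cache.disk)
flushed = cache.flush()
```

With the quoted figures, a write acknowledged from cache completes between 100 and 2000 times faster than one that waits for the disk drive.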

【００７７】[0077] It goes without saying that the examples and embodiments described herein are for illustrative purposes, and that changes and modifications can be made without departing from the scope of the invention as defined by the appended claims. All publications, patents, and patent applications cited herein are mentioned for reference purposes only.

【００７８】[0078]

[Effects of the Invention] The present invention provides a number of effects. For example, one effect is that the storage system itself can perform data mirroring without an LVMS. When a disk controller in the storage system receives a write request from a host computer, the disk controller issues the same write request to the mirror disk drives in the same pair. This holds regardless of whether those disk drives reside in the same storage system. The CPU load on the host computer can therefore be reduced.

【００７９】[0079] Another effect is that, with a cluster file system introduced and files shared among the host computers, a single dedicated server collects the usage of the mirror disk drives in each pair and tells each host computer which mirror disk drive in the pair to use, so that the load can be balanced. This prevents multiple host computers from using the same storage system or disk drive at the same time.
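A simple way to picture this load balancing is a least-used selection over the drives of a pair, sketched below. The policy is an assumption chosen for illustration: usage is modeled as a count of assigned host computers, ties go to the drive listed first, and the function name `choose_drive` is hypothetical.

```python
from typing import Dict


def choose_drive(usage: Dict[str, int]) -> str:
    """Pick the least-used drive in a pair and record the new assignment."""
    drive = min(usage, key=usage.get)   # ties resolve to the first drive listed
    usage[drive] += 1                   # keep the usage table up to date
    return drive


usage = {"mirror1": 2, "mirror2": 0, "mirror3": 1}
picks = [choose_drive(usage) for _ in range(4)]
```

Updating the table at selection time means the next request already sees the new assignment, which keeps successive host computers from being sent to the same drive.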

[Brief description of drawings]

FIG. 1 is a simplified diagram showing a typical system configuration in which a single storage system holding video data is shared by a plurality of video servers.
FIG. 2 is a simplified diagram showing an example of a disk mirroring configuration across a plurality of storage systems managed by an LVMS (Logical Volume Management System).
FIG. 3 is a simplified diagram showing a first exemplary embodiment of the present invention.
FIG. 4 is a simplified diagram showing a second exemplary embodiment of the present invention.
FIG. 5 is a simplified diagram showing an example of the pair configuration table.
FIG. 6 is a simplified diagram showing an example of a pair.
FIG. 7 is a simplified diagram showing an example of the file allocation table.
FIG. 8 is a simplified diagram showing an example of how files are stored on a disk drive.
FIG. 9 is a simplified diagram showing an example of the pair usage table.
FIG. 10 is a flowchart showing the processing sequence of the CFS server.
FIG. 11 shows a typical example of the file allocation list returned from the CFS server to the CFS client when the CFS client requests a file open.
FIG. 12 is a diagram showing a typical structure of a file I/O request.
FIG. 13 is a diagram showing the types of file I/O requests.
FIG. 14 is a diagram showing the processing sequence of the CFS client.
FIG. 15 is a flowchart showing the processing sequence when the File Open Module is called by the CFS client.
FIG. 16 is a diagram showing an example of the file management table.
FIG. 17 is a flowchart showing the processing sequence when the File Write Module is called by the CFS client.
FIG. 18 is a diagram showing a typical format of a data I/O request.
FIG. 19 is a diagram showing the two types of data I/O requests.
FIG. 20 is a flowchart showing the processing sequence when the File Read Module is called by the CFS client.
FIG. 21 is a flowchart showing the processing sequence when the File Close Module is called by the CFS client.
FIG. 22 is a simplified diagram showing a synchronous write sequence for a mirror within a single storage system.
FIG. 23 is a simplified diagram showing an asynchronous write sequence for a mirror within a single storage system.
FIG. 24 is a simplified diagram showing a read sequence for a mirror within a single storage system (when the latest data exists on the mirror, with synchronous or asynchronous writing).
FIG. 25 is a simplified diagram showing a read sequence for a mirror within a single storage system (when, with asynchronous writing, the latest data does not exist on the mirror).
FIG. 26 is a simplified diagram showing a synchronous write sequence for a mirror spanning a plurality of storage systems.
FIG. 27 is a simplified diagram showing an asynchronous write sequence for a mirror spanning a plurality of storage systems.
FIG. 28 is a simplified diagram showing a read sequence for a mirror spanning a plurality of storage systems (when the latest data exists on the mirror, with synchronous or asynchronous writing).
FIG. 29 is a simplified diagram showing a read sequence for a mirror spanning a plurality of storage systems (when, with asynchronous writing, the latest data does not exist on the mirror).
FIG. 30 is a table showing the relationship among FIGS. 22 to 29.
FIG. 31 shows an example of the bitmap table.
FIG. 32 is a flowchart showing the processing sequence of the I/O processor.
FIG. 33 is a flowchart showing the processing sequence of the Data Write Module for synchronous writing.
FIG. 34 is a flowchart showing the processing sequence of the Data Write Module for asynchronous writing.
FIG. 35 is a flowchart showing the processing sequence of the Data Read Module for synchronous writing.
FIG. 36 is a flowchart showing the processing sequence of the Data Read Module for asynchronous writing.
FIG. 37 is a flowchart showing the processing sequence of the synchronization daemon.
FIG. 38 is a flowchart showing the processing sequence of the Check Invalid Module.
FIG. 39 is a simplified diagram showing an embodiment of the storage system.

[Explanation of symbols]

16a ... Host A, 16b ... Host B, 16c ... Host C, 18 ... Application, 20 ... Cluster file, 14a ... Storage system, 14b ... Storage system, 14c ... Storage system

Claims

[Claims]

1. A system for optimizing data access, the system comprising: a file server capable of communicating with one or more clients; and a plurality of storage elements configured in pairs to store a plurality of files, each pair consisting of one master storage element and one or more mirror storage elements, each mirror storage element holding a copy of the data stored on the master storage element, wherein the file server holds file information indicating, for each file, the pair of storage elements storing the file and the storage position within the pair, the file server further holds access load information for each pair of storage elements, and, when a client requests the file information of a requested file from the file server, the file server determines the pair of storage elements storing the requested file and further determines the storage element to be accessed within that pair.

2. The system according to claim 1, wherein the plurality of storage elements are a plurality of disk drives.

3. The system of claim 1, wherein the plurality of storage elements are contained within a single storage system.

4. The system of claim 1, wherein the plurality of storage elements reside in one or more storage systems.

5. The system according to claim 1, further comprising a plurality of host computers, wherein the file server resides on one of the plurality of host computers and the one or more clients reside on the remaining host computers.

6. The system of claim 1, further comprising a synchronization daemon, the synchronization daemon synchronizing data stored on each pair of storage elements.

7. The system of claim 1, wherein the master storage element and the one or more mirror storage elements in a pair reside in a single storage system.

8. The system of claim 1, wherein the master storage element and the one or more mirror storage elements in a pair reside in one or more storage systems.

9. The system of claim 8, wherein, when one mirror storage element is accessed for the requested file and it is determined that the accessed mirror storage element holds the latest copy of the data of the requested file stored in the corresponding master storage element, the client reads the requested file directly from the mirror storage element.

10. The system of claim 8, wherein, when one mirror storage element is accessed for the requested file and it is determined that the accessed mirror storage element does not hold the latest copy of the data of the requested file stored in the corresponding master storage element, the latest copy of the data of the requested file is read from the corresponding master storage element and transferred to the client.

11. The system according to claim 1, wherein the file information indicating a pair of the storage elements storing each file and a storage position within the pair includes a file allocation list.

12. The system according to claim 1, wherein the storage element to be accessed within the pair of storage elements storing the requested file is determined such that the access loads of the storage elements within that pair are substantially balanced.

13. The system of claim 1, wherein, upon determining the storage element to be accessed within the pair of storage elements storing the requested file, the file server provides information related to the determination to the client, and the client determines the storage element to read from based on this information and reads the requested file from that storage element.

14. The system according to claim 13, wherein, when the information related to the determination of the storage element is provided to the client, the file server updates the access load information so as to ensure the accuracy of monitoring the access balance within the pair.

15. A system for optimizing data access, the system comprising: a first host computer having a file system server, the first host computer being capable of communicating with a second host computer having a file system client; and a storage system including a plurality of disk drives that form a plurality of pairs for storing a plurality of files, each pair consisting of one master disk drive and one or more mirror disk drives, each mirror disk drive holding a copy of the data stored on the master disk drive, wherein the file system server holds file information indicating, for each file, the pair of disk drives storing the file and the storage position within the pair, the file system server further holds access load information for each pair of disk drives, and, when the file system client requests the file information of a requested file from the file system server, the file system server determines the pair of disk drives holding the requested file and further determines the disk drive to be accessed within that pair.

16. The system according to claim 15, wherein the file information indicating a pair of the disk drives storing each file and a storage position within the pair includes a file allocation list.

17. The system according to claim 15, wherein the disk drive to be accessed within the pair of disk drives holding the requested file is determined such that the access loads of the disk drives within that pair are substantially balanced.

18. The system of claim 15, wherein, upon determining the disk drive to be accessed within the pair of disk drives holding the requested file, the file system server provides information related to the determination to the file system client, and the file system client determines the disk drive to read from based on this information and reads the requested file from that disk drive.

19. The system according to claim 18, wherein, when the information related to the determination of the disk drive is provided to the file system client, the file system server updates the access load information so as to ensure the accuracy of monitoring the access balance within the pair.

20. A system for optimizing data access, the system comprising: a first host computer having a file system server, the first host computer being capable of communicating with a second host computer having a file system client; and a plurality of storage systems, each having a plurality of disk drives, wherein disk drives from the plurality of storage systems form a plurality of pairs for storing a plurality of files, each pair consisting of one master disk drive and one or more mirror disk drives, each mirror disk drive holding a copy of the data stored on the master disk drive, the file system server holds file information indicating, for each file, the pair of disk drives storing the file and the storage position within the pair, the file system server further holds access load information for each pair of disk drives, and, when the file system client requests the file information of a requested file from the file system server, the file system server determines the pair of disk drives holding the requested file and further determines the disk drive to be accessed within that pair.

21. The system according to claim 20, wherein the file information indicating a pair of the disk drives storing each file and a storage position within the pair includes a file allocation list.

22. The system according to claim 20, wherein the disk drive to be accessed within the pair of disk drives holding the requested file is determined such that the access loads of the disk drives within that pair are substantially balanced.

23. The system of claim 20, wherein, upon determining the disk drive to be accessed within the pair of disk drives holding the requested file, the file system server provides information related to the determination to the file system client, and the file system client determines the disk drive to read from based on this information and reads the requested file from that disk drive.

24. The system according to claim 20, wherein, when the information related to the determination of the disk drive is provided to the file system client, the file system server updates the access load information so as to ensure the accuracy of monitoring the access balance within the pair.

25. The system according to claim 20, wherein, when one mirror disk drive is accessed for the requested file and it is determined that the accessed mirror disk drive holds the latest copy of the data of the requested file stored in the corresponding master disk drive, the file system client reads the requested file directly from the mirror disk drive.

26. The system according to claim 20, wherein, when one mirror disk drive is accessed for the requested file and it is determined that the accessed mirror disk drive does not hold the latest copy of the data of the requested file stored in the corresponding master disk drive, the latest copy of the data of the requested file is read from the corresponding master disk drive and transferred to the file system client.

27. A method for optimizing data access, the method comprising: configuring a plurality of storage elements in pairs for storing a plurality of files, each pair consisting of one master storage element and one or more mirror storage elements, each mirror storage element holding a copy of the data stored on the master storage element; holding file information indicating, for each of the plurality of files, the pair storing the file and the storage position within that pair; holding access load information for each pair of storage elements; and, upon receiving a request for a file, using the file information to determine the pair of storage elements holding the requested file and using the access load information to determine the storage element to be accessed within that pair.

28. The method of claim 27, further comprising: upon determining the storage element to be accessed within the pair of storage elements holding the requested file, providing information related to the determination to a client, wherein the client determines the storage element to read from based on this information and reads the requested file from that storage element.

29. The method of claim 28, further comprising: when providing the information related to the determination of the storage element to the client, updating the access load information so as to ensure the accuracy of monitoring the access balance within the pair.

30. The method of claim 27, wherein the plurality of storage elements reside within a single storage system.

31. The method of claim 27, wherein the plurality of storage elements reside in one or more storage systems.

32. The method of claim 31, further comprising: accessing one mirror storage element for the requested file; and, if the accessed mirror storage element is found to hold the latest copy of the data of the requested file stored in the corresponding master storage element, reading the requested file directly from the mirror storage element.

33. The method of claim 31, further comprising: accessing one mirror storage element for the requested file; and, if the accessed mirror storage element is found not to hold the latest copy of the data of the requested file stored in the corresponding master storage element, reading the latest copy of the data of the requested file from the corresponding master storage element.

34. The method of claim 27, wherein the storage element to be accessed within the pair of storage elements holding the requested file is determined such that the access loads of the storage elements within that pair are substantially balanced.

35. The method of claim 27, wherein the plurality of storage elements comprises a plurality of disk drives.

36. A method for optimizing data access between a file server and one or more clients, the method comprising: configuring a plurality of disk drives in pairs for storing a plurality of files, each pair consisting of one master disk drive and one or more mirror disk drives, each mirror disk drive holding a copy of the data stored on the master disk drive; holding file information indicating, for each file, the pair holding the file and the storage position within that pair; holding access load information for each pair of disk drives, the file information and the access load information being held in the file server; upon the file server receiving a request for a file from a client, the file server using the file information to determine the pair of disk drives holding the requested file and using the access load information to determine the disk drive to be accessed within that pair; providing the client with information related to the determination of the disk drive to be accessed; and the client reading the requested file based on this information.