JP2003208345A

JP2003208345A - Network type storage device

Info

Publication number: JP2003208345A
Application number: JP2002007423A
Authority: JP
Inventors: Seiji Kaneko; 誠司金子
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2002-01-16
Filing date: 2002-01-16
Publication date: 2003-07-25

Abstract

(57) [Abstract]
[Problem] When a network-type storage device is built from a plurality of storage devices, it is difficult to adopt a configuration that balances availability under failure against cost.
[Solution] Conventionally, when a network-type storage device is built from a plurality of storage devices, the control information (metadata) for managing the storage device and the actual data have been placed without any particular distinction. The present invention introduces a distinction between the placement control of the two and raises the redundancy of only the control information to improve fault tolerance. This configuration yields a network-type storage device that achieves high availability without a large increase in cost.

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to the configuration of a network-type storage device, and in particular to a technique for realizing a network-type storage device with a low-cost configuration that continues to operate without losing availability when a failure occurs.

[0002]

[Prior art] A disk array device uses RAID technology, which seeks to improve data transfer performance and fault tolerance by accessing a plurality of disk devices, that is, a plurality of storage devices, in parallel. RAID is described in detail in "A Case for Redundant Arrays of Inexpensive Disks (RAID)", In Proc. ACM SIGMOD, June 1988 (published by the University of California, Berkeley). Disk array devices using RAID technology are now widely used as large-capacity storage devices.

[0003] Disk array devices that adopt a cluster configuration are known as a further development. In server fields such as databases, combining a plurality of independent servers over a high-speed interconnect into a cluster has been known from a relatively early stage as a way to build a higher-performance, more reliable system at low cost. This is because it has been demonstrated that, for systems above a certain performance level, combining many systems of somewhat lower individual performance over a high-speed interconnect gives better performance for the cost than building a single system of extremely high performance.

[0004] Until now, disk array devices have generally used a single controller, but similar cluster configurations have recently begun to be studied for disk arrays as well. For example, Japanese Patent Laid-Open Publication No. 2001-27972 discloses the configuration of such a disk device.

[0005] Another prior art on which the present invention is premised is technology for network-connected disk array devices. In current disk array systems, storage area networks (hereinafter SAN) and network-attached storage (hereinafter NAS) are widely used as network technologies. A SAN builds a network using Fibre Channel, the interface to disks that is currently in standard use, and the disk array is treated strictly as an ordinary disk. Its advantage is that high speed is easy to pursue because the protocol on Fibre Channel has little overhead. However, because the array is treated strictly as a disk, it lacks functions needed for client connections, such as mutual exclusion of disk writes, and because Fibre Channel is less widespread than ordinary IP networks, the system becomes expensive. The protocol overhead of an IP network is large, and the overhead of implementing a file system is also relatively large, so a disk array on its own tends to be expensive relative to its performance.

[0006] NAS, on the other hand, implements a general-purpose file system on a disk array and connects it to an IP network to offer it as a file server. Its strengths are low cost through the use of widely deployed IP network technology and the ease of adding various functions as a file system. However, the protocol overhead of an IP network is large, and the overhead of implementing a file system is also relatively large, so a disk array on its own tends to be expensive relative to its performance.

[0007] The cluster-type systems described above have also been studied in various ways for such network-connected systems, particularly to improve data integrity through distributed placement. As a representative example, U.S. Pat. No. 5,832,222 discloses a technique for configuring a RAID system across network-type disk array devices that are geographically distributed over a plurality of sites for disaster recovery. U.S. Pat. No. 5,742,792 discloses a copying technique for configuring RAID 1 with a remote site. The cluster-type disk array configurations based on distributed placement described in these publications assume dedicated lines and SANs and do not target NAS technology. They can of course be applied as they are as a technique for simply connecting to a network, but a NAS device must maintain consistency as a file system, so the control data and the data of the target file system must be updated in synchronization, which makes the overhead even larger. In addition, various additional functions become necessary, such as time synchronization to ensure that the latest update is reflected in the file system, which makes this difficult to realize.

[0008] Therefore, a cluster configuration in a NAS system usually takes one of two forms: either a storage system is simply configured as a cluster to realize a large-capacity file system and to improve performance through load distribution, or one or more servers are placed as a front end to a cluster-type disk array system realized with a SAN or the like, and those servers implement the NAS function. Although no example has yet been implemented as a product, a white paper issued by BlueArc, for example, outlines a system of the latter kind, in which a plurality of disk systems are placed at the back end and a plurality of servers provide integrated management of the disk systems together with the NAS function.

[0009]

[Problems to be solved by the invention] As described in the prior art section, the NAS-type disk array systems that have been published either simply combine a plurality of systems or place servers as a front end to a SAN-type system.

[0010] However, in the former type of system the file system is divided across a plurality of disk subsystems, so the failure of any one subsystem destroys the consistency of the control data, and the fault tolerance of the file system as a whole drops markedly. Duplicating the entire system greatly improves fault tolerance, of course, but the cost more than doubles, and the cost of communicating management information between the duplicated systems to maintain consistency is also large. The form in which servers are placed in front has large software overhead, and the communication between servers needed to maintain file system consistency places a heavy load on the network, so it is extremely difficult to build a high-performance system.

[0011] First, in a cluster-type disk array device, one disk controller receives a data request from the host server and forwards it to the disk array device that actually holds the data, where the data is read or written. This realizes a single-image disk array rather than a mere collection of disk array devices. However, cost and electrical constraints force the data path that exchanges control information and data between disk array devices to be slower than the paths inside a disk array, and as a result the performance of a cluster-type disk array device differs depending on which interface the host is connected to. In particular, when accesses from a plurality of hosts concentrate on specific data (a hot spot), both the disk array device forming part of the cluster and the interconnect between cluster members become bottlenecks, and the high performance expected of a cluster cannot be obtained. Conventionally, such cases have been handled in software, for example by placing copies of the data to absorb the load, but this is difficult in uses such as Internet servers, where hot spots change rapidly.

[0012] Another problem is that, when a cluster is composed of a plurality of disk array devices, it must be assumed that one of them may go down due to a failure or be stopped for maintenance, yet the cluster-type disk array devices disclosed in the prior art have difficulty handling such cases. Specifically, all data on the disks connected to that disk array device becomes inaccessible, so a loss of availability for the clustered disk array device as a whole is unavoidable.

[0013] An object of the present invention is to solve the above problems by placing data with the cluster configuration in mind, and in particular to provide a network-type storage device with a low-cost configuration that continues to operate without losing availability when a failure occurs.

[0014]

[Means for solving the problems] To achieve the above object, the present invention provides a network-type storage device comprising a network, a plurality of storage devices that are connected to the network and use magnetic disks, a client system connected to the network, and means for storing, in the storage devices, file data provided to the client system and management data for controlling that file data, and for providing a file access function to the client system using the management data, wherein the management data and the file data are stored distributed across the plurality of storage devices with different degrees of redundancy.

[0015]

[Embodiments of the invention] In the present invention, in a cluster-type disk device that is a network-type storage device, only the control data is multiplexed or given a redundant RAID configuration. This yields a cluster-type NAS system with practical reliability while keeping the communication of management data needed to configure the NAS to a minimum.

[0016] With this configuration, the part of the data affected by a failure cannot be accessed, but all other data can be read and written as usual, and the consistency of the file system is not lost.

[0017] A preferred embodiment of the present invention is described below in more detail with reference to the drawings. FIG. 1 shows the configuration of the disk array device, the network-type storage device of the present invention, on which this embodiment is premised. That is, the storage device of the present invention is a network-type storage device comprising a network, a plurality of storage devices that are connected to the network and use magnetic disks (each consisting of a disk array control device and a collective disk device, described below), a client system connected to the network, and means for storing, in the storage devices, file data provided to the client system and management data for controlling that file data, and for providing a file access function to the client system using the management data, wherein the management data and the file data are stored distributed across the plurality of storage devices with different degrees of redundancy.

[0018] This embodiment shows an example in which four disk array control devices 11, 12, 13, and 14, each supporting a network file system, are connected by a local area network formed by IP network 5. The disk array control devices contain control devices (processor, buffers, and so on) 111, 121, 131, and 141 that process the file system, and they provide the function of controlling access to a file system composed of all (or part) of the disks connected to the disk array control devices.

[0019] This is basically the same processing as a network file system, with control arranged so that the whole appears equivalent to a single file system. Collective disk devices (that is, pluralities of storage devices) 116, 126, 136, and 146, each consisting of a plurality of disks, are attached to the disk array control devices: collective disk device 116 to disk array control device 11, collective disk device 126 to disk array control device 12, and so on in order. A management server 30 is connected to all of the disk array control devices 11, 12, 13, and 14, and configuration, failure handling, and similar tasks are carried out through management server 30. A client system 40 that uses this group of disk array devices is connected to the disk arrays through a network, in this case the IP network. Because all clients behave essentially the same way in this embodiment, only one of them is shown in FIG. 1, as 40, and the operation when this client system uses the disk array system of this embodiment is described below.

[0020] Next, the structure of the file system used in this embodiment is described. FIG. 2 shows the internal data structure of the file system in this embodiment. This embodiment uses a file system with a hierarchical structure. Even in such a file system the file itself is simply block data, and a file placed in the file system is managed through management information, which is control data kept separate from the file data. In this embodiment the management information is stored in an inode data structure. In FIG. 2, 61 is the inode data structure; the actual data 62 and so on is accessed through direct pointers (for speed with small data) and indirect pointers (used for ordinary data). The other fields are management information and must be updated in a way that preserves consistency. In addition, two disk array device numbers 64 and 65 are added as information characteristic of this embodiment.
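The inode layout described above can be sketched in Python as follows. This is only an illustrative model: the class name, the field names, and the example addresses (taken from the 192.0.2.0/24 documentation range) are assumptions and do not appear in the specification. Only the presence of direct pointers, indirect pointers, other management fields, and the two disk array device numbers 64 and 65 is drawn from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Inode:
    """Hypothetical model of inode data structure 61 in FIG. 2."""
    number: int
    size: int = 0                                     # example of "other management information"
    direct: List[int] = field(default_factory=list)   # direct block pointers (fast path for small data)
    indirect: Optional[int] = None                    # block holding further pointers (ordinary data)
    array_64: str = ""                                # field 64: first disk array holding this inode
    array_65: str = ""                                # field 65: second disk array holding this inode

    def holders(self) -> Tuple[str, str]:
        """Return the two disk arrays that keep identical copies of this inode."""
        return (self.array_64, self.array_65)


# Example: an inode whose control data lives on two arrays (addresses are made up).
example = Inode(number=0, direct=[100, 101], array_64="192.0.2.11", array_65="192.0.2.12")
```

Keeping the two holder fields inside the inode itself means that whoever can read one copy also learns where the other copy lives.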

[0021] In this embodiment, the control data is always managed so that identical copies are kept on two disk array devices 64 and 65. Only the control data is duplicated; no redundant configuration is used for the data itself. The two disk array devices are the ones indicated by disk array device numbers 64 and 65 in FIG. 2. In an actual implementation, these fields are assumed to hold the IP addresses used for communication.

[0022] The two disk array devices recorded in the inode information are set when the file system is created. Specifically, this embodiment uses four disk array devices. Inode number 0 is stored on the disk array device formed by 11 and 116 and on the one formed by 12 and 126; inode number 1 on 12 and 126, and 13 and 136; inode number 2 on 13 and 136, and 14 and 146; inode number 3 on 14 and 146, and 11 and 116; and the assignment repeats in this manner for subsequent inodes.
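The round-robin assignment of inode copies to pairs of disk arrays can be sketched as below. The function names are hypothetical, and the modulo rule is an inference from the four cases listed in the text (inode 0 on 11 and 12, inode 1 on 12 and 13, inode 2 on 13 and 14, inode 3 on 14 and 11), not a formula stated in the specification.

```python
ARRAYS = [11, 12, 13, 14]  # disk array control devices from FIG. 1


def replica_arrays(inode_number, arrays=ARRAYS):
    """Return the two arrays assumed to hold copies of the given inode."""
    n = len(arrays)
    first = arrays[inode_number % n]
    second = arrays[(inode_number + 1) % n]
    return first, second


def live_replicas(inode_number, failed, arrays=ARRAYS):
    """Return the holders of the inode that are not in the failed set."""
    return [a for a in replica_arrays(inode_number, arrays) if a not in failed]


if __name__ == "__main__":
    for i in range(4):
        print(i, replica_arrays(i))
```

Under this rule the two copies of any inode always sit on different arrays, so a single array failure leaves at least one readable copy of every inode.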

[0023] This is to achieve a certain degree of load distribution at initial setup. The actual data is accessed from the inode, and that data is simply distributed across the disk array devices. It may be placed in any manner. In an actual implementation, the placement might be specified by maintenance personnel with load distribution in mind, or the data might be relocated dynamically. In this embodiment, disk array device numbers 64 and 65 are set during this initialization. If processing that dynamically changes the placement of the control information is introduced, as mentioned above, these fields must be changed at the same time.

[0024] The operation of the disk array device under the above configuration is described below. When client 40 operates on a file on the disk array devices, it first asks management server 30 which disk array control device it should query. In this embodiment each disk array control device has its own file system function, and a result can be obtained by querying any individual disk array control device. The client asks management server 30 because management server 30 centrally manages failure information for the disk array control devices. Using the answer, the client queries the indicated disk array control device and obtains the result.

[0025] The disk array device number obtained here is cached on the client to reduce the number of queries to management server 30 and improve speed. When the client accesses the same inode again, it uses the cached data to access the disk array device holding that inode directly, without querying the management server. Cache management of this kind is common, and a person skilled in the art could implement it easily.

[0026] As for the multiplexed management that embodies the redundancy of control data, which is the feature of the present invention, in this embodiment the inode information is controlled among the disk array control devices and updated in synchronization. Specifically, for an update from a client, the disk array device that receives it locks the two corresponding inode copies and updates them exclusively. The issue is what to do when one of the inode copies becomes unreadable because of a failure. This embodiment performs the following processing.

[0027] (1) When a disk array control device finds an error in its internal processing, it notifies the management server that it is going to block itself, and then blocks itself (goes out of service).

[0028] (2) Each client caches, as appropriate, the box number holding the inode it is about to read. If there is no response when it tries to read data from that disk array control device, it times out after a preset period, purges the corresponding cache contents, and asks which disk array control device it should read the inode in question from.

[0029] (3) In response to this query, the management server returns the name of a disk array control device that holds the required inode and is in operation. To obtain this name, it refers to disk array device number fields 64 and 65 in the inode.

[0030] (4) Steps (1) to (3) are repeated until the response to the file request issued by the client is complete.
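Steps (1) to (4) above can be sketched as a small in-memory simulation. All class, method, and parameter names here are hypothetical, the network read is replaced by a caller-supplied function, and the retry limit is an added assumption so that the loop terminates; the specification describes the behavior only in prose.

```python
class ManagementServer:
    """Tracks which disk arrays have blocked themselves (step 1) and answers lookups (step 3)."""

    def __init__(self, placement):
        self.placement = placement  # inode number -> (array in field 64, array in field 65)
        self.blocked = set()

    def report_blocked(self, array):
        # Step (1): an array that found an internal error announces that it is blocking itself.
        self.blocked.add(array)

    def lookup(self, inode_number):
        # Step (3): return a holder of the inode that is still in operation.
        for array in self.placement[inode_number]:
            if array not in self.blocked:
                return array
        raise LookupError(f"no live holder for inode {inode_number}")


class Client:
    """Caches inode locations and falls back to the management server on timeout (step 2)."""

    def __init__(self, server, read_fn):
        self.server = server
        self.read_fn = read_fn  # read_fn(array, inode_number); raises TimeoutError if no response
        self.cache = {}

    def read_inode(self, inode_number, max_attempts=3):
        # Step (4): repeat until the request is answered (bounded here by max_attempts).
        for _ in range(max_attempts):
            array = self.cache.get(inode_number)
            if array is None:
                array = self.server.lookup(inode_number)
                self.cache[inode_number] = array
            try:
                return self.read_fn(array, inode_number)
            except TimeoutError:
                # Step (2): purge the stale cache entry and ask the server again.
                self.cache.pop(inode_number, None)
        raise TimeoutError(f"inode {inode_number} unreachable after {max_attempts} attempts")
```

In this sketch the client only contacts the management server on a cache miss or after a timeout, which matches the stated aim of keeping the number of queries to the management server low.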

[0031] Conversely, when a disk array control device that was blocked due to a failure is reconnected, recovery is possible without any special processing, because the management server gradually distributes load to the reconnected disk array device. Processing for reconnection may of course be performed actively. However, as noted above, the location of a given inode is merely cached, so the system approaches its original state as cache contents are replaced over time. If blocking and reconnection are repeated, a load imbalance may arise in some cases; it is easy to extend the above procedure so that the management server intervenes and rebalances the load, for example by purging cached data on the clients.

[0032] This embodiment emphasizes speed, and the example described has the client actively detect a timeout and query management server 30 again. It is also possible not to cache, so that management server 30 returns an appropriate disk array device in reply to a query for every transfer by client 40. Since this only requires changing step (2) of the above procedure to query the management server every time, a person skilled in the art could make the change easily. In this case management server 30 must be fast, but the advantage is that the operating system on the client side need not be aware of the disk array configuration.

[0033] This embodiment has illustrated a configuration in which control at the start of communication and on failure is exercised through the intervention of a management server in the system (that is, the case in which the client system is connected to the network by one or more management servers). With highly functional disk arrays, it is also possible to omit the management server and have the disk arrays handle failures and the like autonomously by communicating among themselves (that is, the case in which the client system is connected to the network by cooperative operation). In this case, for example, when a disk array detects its own failure, processing is handed over to the paired disk array that holds the control data in place of the failed disk array, and the disk array taking over is controlled so that it responds on behalf of the IP network address of the failed disk array. Processing can thereby be handed over transparently to the client. In such a configuration, the client accesses an arbitrary disk array device, or one designated in advance for each client, and the disk array devices communicate among themselves to return the requested data. In this case too, a highly available disk array device can be realized by adopting the configuration of the present invention in which only the control data is made redundant. Apart from the points described above, the basic control is almost the same as that shown in this embodiment, and a person skilled in the art could realize such a disk array device easily.

[0034] FIG. 1 shows the case in which all disk array control devices are connected to a single local area network, but modifications and extensions of the network configuration, such as dividing the network or placing the disk array control devices at a distance from one another, are easily possible and require no particular changes to the control. The configuration can of course also be changed to include network switches and routers. This embodiment has been described in terms of a network using the widely deployed IP network, but it does not rely on features specific to IP networks and applies equally when other network technologies are used.

[0035] As already described, this embodiment duplicates only the control data and thereby realizes a network-type disk array device that is highly available at lower cost than a configuration in which everything is duplicated. It is easy to change this to a configuration with higher redundancy that increases data availability, for example triplicated control data and duplicated data. In that case too, keeping the redundancy of the control data higher than that of the data makes it possible to realize a network-type storage device, that is, a disk array device, whose availability is greatly improved at low cost.
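The cost argument above can be made concrete with a small calculation. The specification gives no figures, so the function below, its name, and the assumed 1 percent share of control data are purely illustrative. It computes how much raw capacity is needed per unit of logical capacity when control data and file data are kept in different numbers of copies.

```python
def storage_factor(meta_fraction, meta_copies, data_copies):
    """Raw capacity needed per unit of logical capacity.

    meta_fraction: share of logical capacity taken by control data (0 to 1).
    meta_copies, data_copies: number of copies kept of control data and file data.
    """
    if not 0.0 <= meta_fraction <= 1.0:
        raise ValueError("meta_fraction must be between 0 and 1")
    if meta_copies < 1 or data_copies < 1:
        raise ValueError("at least one copy of each kind is required")
    return meta_copies * meta_fraction + data_copies * (1.0 - meta_fraction)


if __name__ == "__main__":
    f = 0.01  # hypothetical: control data occupies 1% of logical capacity
    cases = [
        ("control data duplicated, data single", 2, 1),
        ("everything duplicated", 2, 2),
        ("control data triplicated, data duplicated", 3, 2),
    ]
    for label, m, d in cases:
        print(f"{label}: {storage_factor(f, m, d):.2f}x")
```

With that assumed share, duplicating only the control data costs about 1.01 times the unreplicated capacity, against 2.00 times for duplicating everything, and adding a third copy of the control data on top of duplicated data raises the factor only to about 2.01.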

[0036]

[Effects of the invention] The present invention makes it possible to realize a network-type storage device, that is, a disk array device, that provides a network-type file system optimized for reliability and cost.

[Brief description of drawings]

[FIG. 1] Outline of the network-type storage device that provides a network-type file system, that is, a network-capable cluster-type disk array device, in a preferred embodiment of the present invention.

[FIG. 2] Configuration of the inode, which is the control data, in a preferred embodiment of the present invention.

[Explanation of symbols]

5: network (IP network); 11, 12, 13, 14: disk array control devices; 30: management server; 40: client system; 61: inode data structure; 62: data; 64, 65: disk array devices; 111, 121, 131, 141: file control devices; 112, 122, 132, 142: control storage devices; 113, 123, 133, 143: disk control devices; 114, 124, 134, 144: caches; 116, 126, 136, 146: collective disk devices.

Continuation of front page: (51) Int.Cl.⁷ G06F 12/16, identification code 320; FI G06F 12/16 320L

Claims

[Claims]

1. A network-type storage device comprising: a network; a plurality of storage devices connected to the network and using magnetic disks; a client system connected to the network; and means for storing, in the storage devices, file data provided to the client system and management data for controlling the file data, and for providing a file access function to the client system using the management data, wherein the management data and the file data are stored distributed across the plurality of storage devices with different degrees of redundancy.

2. The network-type storage device according to claim 1, wherein the client system is a client system connected to the network by cooperative operation.

3. The network-type storage device according to claim 1, wherein the client system is a client system connected to the network according to an instruction from one or a plurality of management servers.