JPH06511099A

JPH06511099A - Method for performing disk array operations using a non-uniform stripe size mapping scheme

Info

Publication number: JPH06511099A
Application number: JP5511941A
Authority: JP
Inventors: ニューフェルド，イー・デイビド
Original assignee: Compaq Computer Corp
Current assignee: Compaq Computer Corp
Priority date: 1991-12-27
Filing date: 1992-12-18
Publication date: 1994-12-08
Also published as: EP0619896A1; WO1993013475A1; AU3424993A; CA2126754A1

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Method for Performing Disk Array Operations Using a Non-Uniform Stripe Size Mapping Scheme

Technical Field

The present invention is directed to a method for improving the performance of multiple disk drives in a computer system, and in particular to a method for performing write operations in a disk array that uses parity data redundancy and recovery protection.

Background Art

Microprocessors and the computers that use disks have become increasingly powerful in recent years. Personal computers available today outperform the mainframes and minicomputers of ten years ago. Microprocessor data buses 32 bits wide have become widely available, whereas in the past 8 bits was conventional and 16 bits was common.

Personal computer systems have evolved over the years, and new uses are discovered daily. The uses vary and, as a result, place different requirements on the various subsystems that form a complete computer system. With the increased performance of computer systems, it became apparent that mass storage subsystems, such as fixed disk drives, play an increasingly important role in the transfer of data to and from the computer system. In the past few years, a new trend in storage subsystems, referred to as the disk array subsystem, has emerged to improve data transfer performance, capacity and reliability. One reason for building a disk array subsystem is to create a logical device with a very high data transfer rate. This can be accomplished by "ganging" multiple standard disk drives together and transferring data to and from these drives in parallel. Accordingly, data is stored across each of the disks comprising the disk array, so that each disk holds a portion of the data comprising a data file.

If n drives are ganged together, the effective data transfer rate may increase by up to a factor of n. This technique is known as striping and was first adopted in supercomputing environments, where transfers of large amounts of data to and from secondary storage are frequently required.

In striping, a contiguous data block is divided into segments of a unit length, such as the sector size, and consecutive segments are written sequentially across multiple disk drives rather than to sequential locations on a single disk drive. The unit length, or amount of data stored across each disk, is referred to as the stripe size. The stripe size affects data transfer characteristics and access times and is generally chosen to optimize data transfers to and from the disk array. If the data block is longer than n unit lengths, the process repeats at the next stripe location on each disk drive. With this approach, the n physical drives become a single logical device, which may be implemented through either software or hardware.
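The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: one unit-length segment per drive per stripe position, and the function name `locate_segment` is hypothetical, not taken from the patent.

```python
def locate_segment(segment_index, n_drives):
    """Map a logical segment number to (drive, stripe_position) under simple striping.

    Assumes each drive holds exactly one unit-length segment per stripe position.
    """
    drive = segment_index % n_drives             # consecutive segments go to consecutive drives
    stripe_position = segment_index // n_drives  # after n segments, move to the next stripe position
    return drive, stripe_position


# Lay out seven consecutive segments across three drives.
layout = [locate_segment(i, 3) for i in range(7)]
print(layout)
```

With three drives, segments 0 to 2 land on drives 0 to 2 at stripe position 0, and segment 3 wraps around to drive 0 at stripe position 1.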

One technique used to provide data protection and recovery in disk array subsystems is referred to as a parity scheme. In a parity scheme, the data blocks written to the various drives within the array are combined using a known exclusive-OR (XOR) technique to create parity information, which is written to a reserved, or parity, drive within the array. The advantage of a parity scheme is that it may be used to minimize the amount of data storage dedicated to data redundancy and recovery within the array. For example, FIG. 1 illustrates a traditional 3+1 mapping scheme in which three disks, disk 0, disk 1 and disk 2, are used for data storage and one disk, disk 3, is used to store parity information. In FIG. 1, each rectangle enclosing a number, or the letter P combined with a number, corresponds to a sector, which is preferably 512 bytes. As shown in FIG. 1, each complete stripe uses four sectors from each of disks 0, 1 and 2, for a total of 12 sectors of data storage per stripe. Assuming a standard sector size of 512 bytes, the stripe size of each of these disk stripes, defined as the amount of storage allocated to a stripe on one of the disks comprising the stripe, is 2 kilobytes. Thus each complete stripe, which includes the total of the portions of each disk allocated to the stripe, can store 6 kilobytes of data. Disk 3 of each stripe is used to store the parity information.

However, the use of parity fault tolerance techniques in disk array systems has a number of disadvantages. One disadvantage of the 3+1 mapping scheme illustrated in FIG. 1 is the loss of performance within the disk array, because the parity drive must be updated each time a data drive is updated or written to. The data must pass through the XOR process so that, in addition to the data being written to the data drive, the updated parity information can be written to the parity drive. This burden may be partially alleviated by distributing the parity data, which relieves the load on a dedicated parity disk. However, this does not reduce the total number of data write operations.
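The stripe arithmetic and the XOR parity idea described for the 3+1 scheme can be checked with a short Python sketch. The byte patterns and the helper name `xor_blocks` are illustrative assumptions; only the sector size, sectors per disk stripe and disk count come from the text.

```python
from functools import reduce

SECTOR_BYTES = 512            # sector size given in the text
SECTORS_PER_DISK_STRIPE = 4   # four sectors from each data disk
DATA_DISKS = 3                # disks 0, 1 and 2; disk 3 holds parity

stripe_size = SECTOR_BYTES * SECTORS_PER_DISK_STRIPE  # bytes per disk per stripe
full_stripe_data = stripe_size * DATA_DISKS           # data bytes per complete stripe


def xor_blocks(*blocks):
    """Byte-wise XOR of any number of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up contents for the three data disks' portions of one stripe.
d0 = b"\x11" * stripe_size
d1 = b"\x22" * stripe_size
d2 = b"\x44" * stripe_size

parity = xor_blocks(d0, d1, d2)           # what would be stored on disk 3

# If disk 1 is lost, XORing the survivors with the parity rebuilds its data.
recovered_d1 = xor_blocks(d0, d2, parity)
assert recovered_d1 == d1
```

Here `stripe_size` evaluates to 2048 bytes and `full_stripe_data` to 6144 bytes, matching the 2 kilobyte and 6 kilobyte figures in the text.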

Another disadvantage of parity fault tolerance techniques is that traditional operating systems (OSs) perform many small writes to the disk subsystem that are often smaller than the stripe size of the disk array; these are referred to as partial stripe write operations. For example, many traditional file systems use small data structures to represent the structure of the files, directories and free space within the file system. In a typical UNIX file system, this information is kept in a structure called an INODE, which is generally 2 kilobytes in size. The INODE, or INDEX NODE, contains information on the type and size of the file, where the data is located, the owning and using users, the last time the file was modified or accessed, the last time the INODE was modified, and the number of links or file names associated with that INODE. In the OS/2 high performance file system, this structure is called an FNODE. These structures are updated often, since they contain file access and modification dates and file size information.

These structures are quite small compared with the typical data stripe sizes used in disk arrays and therefore give rise to a large number of partial stripe write operations.

When partial stripe write operations occur fairly frequently in a disk array, the performance of the disk subsystem is seriously degraded because, as explained below, the existing data or parity information on the disk must be read from the disk in order to generate the new parity information. This causes extra revolutions of the disk drive and delays the servicing of the request. In addition to the time required to perform the actual operations, a READ operation followed by a WRITE operation to the same sector on a disk results in the loss of one disk revolution, or approximately 16.5 milliseconds, for certain types of hard (fixed) disk drives.

Where a complete stripe of data is written to the array, the parity information may be generated directly from the data being written to the drive array, so no extra read of the disk stripe is required. However, a problem occurs when the computer writes only a partial stripe to the disk array because, as mentioned above, the disk array controller does not obtain sufficient information from the data being written to compute parity for the complete stripe. Thus partial stripe write operations generally require data stored on a disk to first be read, modified by the process active on the host system, and written back to the same address on the data disk. This operation consists of a data disk READ, modification of the data, and a data disk WRITE to the same address. There are generally two techniques used to compute parity information for partial stripe write operations.

In the first technique, a partial stripe write to a data disk in an XOR parity fault tolerant system proceeds as follows. The computer system first reads the parity information from the parity disk for each data disk sector being updated, and reads the old data values that are to be replaced from the data disk. The XOR parity information is then recalculated by the host, a local processor or dedicated logic by XORing the old data sectors to be replaced with the related parity sectors. This recovers the parity value without those data values. The new data values are then XORed onto this recovered value to generate the new parity data. A WRITE command is then executed, writing the updated data to the data disk and the new parity information to the parity disk. This process requires two additional partial sector READ operations, one from the parity disk and one of the old data, before the new XOR parity information can be generated. In addition, the WRITE operations are to locations that have just been read. Data transfer performance suffers as a result.
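The first technique amounts to two XOR steps, which the following Python sketch illustrates. The function names and the 8-byte example blocks are hypothetical; the sketch models only the parity arithmetic, not the disk READ and WRITE commands.

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def read_modify_write_parity(old_data, old_parity, new_data):
    """Compute new parity from the old data and old parity read back from disk."""
    # Step 1: XOR the old data out of the old parity, leaving the parity
    # of the stripe without the data being replaced.
    parity_without_old = xor_blocks(old_parity, old_data)
    # Step 2: XOR the new data onto the recovered value.
    return xor_blocks(parity_without_old, new_data)


# Made-up 3+1 stripe: one small block per data disk.
d0, d1, d2 = b"\x11" * 8, b"\x22" * 8, b"\x44" * 8
old_parity = xor_blocks(xor_blocks(d0, d1), d2)

# Partial stripe write: only the block on disk 1 changes.
new_d1 = b"\xa5" * 8
new_parity = read_modify_write_parity(d1, old_parity, new_d1)

# The result matches parity recomputed from the whole updated stripe.
assert new_parity == xor_blocks(xor_blocks(d0, new_d1), d2)
```

Note that `d0` and `d2` are never read by `read_modify_write_parity`; the cost is instead the two extra reads of the old data and old parity.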

The second method requires reading the remainder of the data in the stripe that is not being overwritten, despite the fact that it is not replaced by the WRITE operation. The new parity information may then be determined for the entire stripe being updated, using the retrieved old data and the new data. This process requires a READ operation of the data that is not to be replaced and a full stripe WRITE operation to save the parity information.
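For comparison, the second method can be sketched as a recomputation over the whole stripe. As before, this is a hedged illustration: the names are hypothetical and the untouched blocks are passed in as if they had just been read from the data disks.

```python
def xor_blocks(*blocks):
    """Byte-wise XOR of any number of equal-length blocks."""
    result = bytes(len(blocks[0]))  # all-zero block of the same length
    for block in blocks:
        result = bytes(x ^ y for x, y in zip(result, block))
    return result


def reconstruct_write_parity(new_blocks, untouched_blocks):
    """Compute parity for the entire stripe from new data plus re-read old data."""
    return xor_blocks(*new_blocks, *untouched_blocks)


# Made-up 3+1 stripe in which only the block on disk 1 is rewritten.
untouched = [b"\x11" * 8, b"\x44" * 8]  # blocks on disks 0 and 2, read back from disk
new = [b"\xa5" * 8]                     # new block for disk 1
parity = reconstruct_write_parity(new, untouched)
```

In this sketch the old parity is never read; the cost moves to reading every data block in the stripe that the write does not replace.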

In the prior art, parity fault tolerance methods using a disk array had to perform one of the above techniques to manage partial stripe WRITE operations. Partial stripe writes therefore hurt system efficiency, because either the remainder of the stripe that was not written had to be fetched, or the existing parity information for the stripe had to be read before the information was actually written. Accordingly, there is a need for an improved method of performing disk WRITE operations in a parity fault tolerant disk array that reduces the number of partial stripe write operations.

Some background on disk drive formatting is appropriate. Manufacturers low-level format a disk drive when it is produced or manufactured. The low-level format operation involves the creation of sectors, along with the marking of their addresses, which are used to identify the sectors after formatting is complete. The data portion of each sector is also established and filled with dummy data. When a disk drive unit is incorporated into a computer system, the disk controller and the respective OS managing the computer system must perform a high-level, or logical, format of the disk drive to place a "file system" on the disk and make the disk drive conform to the standards of the OS. This high-level formatting is performed by the respective disk controller in conjunction with an OS service referred to as a "make file system" program. In the UNIX OS, the make file system program works in conjunction with the disk controller to create the file system on the disk array. In traditional systems, the OS views the disk as a sequential list of blocks or sectors, and the make file system program is unaware of the topology of these blocks.

Disclosure of the Invention

The present invention is directed to a method and apparatus for improving disk performance in a computer system having a disk array subsystem. In the method according to the present invention, a non-uniform mapping scheme is used in which the disk array includes certain designated regions having data stripes of varying sizes. The disk array includes a region comprising a number of data stripes whose stripe size is approximately the same as the size of internal data structures frequently used by the file system, and a region comprising a number of data stripes with a larger stripe size that is used for general data storage.

When a write operation occurs involving one of the small data structures, the data structure is preferably mapped to the small stripe region of the disk array, where the complete stripe size matches the size of the data structure. In this way, whenever the file system data structure is updated, the operation is a full stripe write. This reduces the number of partial stripe write operations and thus also reduces the performance penalty associated with those operations.
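A minimal Python sketch of this idea follows. Every concrete number in it is an assumption chosen for illustration, not a value from the patent: a 4+1 array, a small-stripe region with one 512-byte sector per disk (so a complete stripe holds 2 kilobytes, the INODE size mentioned earlier), a large-stripe region with 16 sectors per disk, and hypothetical region boundaries and names.

```python
from dataclasses import dataclass

DATA_DISKS = 4  # hypothetical 4+1 array: four data disks plus parity


@dataclass(frozen=True)
class Region:
    first_sector: int      # first logical data sector of the region
    sector_count: int      # number of logical data sectors in the region
    sectors_per_disk: int  # stripe size on each disk, in sectors

    @property
    def full_stripe_sectors(self):
        """Data sectors in one complete stripe of this region."""
        return self.sectors_per_disk * DATA_DISKS


# Hypothetical layout: a small-stripe region for file system structures,
# followed by a large-stripe region for general data.
REGIONS = [
    Region(first_sector=0, sector_count=4096, sectors_per_disk=1),
    Region(first_sector=4096, sector_count=65536, sectors_per_disk=16),
]


def find_region(sector):
    """Return the region containing a logical data sector."""
    for region in REGIONS:
        if region.first_sector <= sector < region.first_sector + region.sector_count:
            return region
    raise ValueError("sector lies outside the array")


def is_full_stripe_write(start_sector, sector_count):
    """True if the write covers only whole stripes within a single region."""
    region = find_region(start_sector)
    offset = start_sector - region.first_sector
    full = region.full_stripe_sectors
    return (
        sector_count > 0
        and offset % full == 0
        and sector_count % full == 0
        and offset + sector_count <= region.sector_count
    )
```

Under these assumptions, an aligned four-sector (2 kilobyte) update in the small-stripe region is a full stripe write, while the same four-sector update in the large-stripe region is a partial stripe write and would need one of the two parity techniques described earlier.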

A better understanding of the present invention can be obtained when the following detailed description of the preferred embodiment is considered in conjunction with the accompanying drawings.

Brief Description of the Drawings

FIG. 1 is a prior art diagram of a traditional 3+1 disk array mapping scheme having a uniform stripe size.

FIGS. 2 and 3 are block diagrams of an illustrative computer system on which the method of the present invention may be practiced.

FIG. 4 is a block diagram of the disk subsystem of the preferred embodiment.

FIG. 5 is a functional block diagram of the transfer controller of FIG. 4 according to the preferred embodiment.

FIG. 6 is a diagram of a 3+1 disk array mapping scheme having variable size stripes according to a first embodiment.

FIG. 7 is a diagram of a RAID 5 3+1 disk array mapping scheme having variable stripe sizes according to a second embodiment of the present invention.

FIG. 8 is a diagram of a 4+1 disk array mapping scheme according to the preferred embodiment of the invention.

FIG. 9 is a flowchart diagram of a WRITE operation according to the method of the present invention.

FIG. 10 is a flowchart diagram of a READ operation according to the method of the present invention.

Best Mode for Carrying Out the Invention

The computer system and disk array subsystem described below represent the preferred embodiment of the present invention. It is, of course, contemplated that other computer systems that do not have the capabilities of the system described below may be used to practice the present invention.

Referring now to FIGS. 2 and 3, the letter C generally designates a computer system on which the present invention may be practiced. For clarity, system C is shown in two portions in FIGS. 2 and 3, with the interconnections between the portions referenced by the circled numbers 1 through 8. System C comprises a number of block elements interconnected via four buses.

A central processing unit (CPU) comprises a system processor 20, a numerical coprocessor 22, a cache memory controller 24 and associated logic circuits connected to a system processor bus 26. Associated with the cache controller 24 are a high-speed cache data random access memory (RAM) 28, a non-cacheable memory address (NCA) map programming logic circuit 30, a non-cacheable address or NCA memory map 32, an address exchange latch circuit 34, a data exchange transceiver 36 and page hit detect logic 43. Also associated with the CPU are a system processor ready logic circuit 38, a next address (NA) enable logic circuit 40 and a bus request logic circuit 42.

The system processor 20 is preferably an Intel 80386 microprocessor. The system processor 20 has its control, address and data lines interfaced to the system processor bus 26. The coprocessor 22 is preferably an Intel 80387 or Weitek WTL 3167 numerical processor interfaced with the local processor bus 26 and the system processor 20 in a conventional manner. The cache RAM 28 is preferably a suitably high-speed static random access memory that interfaces with the address and data elements of the bus 26 under the control of the cache controller 24 to carry out the required cache memory operations. The cache controller 24 is preferably an Intel 82385 cache controller configured to operate in two-way set associative master mode. In the preferred embodiment, the components are the 33 MHz versions of the respective units. If desired, an Intel 80486 microprocessor and an external cache memory system may replace the 80386, the numerical coprocessor, the 82385 and the cache RAM. The address latch circuit 34 and the data transceiver 36 interface the cache controller 24 with the processor 20 and form a local bus interface between the processor bus 26 and a host or memory bus 44. Circuit 38 is a logic circuit that controls access to the bus 26 and provides a bus ready signal indicating when the next cycle may begin. The enable circuit 40 is used to indicate that the next address of data or code to be used by subsystem elements in pipelined address mode may be placed on the local bus 26.

The non-cacheable memory address (NCA) map programmer 30 cooperates with the processor 20 and the non-cacheable address memory 32 to map non-cacheable memory locations. The non-cacheable address memory 32 is used to designate areas of system memory that are non-cacheable, in order to avoid various types of cache coherency problems. The bus request logic circuit 42 is used by the processor 20 and associated elements to request access to the host bus 44 in situations such as when the requested data is not located in the cache memory 28 and access to system memory is required.

The main memory array, or system memory 58, is coupled to the host bus 44. The main memory array 58 is preferably dynamic random access memory. The memory 58 interfaces with the host bus 44 via an EISA bus buffer (EBB) data buffer circuit 60, a memory controller circuit 62 and a memory mapper 68. The buffer 60 performs data transfer and parity generating and checking functions.

The memory controller 62 and the memory mapper 68 interface with the memory 58 via address multiplexer and column address strobe (ADDR/CAS) buffers 66 and a row address strobe (RAS) enable logic circuit 64.

In the drawings, system C is configured as having the processor bus 26, the host bus 44, an Extended Industry Standard Architecture (EISA) bus 46 (FIG. 3) and an X bus 90 (FIG. 3). The details of the portion of the system shown in FIG. 3 that are not discussed in detail below are not significant to the present invention other than to illustrate an example of a fully configured computer system. The portion of system C shown in FIG. 3 is essentially a configured EISA system that includes the necessary EISA bus 46, an EISA bus controller 48, and data latches and transceivers, referred to as EBB data buffers 50, and address latches and buffers 52, which interface between the EISA bus 46 and the host bus 44. Also shown in FIG. 2 is an integrated system peripheral (ISP) 54, which incorporates a number of the elements used in an EISA-based computer system.

The ISP 54 includes a direct memory access controller 56 for controlling access to the main memory 58 (FIG. 1) or to memory contained in EISA slots and input/output (I/O) locations without the need for access to the processor 20. The ISP 54 also includes interrupt controllers 70, non-maskable interrupt logic 72 and a system timer 74, which allow control of interrupt signals and generate the necessary timing signals and wait states in a manner according to the EISA specification and conventional practice. In the preferred embodiment, processor-generated interrupt requests are controlled via dual interrupt control circuits that emulate and extend conventional Intel 8259 interrupt controllers. The ISP 54 also includes bus arbitration logic 75 which, in cooperation with the bus controller 48, controls and arbitrates among the various requests for the EISA bus 46 by the cache controller 24, the DMA controller 56 and bus master devices located on the EISA bus 46.

[...] 78 and ISA and EISA data buses 80 and 82, and interfacing takes place via the X bus 90 in combination with the ISA control bus from the EISA bus 46. Control and data/address transfer for the X bus 90 are facilitated by X bus control logic 92, data buffers 94 and address buffers 96.

Attached to the X bus 90 are various peripheral devices, such as a keyboard/mouse controller 98, which interfaces the X bus 90 with a suitable keyboard and mouse via connectors 100 and 102, respectively. Also attached to the X bus 90 are read-only memory (ROM) circuits 106, which contain the basic operating software for system C and for system video operations. A serial communications port 108 is also connected to system C via the X bus 90. Floppy disk support, a parallel port, a second serial port and video support circuits are provided in block circuit 110.

Computer system C includes a disk subsystem 111, which comprises a disk array controller 112, a fixed disk connector 114 and a fixed disk array 116.

The disk array controller 112 is connected to the EISA bus 46, preferably in a slot, and communicates data and address information through the EISA bus 46. The fixed disk connector 114 is connected to the disk array controller 112 and is in turn connected to the fixed disk array 116.

Referring now to FIG. 4, the disk subsystem 111 used to illustrate the present invention is shown. The disk array controller 112 has a local processor 130, preferably an Intel 80186. The local processor 130 has a multiplexed address/data bus UAD and control outputs UC. The multiplexed address/data bus UAD is connected to a transceiver 132 whose output is the local processor data bus UD. The multiplexed address/data bus UAD is also connected to the D inputs of a latch 134 whose Q outputs form the local processor address bus UA. The local processor 130 has associated with it random access memory (RAM) 136, coupled via the multiplexed address/data bus UAD and the address bus UA. The RAM 136 is connected to the processor control bus UC to develop proper timing signals. Similarly, read-only memory (ROM) 138 is connected to the multiplexed address/data bus UAD, the processor address bus UA and the processor control bus UC. Thus the local processor 130 has its own resident memory for its data storage and for controlling its operation. A programmable array logic (PAL) device 140 is connected to the local processor control bus UC to develop additional control signals used in the disk array controller 112.

The local processor address bus UA, the local processor data bus UD and the local processor control bus UC are also connected to a bus master interface controller (BMIC) 142. BMIC 142 provides the function of interfacing disk array controller 112 with a standard bus, such as the EISA or MCA bus, and acts as a bus master. In the preferred embodiment, BMIC 142 is an Intel 82355 interfaced with EISA bus 46. Through its connections to the local processor buses UA, UD and UC, BMIC 142 interfaces with local processor 130 so that data and control information can pass between host system C and local processor 130.

In addition, the local processor data bus UD and the local processor control bus UC are preferably connected to a transfer controller 144. Transfer controller 144 is generally a specialized multichannel direct memory access (DMA) controller used to transfer data between transfer buffer RAM 146 and the various other devices in disk array controller 112. For example, transfer controller 144 is connected to BMIC 142 by BMIC data lines BD and BMIC control lines BC. Thus, when a READ operation is requested over this interface, transfer controller 144 can transfer data from transfer buffer RAM 146 to BMIC 142. When a WRITE operation is requested, data can be transferred from BMIC 142 to transfer buffer RAM 146. Transfer controller 144 can then pass this information from transfer buffer RAM 146 to disk array 116. Transfer controller 144 is described in more detail in U.S. application No. 431,735 and in its European counterpart, European Patent Application Publication No. 0427119, published April 4, 1991, both of which are referenced here.

Transfer controller 144 includes a disk data bus DD and a disk address and control bus DAC. The disk address and control bus DAC is connected to two buffers 165 and 166, which are part of fixed disk connector 114, and is used to send and receive control signals between transfer controller 144 and disk array 116.

Disk data bus DD is connected to two data transceivers 148 and 150, which are part of fixed disk connector 114. The outputs of transceiver 148 and transfer buffer 146 are connected to two disk drive port connectors 152 and 154. In similar fashion, two connectors 160 and 162 are connected to the outputs of transceiver 150 and buffer 166. Two hard disks can be connected to each of connectors 152, 154, 160 and 162. Up to eight disk drives can therefore be connected and coupled to transfer controller 144. As described below, in the preferred embodiment five disk drives are coupled to transfer controller 144 and a 4+1 mapping scheme is used.

In the illustrative disk array system 112, a compatibility port controller (CPC) 164 is also connected to EISA bus 46.

CPC 164 is connected to transfer controller 144 over compatibility data lines CD and compatibility control lines CC. CPC 164 is provided so that software written for previous computer systems can operate without being rewritten. Such systems lack a disk array controller 112 and its BMIC 142, which is addressed over EISA-specific space and allows very high throughput. CPC 164 therefore emulates the various control ports previously used for interfacing with hard disks.

Referring now to FIG. 5, transfer controller 144 itself comprises a series of separate circuit blocks. Transfer controller 144 includes two main units, referred to as RAM controller 170 and disk controller 172. RAM controller 170 has an arbiter to control which of the various interface devices has access to transfer buffer RAM 146, and a multiplexer so that data can pass to and from transfer buffer RAM 146.

Likewise, disk controller 172 includes an arbiter that determines which of the various devices has access to integrated disk interface 174, and includes multiplexing capability so that data can be properly transferred back and forth through integrated disk interface 174.

Transfer controller 144 preferably includes seven DMA channels. One DMA channel 176 is assigned to cooperate with BMIC 142. A second DMA channel 178 is designed to cooperate with CPC 164. These two devices, BMIC 142 and compatibility port controller 164, are coupled only to transfer buffer RAM 146, through their respective DMA channels 176 and 178 and RAM controller 170. BMIC 142 and compatibility port controller 164 have no direct access to integrated disk interface 174 or disk array 116. Local processor 130 (FIG. 3) is connected to RAM controller 170 through local processor DMA channel 180 and to disk controller 172 through local processor disk channel 182. Local processor 130 is thus connected to both transfer buffer RAM 146 and disk array 116, as desired.

In addition, transfer controller 144 includes four DMA/disk channels 184, 186, 188 and 190, through which information can pass individually and simultaneously between disk array A and RAM 146. It is noted that the fourth DMA/disk channel 190 also includes exclusive-OR capability, so that parity operations can be readily performed in transfer controller 144 without requiring computation by local processor 130. Computer system C and disk array subsystem 111 described above represent the preferred computer system for practicing the method of the present invention.
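The exclusive-OR capability mentioned for channel 190 can be illustrated with a minimal Python sketch. It is not the transfer controller's hardware logic; the function names `xor_parity` and `rebuild_block` are hypothetical, and the sketch only shows the general XOR parity property that a parity-protected array relies on.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks (illustrative only)."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_block(surviving_blocks, parity):
    """Recover one missing data block from the surviving blocks and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


data = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity = xor_parity(data)
recovered = rebuild_block([data[0], data[2]], parity)
```

Because XOR is its own inverse, XOR-ing the parity with the surviving blocks yields the missing block, which is what makes the parity information usable for recovery.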

Computer system C preferably uses the UNIX operating system, although other operating systems may be used. As described in the background, the UNIX operating system includes a service referred to as the make file system program. In the preferred embodiment, the make file system program supplies disk controller 112 with information about the number of INODEs to be created and the size of each INODE. Alternatively, the make file system program includes sufficient intelligence to inform disk controller 112 of the desired stripe sizes in the small stripe and large stripe regions and of the boundary dividing these regions. As previously noted, the number of INODEs is approximately equal to the number of files to be allowed in the system. Disk array controller 112 uses this information to develop the file system on each disk of array 116.

Disk array controller 112 uses a multiple mapping scheme according to the present invention that partitions disk array 116 into a small stripe region and a large stripe region. The small stripe region preferably occupies the first N sectors of each disk and is reserved for INODE data structures. The remaining stripes in the array form the large stripe region, which comprises the free space used for data storage. Accordingly, in the preferred embodiment, disk controller 112 allocates the first N sectors of each disk in the array to the small stripe region. The remaining sectors of each disk are formatted into the large stripe region. Disk array controller 112 stores the boundary dividing the small stripe and large stripe regions in RAM 136. Thereafter, whenever an INODE data structure is written to the disks, disk array controller 112 uses this boundary to write the INODE to the small stripe portion of array 116.
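The boundary test described above can be sketched in a few lines of Python. The class name `ArrayLayout`, the field `boundary_sector` and the value 6 are assumptions made for illustration; the text states only that the first N sectors of each disk form the small stripe region and that the boundary is kept in RAM 136.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayLayout:
    # N: the first per-disk sector that belongs to the large stripe region
    # (hypothetical field name; the value is chosen by the caller).
    boundary_sector: int

    def region_of(self, disk_sector: int) -> str:
        """Classify a per-disk sector number as 'small' or 'large'."""
        if disk_sector < 0:
            raise ValueError("sector numbers are non-negative")
        return "small" if disk_sector < self.boundary_sector else "large"


layout = ArrayLayout(boundary_sector=6)
regions = [layout.region_of(s) for s in (0, 5, 6, 100)]
```

A single comparison against the stored boundary is enough here because the small stripe region is one contiguous range at the start of each disk.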

In this way, whenever an INODE is updated, the resulting operation is a full stripe write. This increases system performance because, as previously noted, a partial stripe write operation generally requires preceding read operations, which reduce performance.

In an alternate embodiment of the invention, the small stripe region does not occupy the first N sectors of each disk. Rather, it comprises multiple regions interspersed among the large stripe regions.

In this embodiment, the multiple boundaries dividing the small stripe and large stripe regions are stored in RAM 136 so that disk controller 112 can write INODE data structures to the small stripe regions.
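With several interspersed regions, the lookup becomes a search over a sorted list of boundaries. The following Python sketch assumes, purely for illustration, that the regions alternate and that the first region on each disk is a small stripe region; the text does not specify the ordering, and the function name `region_of` and the boundary values are hypothetical.

```python
from bisect import bisect_right


def region_of(disk_sector, boundaries):
    """Classify a sector given sorted boundaries between alternating regions.

    Assumption: sectors below boundaries[0] are in a small stripe region,
    and the region type flips at every boundary.
    """
    index = bisect_right(boundaries, disk_sector)
    return "small" if index % 2 == 0 else "large"


boundaries = [4, 100, 104]  # hypothetical per-disk sector numbers
labels = [region_of(s, boundaries) for s in (0, 4, 99, 100, 103, 104)]
```

A binary search keeps the lookup cheap even if many boundaries are stored.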

In another alternate embodiment of the invention, the OS/2 operating system is used. In this embodiment, an OS/2 service similar to the make file system program described above supplies disk controller 112 with information about the number of FNODEs to be created and the size of each FNODE. The disk controller then uses a multiple mapping scheme similar to that described above to partition disk array 116 into small stripe and large stripe regions, with the small stripe region reserved for FNODE data structures. It is noted that the present invention can operate with any type of operating system or file system.

Referring again to FIG. 1, consider an INODE data structure 2 kilobytes in size written to, for example, stripe 0 of a prior-art disk array having a uniform disk stripe size of 2 kilobytes. The entire INODE is written to sectors 0, 1, 2 and 3 of disk 0, and disks 1 and 2 are left unused. This operation effectively negates the advantage of a disk array system, because only one disk is accessed. The amount of unused space also becomes considerable, resulting in inefficient use of the disk drives. If data may subsequently be written to the remainder of the stripe, that is, the portions of the stripe residing on disks 1 and 2, the resulting operations on this stripe are partial stripe write operations, which carry the performance penalty described above.
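The prior-art case can be checked with a short Python sketch that lists which data disks a write touches. The function name `disks_touched` and its parameters are hypothetical, and the sketch assumes data is laid out round-robin across the data disks in units of the disk stripe size.

```python
def disks_touched(offset, length, stripe_unit, data_disks):
    """Return the sorted data-disk indices touched by a write.

    offset and length are in bytes from the start of the data area;
    stripe_unit is the number of bytes placed on one disk per stripe.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    return sorted({unit % data_disks for unit in range(first_unit, last_unit + 1)})


# A 2 KB INODE on a uniform 2 KB disk stripe size with three data disks.
prior_art = disks_touched(0, 2048, stripe_unit=2048, data_disks=3)
```

With these numbers only disk 0 is touched, which is the situation the paragraph above describes.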

Referring now to FIG. 6, a diagram illustrating a 3+1 mapping scheme that uses multiple stripe sizes according to one embodiment of the present invention is shown. It is noted that FIG. 6 is an example only and that disk array 116 uses a much larger number of stripes of each size. The disk drives used in the preferred embodiment include a number of sectors, each holding 512 bytes of storage. In the embodiment shown in FIG. 6, the disk array uses two stripe sizes: a 1 kilobyte disk stripe size and a 2 kilobyte disk stripe size. As shown in FIG. 6, stripes 0, 1 and 2 use a 1 kilobyte disk stripe size, with two sectors per disk on disks 0, 1 and 2, for a total of six sectors, or 3 kilobytes, of data storage per complete stripe.

In addition, stripes 0, 1 and 2 each use two sectors of disk 3 to store parity information for the stripe. Stripes 3 through 6 use a 2 kilobyte disk stripe size: four sectors per disk on disks 0, 1 and 2 are allocated to data storage for each stripe, and four sectors on disk 3 are reserved for parity information for each stripe.

In this embodiment, INODE data structures are written to the portion of the disk array having the small stripe size, namely stripe 0, 1 or 2. As previously noted, the INODE structure is assumed to be 2 kilobytes in size in the preferred embodiment.

Therefore, as shown in FIG. 6, an INODE structure written to stripe 0, 1 or 2 fills the areas on disks 0 and 1, disk 2 is generally left unused, and disk 3 is used to store the corresponding parity information. In this embodiment, data is not permitted to be written to disk 2 of a stripe after an INODE has been written to that stripe. Partial stripe write operations are thereby prevented. Thus, by using a smaller stripe size for a portion of disk array 116 and by preventing data from being written to the unused space after an INODE is written, write operations on these structures emulate full stripe writes. It is noted, however, that during such a full stripe write disk 2 is unused or unwritten, which results in inefficient use of disk area. Furthermore, because disk 2 is generally unused in each small stripe to which an INODE structure is written, the data transfer bandwidth of the disk array system is reduced, and the array essentially operates as a 2+1 mapping scheme in these instances.

One solution to this problem is to distribute the unused space across different disks in each stripe, as shown in FIG. 7. In this way the reduction in data transfer bandwidth is less significant, because each disk is used approximately equally. However, data transfer bandwidth remains suboptimal, because every disk access involves an unused disk. In addition, this method produces an undesirable amount of unused disk space.

In the preferred embodiment of the present invention, a 4+1 mapping scheme is used, as shown in FIG. 8. It is again noted that FIG. 8 is illustrative only and that disk array 116 of the preferred embodiment uses a considerably larger number of stripes in each of the small stripe and large stripe regions. The disk stripe size of the stripes in the small stripe region, stripes 0 through 4, is 512 bytes, where stripe size is defined as the amount of each disk allocated to a stripe. Each complete stripe in the small stripe region therefore holds exactly 2 kilobytes of data, which is substantially equivalent to the size of an INODE structure. Accordingly, when an INODE structure is written to a small stripe in disk array 116, a full stripe write operation is performed and the INODE occupies the entire stripe with no unused space. The bandwidth of disk array 116 is thus used optimally, because every disk participates in each access and no unused space results.
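The arithmetic behind this choice can be sketched in Python. The helper names `stripe_capacity` and `is_full_stripe_write` are hypothetical; the figures (512-byte disk stripe size, four data disks, 2 kilobyte INODE) come from the text.

```python
def stripe_capacity(stripe_unit_bytes, data_disks):
    """Data bytes held by one complete stripe (parity excluded)."""
    return stripe_unit_bytes * data_disks


def is_full_stripe_write(offset, length, stripe_unit_bytes, data_disks):
    """True if a write starts on a stripe boundary and covers whole stripes."""
    capacity = stripe_capacity(stripe_unit_bytes, data_disks)
    return length > 0 and offset % capacity == 0 and length % capacity == 0


INODE_BYTES = 2048

# 4+1 scheme with a 512-byte disk stripe size, as in FIG. 8.
four_plus_one = is_full_stripe_write(0, INODE_BYTES, 512, 4)
# 3+1 scheme with a 1 KB disk stripe size, as in FIG. 6.
three_plus_one = is_full_stripe_write(0, INODE_BYTES, 1024, 3)
```

Under the 4+1 scheme the stripe capacity equals the INODE size, so an aligned INODE write covers the whole stripe. Under the 3+1 scheme the capacity is 3 kilobytes, so the same write leaves part of the stripe uncovered.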

Referring again to FIG. 4, in the preferred embodiment a disk request is submitted by system processor 20 to disk array controller 112, preferably through EISA bus 46 and BMIC 142. On receiving this request through BMIC 142, local processor 130 builds a data structure in local processor RAM 136. This data structure is known as a command list. It may be a simple READ or WRITE request directed to disk array 116, or a more elaborate set of requests containing multiple READ/WRITE or diagnostic and configuration requests. The command list is then submitted to local processor 130 for processing. Local processor 130 then oversees execution of the command list, including the transfer of data. When execution of the command list is complete, local processor 130 notifies the operating system device driver running on system microprocessor 20. Submission of the command list and notification of command list completion are accomplished by a protocol that uses I/O registers located in BMIC 142.

The READ and WRITE operations executed by disk array controller 112 are implemented as a number of application tasks running on local processor 130. Because of the interactive nature of I/O operations, it is impractical for the illustrative computer system C to process disk commands as single batch tasks on local processor 130. Local processor 130 therefore uses a real-time multitasking operating system that permits multiple tasks, including the method of the present invention, to be addressed by local processor 130. Preferably, the operating system on local processor 130 is the AMX86 multitasking executive by Kadak. The AMX operating system kernel provides a number of system services in addition to the applications set forth in the method of the present invention.

Referring now to FIG. 9, a flowchart of a WRITE operation as performed on computer system C, which includes intelligent disk array controller 112, is shown. The WRITE operation begins at step 200, where an active process or application causes system processor 20 to generate a WRITE request, which is passed to the disk device driver. The disk device driver is a portion of software, contained in computer system C and preferably in system memory 58, that performs the actual interface operations with the disk units. The disk device driver software assumes control of system processor 20 to perform the specific tasks needed to carry out the required I/O operations.

Control transfers to step 202, where the disk device driver assumes control of system processor 20 and generates a WRITE command list.

In step 204, the device driver submits the WRITE command list to disk controller 112 via BMIC 142 or CPC 164. The device driver then enters a wait state to await a completion signal from disk array controller 112.

The logical flow of the operation proceeds to step 206, where local processor 130 receives the WRITE command list and determines whether an INODE data structure is being written to disk array 116. In making this determination, the local processor preferably uses the boundary between the small stripe and large stripe regions. In an alternate embodiment of the invention, the intelligence to use the boundary between the small stripe and large stripe regions and to incorporate this information into the WRITE command list is built into the device driver. If an INODE data structure is being written to disk array 116, then in step 208 local processor 130 builds disk-specific WRITE commands for a full stripe WRITE operation to the small stripe region. Control then transfers to step 210, where transfer controller chip (TCC) 144 generates parity data from the INODE being written to disk array 116. It is noted that the operation of writing an INODE to the small stripe region is treated as a full stripe write operation, so none of the preceding READ operations associated with partial stripe write operations are encountered. Control then transfers to step 212, where TCC 144 writes the data and the newly generated parity information to the disks in disk array 116. Control then transfers to step 214, where local processor 130 determines whether additional data is to be written to disk array 116.

If additional data is to be written to disk array 116, control transfers to step 216, where local processor 130 increments the memory address and decrements the number of bytes to be transferred. Control then returns to step 206. If no additional data is to be written to disk array 116, control transfers from step 214 to step 224, where local processor 130 signals WRITE complete. If local processor 130 receives the WRITE command list and determines that an INODE structure is not being written to disk array 116, then in step 218 local processor 130 builds disk-specific WRITE commands for the data to be written to the large stripe region. It is noted that this operation requires local processor 130 to use the boundary between the small stripe and large stripe regions stored in RAM 136 to develop the proper bias, or offset, that corrects for the different stripe sizes, so that the proper physical disk addresses are developed.
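The bias or offset mentioned above can be made concrete with a Python sketch of a two-region address mapping. This is an illustration under stated assumptions rather than the controller's firmware: the class `TwoRegionMap`, its fields and the notion of a flat logical sector number are hypothetical, parity is assumed to sit on a dedicated disk outside the data-disk numbering (as in the FIG. 6 description), and the example values are those given for FIG. 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoRegionMap:
    data_disks: int     # data disks per stripe (parity disk not counted)
    small_unit: int     # sectors per disk per stripe in the small stripe region
    small_stripes: int  # number of stripes in the small stripe region
    large_unit: int     # sectors per disk per stripe in the large stripe region

    @property
    def boundary_sector(self) -> int:
        """First per-disk sector of the large stripe region."""
        return self.small_unit * self.small_stripes

    def locate(self, logical_sector: int):
        """Map a logical data sector to (region, disk, per-disk sector)."""
        if logical_sector < 0:
            raise ValueError("logical sector must be non-negative")
        small_capacity = self.boundary_sector * self.data_disks
        if logical_sector < small_capacity:
            region, unit, base = "small", self.small_unit, 0
            relative = logical_sector
        else:
            # The large region starts at the boundary; this is the offset
            # that corrects for the different stripe size.
            region, unit, base = "large", self.large_unit, self.boundary_sector
            relative = logical_sector - small_capacity
        stripe, within = divmod(relative, unit * self.data_disks)
        disk, offset = divmod(within, unit)
        return region, disk, base + stripe * unit + offset


# FIG. 6 values: 3 data disks, stripes 0-2 use 2 sectors per disk,
# later stripes use 4 sectors per disk.
fig6 = TwoRegionMap(data_disks=3, small_unit=2, small_stripes=3, large_unit=4)
first_large = fig6.locate(18)
```

With the FIG. 6 values the small stripe region ends at per-disk sector 6, so the first sector of the large stripe region maps to disk 0, sector 6.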

Alternatively, this intelligence can be built into the device driver, which then accesses and uses the boundary between the small stripe and large stripe regions and incorporates the resulting offset information into the WRITE command list. In this embodiment, local processor 130 is not required to use the boundary between the small stripe and large stripe regions, because this intelligence is incorporated into the device driver.

In step 220, transfer controller chip 144 generates parity information solely for the data being written. It is noted here that if the write operation is a full stripe write, disk controller 112 can generate the parity information from the written data alone. If the write operation is a partial stripe write operation, however, preceding read operations may need to be performed to read the current data or parity information on the disk. As previously noted, these additional read operations resulting from partial stripe write operations reduce the performance of disk system 111. For techniques used to improve the performance of partial stripe write operations, reference is made to U.S. patent application Ser. No. 752,773, filed August 30, 1991, entitled "METHOD FOR PERFORMING WRITE OPERATIONS IN A PARITY FAULT TOLERANT DISK ARRAY", and to U.S. patent application Ser. No. 815,118, filed December 27, 1991, entitled "METHOD FOR IMPROVING PARTIAL STRIPE WRITE PERFORMANCE IN DISK ARRAY SUBSYSTEMS". Both are assigned to the same assignee as this invention and are incorporated here by reference. In step 222, disk controller 112 writes the data and parity information to the large stripe region. Control then transfers to step 214, where local processor 130 determines whether additional data is to be written to disk array 116. If it is determined in step 214 that no additional data is to be transferred, control transfers to step 224, where disk array controller 112 signals WRITE complete to the disk device driver. Control then proceeds to step 226, where the device driver releases control of system processor 20 so that execution of the application program can continue. This completes the WRITE sequence of operations.
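The difference between the two parity cases in step 220 can be illustrated in Python. The sketch uses the common read-modify-write identity for XOR parity as an assumption; the text says only that preceding reads of current data or parity may be needed, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(new_blocks):
    """Full stripe write: parity comes from the new data alone, no reads."""
    parity = bytes(len(new_blocks[0]))
    for block in new_blocks:
        parity = xor_bytes(parity, block)
    return parity


def partial_stripe_parity(old_block, new_block, old_parity):
    """Partial stripe write: needs the old data and old parity read first."""
    return xor_bytes(xor_bytes(old_parity, old_block), new_block)


stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
old_parity = full_stripe_parity(stripe)

# Update only the block on the second data disk.
new_block = b"\xaa\x55"
updated_parity = partial_stripe_parity(stripe[1], new_block, old_parity)
```

The partial case gives the same parity as recomputing over the whole updated stripe, but only after two extra reads (old data and old parity), which is the cost the full stripe write avoids.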

Referring now to FIG. 10, a READ operation as carried out on disk array subsystem 111 using intelligent disk array controller 112 is shown. The READ operation begins at step 250, where an active process or application program causes system processor 20 to generate a READ command, which is passed to the disk device driver. Control transfers to step 252, where the disk device driver assumes control of system processor 20 and causes it to generate a READ command list similar to that described in U.S. patent application Ser. No. 431,737, assigned to Compaq Computer Corporation, the assignee of the present invention. The READ command list is sent to disk subsystem 111 in step 254, after which the device driver waits until it receives a READ complete signal.

In step 256, disk controller 112 receives the READ command list via BMIC 142 or CPC 164 and transfer controller 144, and determines whether the read operation is intended to access data in the small stripe region, that is, an INODE, or data in the large stripe region. In making this determination, disk controller 112 preferably compares the disk address of the requested data with the boundary between the small stripe and large stripe regions stored in RAM 136 to determine which region is being accessed. Alternatively, further intelligence can be built into the device driver so that the driver incorporates information on which region is being accessed into the READ command list. In this embodiment, disk controller 112 requires little extra intelligence and simply uses this information from the READ command list in generating the disk-specific READ requests.

If the small stripe region is being accessed, then in step 260 the local processor 130 generates disk specific READ requests for the requested INODE and its associated parity information in the small stripe region and queues the requests in the local RAM 136. Control then transitions to step 264, where the requests are executed and the requested data is transferred from the disk array 112 through the transfer controller 144 and the BMIC 142 or the CPC 146 to the system memory 58 addresses indicated by the requesting task. If the disk controller 112 determines that the read operation is intended to access data in the large stripe region, then in step 262 the local processor 130 generates disk specific READ requests for the requested data and its associated parity information in the large stripe region and queues the requests in the local RAM 136. These requests are executed and the data is transferred in step 264. Upon completion of the data transfer in step 264, the disk array controller 112 signals READ complete to the disk device driver in step 266 and releases control of the system processor 20.
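The generation and queuing of disk specific READ requests in steps 260 through 264 can be illustrated with the sketch below. Everything about the geometry is an assumption made for the example: a dedicated parity drive at index `data_disks`, a small stripe region that starts at sector 0, a boundary that is a multiple of one full small stripe, and the names `DiskRequest` and `build_read_requests`. The description does not specify these details.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskRequest:
    disk: int    # drive index; index == data_disks is the parity drive (assumed)
    sector: int  # sector on that drive
    kind: str    # "data" or "parity"


def build_read_requests(logical, boundary, small_chunk, large_chunk, data_disks):
    """Map one logical sector to a data request and its parity request."""
    if logical < boundary:   # small stripe region
        chunk, rel, base = small_chunk, logical, 0
    else:                    # large stripe region, placed after the small one
        chunk, rel, base = large_chunk, logical - boundary, boundary // data_disks
    stripe, within = divmod(rel, chunk * data_disks)
    disk, offset = divmod(within, chunk)
    sector = base + stripe * chunk + offset
    return [DiskRequest(disk, sector, "data"),
            DiskRequest(data_disks, sector, "parity")]


# A deque stands in for the request queue held in local RAM; the loop below
# drains it in order, as the execution step would.
geometry = dict(boundary=12, small_chunk=1, large_chunk=4, data_disks=3)
queue = deque()
queue.extend(build_read_requests(4, **geometry))   # falls in the small region
queue.extend(build_read_requests(17, **geometry))  # falls in the large region
executed = []
while queue:
    executed.append(queue.popleft())
```

Both branches produce the same kind of request; only the chunk size and the starting offset differ, which is why the region decision can be made once before the requests are built.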

Therefore, by forming variable stripe sizes on the disk array, and in particular by forming a region whose full stripe size equals the size of the small data structures that are frequently written to the disk array, the number of partial stripe write operations is reduced.

By placing these small data structures in a small stripe region whose data stripe exactly matches the size of the data structure, the resulting operations are full stripe writes. This increases disk performance because the performance penalties associated with partial stripe write operations are removed.
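The difference between the two kinds of write can be made concrete with a short parity sketch. It assumes simple XOR parity over the data chunks of a stripe, which is the usual scheme for parity-protected arrays; the function names are hypothetical. In a full stripe write the parity is computed from the new data alone, whereas a partial stripe write must first read the old data and the old parity before it can compute the new parity.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(chunks):
    """Parity for a full stripe write: XOR of the new chunks, no disk reads."""
    return reduce(xor_bytes, chunks)


def partial_update_parity(old_parity, old_chunk, new_chunk):
    """Parity for a partial stripe write: needs the old parity and old chunk,
    so two reads precede the two writes."""
    return xor_bytes(xor_bytes(old_parity, old_chunk), new_chunk)


old = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = full_stripe_parity(old)

# Replace the chunk on the second drive and update the parity both ways.
new = [old[0], b"\xaa\x55", old[2]]
via_partial = partial_update_parity(parity, old[1], new[1])
via_full = full_stripe_parity(new)
print(via_partial == via_full)
```

Both paths yield the same parity, but only the partial update depends on data already on disk, which is the source of the extra read operations that the small stripe region avoids.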

The foregoing disclosure and description of the invention are illustrative and explanatory thereof, and various changes in the components, methods, and operation, as well as in the details of the illustrated logic and flowcharts, may be made without departing from the spirit of the invention.

[Drawings omitted: figures showing the disk drives and the 3+1 and 4+1 mapping schemes of the preferred embodiment.]

International search report

Continuation of front page

(81) Designated states: EP (AT, BE, CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, SN, TD, TG), AT, AU, BG, BR, CA, CH, CS, DE, DK, ES, FI, GB, HU, JP, KR, NL, NO, PL, RO, RU, SE

Claims

1. A method for improving disk array performance in a computer system disk array having a first region comprising a plurality of data stripes having a first stripe size and a second region comprising a plurality of data stripes having a stripe size larger than the first stripe size, the first stripe size corresponding to the size of data structures used in the disk array, the method comprising the steps of: generating a data write operation to the disk array; determining whether the data comprises one of the data structures; writing the data to a data stripe in the first region having the first stripe size if the data comprises one of the data structures; and writing the data to a data stripe in the second region if the data does not comprise one of the data structures.

2. The method of claim 1, wherein the writing of the data structure to the first region is a full stripe write operation.

3. The method of claim 1, wherein the determining step is performed by a disk controller coupled to the disk array.

4. The method of claim 1, wherein the determining step is performed by a system processor in the computer system.

5. The method of claim 1, further comprising the steps of: generating a read operation requesting data from the disk array; determining whether the read operation is intended to access the first stripe size region or the second stripe size region; and providing the requested data from the first or second stripe size region depending on the determining step of the read operation.

6. The method of claim 1, wherein the disk array utilizes parity fault tolerance techniques, the method further comprising the steps of: generating parity information for the data; writing the parity information to the first region if the data comprises one of the data structures; and writing the parity information to the second region if the data does not comprise one of the data structures.

7. A computer system apparatus for performing disk array write operations, comprising: a system bus; a disk array coupled to the system bus and having a first region comprising a plurality of data stripes having a first stripe size and a second region comprising a plurality of data stripes having a size larger than the first stripe size, the first stripe size corresponding to the size of data structures used in the disk array; means coupled to the system bus for generating a data write operation to the disk array; means coupled to the generating means and the system bus for determining whether the data comprises one of the data structures; means coupled to the determining means and the system bus for writing the data to a data stripe in the first stripe size region if the data comprises one of the data structures; and means coupled to the determining means and the system bus for writing the data to a data stripe in the second stripe size region if the data does not comprise one of the data structures.

8. The apparatus of claim 7, wherein the data structure write operation to the first stripe size region is a full stripe write operation.

9. A method for improving disk array performance in a computer system disk array having a plurality of variable-sized data stripes for storing data, a first stripe size corresponding to the size of data structures used in the disk array, the method comprising the steps of: generating a data write operation to the disk array; determining whether the data comprises one of the data structures; writing the data to a data stripe having the first size corresponding to the size of the data structure if the data comprises one of the data structures; and writing the data to a data stripe not having the first size if the data does not comprise one of the data structures.

10. The method of claim 9, wherein the writing of the data structure to the data stripe having the first size is a full stripe write operation.

11. The method of claim 9, wherein the disk array includes a region having the first stripe size for storing the data structures and a region having a second stripe size, the method further comprising the steps of: generating a read operation requesting data from the disk array; determining whether the read operation is intended to access the first stripe size region or the second stripe size region; and providing the requested data from the first or second stripe size region depending on the determination for the read operation.

12. A computer system for performing disk array write operations, comprising: a system bus; a disk array coupled to the system bus and having a plurality of variable-sized data stripes for storing data, a first stripe size corresponding to the size of data structures used in the disk array; means coupled to the system bus for generating a data write operation to the disk array; means coupled to the generating means and the system bus for determining whether the data comprises one of the data structures; means coupled to the determining means and the system bus for writing the data structure to a data stripe having the first size corresponding to the size of the data structure if the data comprises one of the data structures; and means coupled to the determining means and the system bus for writing the data to a data stripe not having the first size if the data does not comprise one of the data structures.

13. A method for improving disk array performance in a computer system disk array that utilizes parity fault tolerance techniques, the method comprising the steps of: forming a file system on the disk array, thereby partitioning the disk array into a first region comprising a plurality of data stripes having a first stripe size and a second region comprising a plurality of data stripes having a stripe size larger than the first stripe size, the first stripe size corresponding to the size of data structures used in the disk array; generating a data write operation to the disk array; determining whether the data comprises one of the data structures; writing the data to a data stripe in the first region having the first stripe size if the data comprises one of the data structures; and writing the data to a data stripe in the second region if the data does not comprise one of the data structures.

14. The method of claim 13, wherein the writing of the data structure to the first region is a full stripe write operation.

15. The method of claim 13, further comprising the steps of: generating a read operation requesting data from the disk array; determining whether the read operation is intended to access the first stripe size region or the second stripe size region; and providing the requested data from the first or second stripe size region depending on the determining step of the read operation.