JP2005301350A - Fault tolerant server

Info

Publication number: JP2005301350A
Application number: JP2004112114A
Authority: JP
Inventors: Naotaka Nakamura; 尚貴中村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2004-04-06
Filing date: 2004-04-06
Publication date: 2005-10-27

Abstract

PROBLEM TO BE SOLVED: To build an inexpensive hardware RAID environment for a fault-tolerant server using disk array controllers and a disk enclosure of an existing class, without using an expensive external disk array device or software mirroring.

SOLUTION: The fault-tolerant server 100 is provided with CPU modules 1 and 2; PCI modules 3 and 4 that are multiplexed and operate as master and slave; a plurality of disks HDD01 to N-1; and disk array controllers 7 and 8, mounted in PCI modules 3 and 4 respectively, that communicate by recognizing each other as virtual I/O devices and share the disk configuration information of the disks HDD01 to N-1.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a fault-tolerant server, and in particular to a fault-tolerant server in which components such as the interface units for peripheral devices, for example external storage devices, are modularized and multiplexed.

A fault-tolerant server is designed for fault tolerance by modularizing and multiplexing its CPUs and the interface units for peripheral devices such as external storage devices. One interface unit operates as the master and the others as slaves. I/O with peripheral devices such as external storage devices goes through the master in normal operation and through a slave when a failure occurs.

For example, Patent Document 1 discloses a configuration in which duplicated disk control devices are connected to both the active and standby CPUs, each disk control device is connected to both the active and standby disks, and the same data can be written to the disks of both systems even if the disk control device of one system fails.

Patent Document 1: JP 61-160150 A

The reliability of external storage can be improved by providing a disk array device, but this requires an expensive external disk array device that houses the disk array controller in its own enclosure. Software mirroring by the OS is a possible alternative, but it is less fault tolerant than a hardware configuration and, because it consumes CPU resources, may reduce processing performance.

The fault-tolerant server of the present invention has a CPU module; multiplexed PCI modules that operate as master and slave; a plurality of disks; and disk array controllers, one for each of the multiplexed PCI modules, that recognize each other as virtual I/O devices, communicate with each other, and share the disk configuration information of the plurality of disks.

According to the present invention, the disk array controllers mounted on the PCI modules operating as master and slave are made to recognize each other as virtual I/O devices. This lets them communicate over an existing interconnect such as SCSI and share disk configuration information, failure information, and the like. As a result, a hardware RAID environment for an inexpensive fault-tolerant server can be built from disk array controllers and disk enclosures of an existing class, without an expensive external disk array device or software mirroring.

Next, the best mode for carrying out the present invention is described in detail with reference to the drawings.

As shown in FIG. 1, the fault-tolerant server 100 is configured with PCI modules 3 and 4 connected to each of CPU modules 1 and 2, and a shared device unit (hereinafter, DEU) 9 connected to each of PCI modules 3 and 4. The DEU 9 contains disks HDD01 to N-1. PCI modules 3 and 4 have PCI slots 5 and 6, respectively, in which disk array controllers (hereinafter, DACs) 7 and 8 are installed. The two DACs 7 and 8 to be duplexed are interconnected by SCSI or a similar interface and exchange information using a protocol such as SCSI. The disks HDD01 to N-1 are connected on the communication path between DACs 7 and 8.

PCI modules 3 and 4 operate as master and slave relative to each other. In normal operation, I/O between CPU modules 1 and 2 and the DEU 9 goes through the master-side PCI module. When a failure occurs, the route is switched to the one through the slave side, and CPU modules 1 and 2 access the DEU 9 over that route.
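The master/slave routing described above can be illustrated with a minimal Python sketch. The `PciModule` class, the `healthy` flag, and `select_io_path` are hypothetical names introduced only for illustration; the source does not specify how a fault is detected or how the switch is carried out.

```python
from dataclasses import dataclass


@dataclass
class PciModule:
    """Hypothetical stand-in for one PCI module (3 or 4 in FIG. 1)."""
    number: int
    role: str           # "master" or "slave"
    healthy: bool = True


def select_io_path(modules):
    """Return the module that should carry I/O to the DEU.

    The master is used while it is healthy; otherwise I/O is rerouted
    through a healthy slave. Raises RuntimeError if no path is left.
    """
    for wanted_role in ("master", "slave"):
        for module in modules:
            if module.role == wanted_role and module.healthy:
                return module
    raise RuntimeError("no healthy PCI module available")


modules = [PciModule(3, "master"), PciModule(4, "slave")]
normal_path = select_io_path(modules).number      # master carries I/O
modules[0].healthy = False                        # simulate a master fault
failover_path = select_io_path(modules).number    # slave takes over
```

In this sketch the CPU modules would call `select_io_path` before each access; a real server would more likely switch paths once, when the fault is reported.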

CPU modules 1 and 2 each include a CPU (not shown), a memory (not shown), and a controller (not shown) that interfaces with PCI modules 3 and 4. PCI modules 3 and 4 each include a controller (not shown) that interfaces with CPU modules 1 and 2.

FIG. 2 is a block diagram showing the detailed configuration of DACs 7 and 8 and the DEU 9 of FIG. 1.

DACs 7 and 8 mainly consist of virtual device units (hereinafter, VDUs) 12 and 16, main-unit-side I/Fs 10 and 20, disk-side I/Fs 11 and 21, disk configuration information tables 13 and 17, cache controllers 14 and 18, and caches 15 and 19. The disk configuration information table 13 holds the RAID configuration information of HDD01 to N-1.

VDUs 12 and 16 perform processing so that the two connected DACs 7 and 8 recognize each other as virtual I/O devices at the same level as the disks HDD01 to N-1. Specifically, the master-side VDU is recognized by the slave-side VDU as a device with ID = 0, and the slave-side VDU is made to appear to the master-side VDU as a device with ID = N (the maximum connectable device ID of the protocol used by the DACs). The master-side and slave-side VDUs can therefore transfer information with a protocol such as SCSI by specifying the partner VDU as the target over the communication path to which the disks HDD01 to N-1 are connected.

When DAC 7 or DAC 8 is operating as the master-side DAC, its VDU 12 or VDU 16 sends disk configuration information to the slave-side disk configuration information table 13 or 17 so that disk failure information is shared. It also controls the information of the cache controller and sends changes in the state of cache 15 or cache 19 of the master-side DAC 7 or DAC 8. When DAC 7 or DAC 8 is operating as the slave-side DAC, its VDU receives the information from the master-side VDU 16 or VDU 12 and updates cache 15 or cache 19 of the standby DAC 7 or DAC 8 and the disk configuration information in disk configuration information table 13 or 17.
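The ID scheme above (master-side VDU at ID 0, slave-side VDU at ID N) can be sketched as a simple address map. This is an illustration only: the function names are hypothetical, the assumption that disks HDD01 to N-1 occupy IDs 1 to N-1 is inferred from their labels rather than stated in the source, and `N = 15` is an arbitrary example value.

```python
def build_bus_map(max_device_id):
    """Map device IDs on the shared path to device names.

    ID 0 is the master-side VDU, ID N (= max_device_id) is the slave-side
    VDU, and IDs 1..N-1 are assumed to be the disks HDD01..HDD(N-1).
    """
    if max_device_id < 2:
        raise ValueError("need at least one disk between the two VDUs")
    bus = {0: "master VDU", max_device_id: "slave VDU"}
    for disk_id in range(1, max_device_id):
        bus[disk_id] = f"HDD{disk_id:02d}"
    return bus


def peer_target_id(own_role, max_device_id):
    """Return the ID a VDU addresses to reach its partner VDU."""
    if own_role == "master":
        return max_device_id   # the master sees the slave at ID = N
    if own_role == "slave":
        return 0               # the slave sees the master at ID = 0
    raise ValueError(f"unknown role: {own_role}")


N = 15  # assumed maximum device ID, chosen only for illustration
bus = build_bus_map(N)
```

Placing the two VDUs at the ends of the ID range leaves the IDs in between for the disks, so the partner controller can be addressed as a target in the same way as a disk.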

When CPU modules 1 and 2 access the disks HDD01 to N-1, in the master-side DAC 7 or DAC 8 data is input from main-unit-side I/F 10 or 20 to VDU 12 or VDU 16, and that VDU, acting as master, sends information to the slave-side VDU 16 or VDU 12. In the slave-side DAC 8 or DAC 7, no data is input from main-unit-side I/F 20 or 10 to VDU 16 or VDU 12, and that VDU, acting as slave, receives the information from the master-side VDU 12 or VDU 16.

When setting up the disk array configuration in the initial state, whichever of DACs 7 and 8 is operating as the master-side DAC sends the disk configuration information to the slave-side DAC. Along with notifying the slave-side DAC, the master-side DAC also writes the same configuration information to the disks HDD01 to N-1. If the master-side DAC 7 fails, this information is used during replacement and recovery: the three copies held by DAC 7, DAC 8, and the disks are compared, and the information to restore is decided by majority vote.
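The majority decision over the three copies can be sketched as follows. The function name and the tuple representation of a configuration are assumptions made for illustration, and the source does not say what happens when all three copies differ, so the sketch returns `None` in that case.

```python
from collections import Counter


def recover_config(dac7_copy, dac8_copy, disk_copy):
    """Pick the configuration held by at least two of the three sources.

    Returns None when all three copies disagree, since the source does
    not say how such a tie is resolved.
    """
    votes = Counter([dac7_copy, dac8_copy, disk_copy])
    winner, count = votes.most_common(1)[0]
    return winner if count >= 2 else None


# Copies are modeled as hashable tuples: (RAID level, member disks).
good = ("RAID5", ("HDD01", "HDD02", "HDD03"))
stale = ("RAID5", ("HDD01", "HDD02"))
recovered = recover_config(stale, good, good)
```

Here the replaced DAC 7 holds a stale copy, while DAC 8 and the disks agree, so the agreed configuration is restored.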

During data access, VDUs 12 and 16 also control the information in disk configuration information tables 13 and 17 and cache controllers 14 and 18: they share failure information for the disks HDD01 to N-1, and the standby DAC receives the cache state changes of the master-side DAC and updates its own cache.
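A minimal sketch of this master-to-slave state sharing is given below. The `Dac` class, the update dictionary layout, and `master_write` are hypothetical. The direct method call stands in for the VDU-to-VDU transfer over the shared path, whose message format the source does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Dac:
    """Hypothetical model of one disk array controller."""
    name: str
    role: str                                   # "master" or "slave"
    config_table: dict = field(default_factory=dict)
    cache_state: dict = field(default_factory=dict)
    failed_disks: set = field(default_factory=set)

    def apply_update(self, update):
        """Merge an update into this controller's tables and cache."""
        self.config_table.update(update.get("config", {}))
        self.cache_state.update(update.get("cache", {}))
        self.failed_disks |= set(update.get("failed_disks", ()))


def master_write(master, slave, block, data, failed_disks=()):
    """Record a cache change on the master, then forward it to the slave."""
    if master.role != "master" or slave.role != "slave":
        raise ValueError("roles must be master -> slave")
    update = {"cache": {block: data}, "failed_disks": list(failed_disks)}
    master.apply_update(update)   # local state change on the master
    slave.apply_update(update)    # stands in for the VDU-to-VDU transfer
    return update


dac7 = Dac("DAC7", "master")
dac8 = Dac("DAC8", "slave")
master_write(dac7, dac8, block=42, data=b"dirty", failed_disks=["HDD03"])
```

After the call, the standby controller holds the same cache state and failure information as the master, which is what lets it continue processing after a failover.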

Next, a modification of the present invention is described. FIG. 3 is a block diagram showing a modification of FIG. 2.

In the example of FIG. 2, only the disk configuration information 22 is held on the disks HDD01 to N-1. This modification differs in that all information sent from the master-side DAC to the slave-side DAC, including cache information 23, is additionally held in an information holding area on the disks HDD01 to N-1, and this area is written together with each data write from the master-side VDU to the disks. The VDU itself therefore operates as a termination without being recognized as a device by its partner. When a failure of the master-side DAC causes the host to direct input to the slave side, the slave-side VDU reads the disk configuration information 22 and related information and continues processing. This reduces the amount of master/slave communication and improves processing efficiency.
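The modification can be sketched in the same style. The `SharedDisks` and `VariantDac` classes and the layout of the information holding area are hypothetical. The point of the sketch is only that nothing is pushed to the slave during normal operation, and the slave loads the saved state from the disks when it takes over.

```python
class SharedDisks:
    """Hypothetical information holding area on disks HDD01..N-1."""

    def __init__(self):
        self.data = {}
        self.info_area = {"config": None, "cache": {}}


class VariantDac:
    """Hypothetical controller for the FIG. 3 modification."""

    def __init__(self, name):
        self.name = name
        self.config = None
        self.cache = {}

    def write(self, disks, block, payload, config):
        """Master: write data and the shared information to the disks."""
        self.config = config
        self.cache[block] = payload
        disks.data[block] = payload
        disks.info_area["config"] = config
        disks.info_area["cache"][block] = payload

    def take_over(self, disks):
        """Slave: load the saved information when host input arrives."""
        self.config = disks.info_area["config"]
        self.cache = dict(disks.info_area["cache"])


disks = SharedDisks()
master, slave = VariantDac("DAC7"), VariantDac("DAC8")
master.write(disks, block=7, payload=b"abc", config="RAID1")
slave_cache_before = dict(slave.cache)   # nothing is pushed to the slave
slave.take_over(disks)                   # master failed; slave reads disks
```

The trade-off is that the slave pays a read cost at takeover in exchange for no controller-to-controller traffic during normal operation.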

The invention is applicable to fault-tolerant servers in which components such as the interface units for peripheral devices, for example external storage devices, are modularized and duplexed.

FIG. 1 is a block diagram showing the configuration of one embodiment of the present invention.
FIG. 2 is a block diagram showing the detailed configuration of disk array controllers 7 and 8 and device unit 9 of FIG. 1.
FIG. 3 is a block diagram showing a modification of FIG. 2.

Explanation of symbols

1, 2 CPU module
3, 4 PCI module
5, 6 PCI slot
7, 8 Disk array controller
9 Device unit
10, 20 Main-unit-side I/F
11, 21 Disk-side I/F
12, 16 Virtual device unit
13, 17 Disk configuration information table
14, 18 Cache controller
15, 19 Cache
22 Disk configuration information
23 Cache information
100 Fault-tolerant server

Claims

A fault-tolerant server comprising: a CPU module; multiplexed PCI modules that operate as master and slave; a plurality of disks; and disk array controllers, each corresponding to one of the multiplexed PCI modules, that recognize each other as virtual I/O devices, communicate with each other, and share disk configuration information of the plurality of disks.

The fault-tolerant server according to claim 1, wherein the disk array controllers communicate with each other using the SCSI protocol.

The fault-tolerant server according to claim 2, wherein, when setting up the disk array configuration in the initial state, a disk array controller operating as the master-side disk array controller transmits the disk configuration information to the slave-side disk array controller and also writes it to the plurality of disks.

The fault-tolerant server according to claim 1, 2, or 3, wherein each of the disk array controllers has a cache, and a disk array controller operating as the master-side disk array controller transmits cache information to the slave-side disk array controller and also writes it to the plurality of disks.