JP2014178793A

JP2014178793A - Information processing system

Info

Publication number: JP2014178793A
Application number: JP2013051203A
Authority: JP
Inventors: Akira Aoki; 亮青木; Toshiyuki Ukai; 敏之鵜飼
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2013-03-14
Filing date: 2013-03-14
Publication date: 2014-09-25

Abstract

[Problem] To provide an information processing system capable of improving reliability.
[Solution] The system includes processing blocks 310A to 310C, which execute the same processing in response to a common external input and output first to third processing results, respectively; diagnostic blocks 300A to 300C, which correspond to the respective processing blocks; and an external transmission/reception block 10α. The processing blocks 310A to 310C include reconfigurable circuit units 319A to 319C. The external transmission/reception block and each diagnostic block have a majority decision unit (14aα, 14b to 14d) that takes a majority vote over the first to third processing results, and a failure diagnosis unit (23aα, 23b to 23d) that performs diagnosis when the majority decision result indicates an error. Each failure diagnosis unit shares the failure diagnosis information that it generates to reflect the error with the other failure diagnosis units via a first communication unit.
[Selected drawing] Figure 1A

Description

The present invention relates to an information processing system, for example, an information processing system for financial institutions and similar users that require high reliability.

For example, Patent Document 1 describes an information processing apparatus that includes, within an FPGA (Field Programmable Gate Array), a triplicated set of processing systems, a majority decision logic unit that takes a majority vote over their processing results, and various circuit units that detect a failed processing system and reconfigure it using the circuit configuration information of a normal processing system.

Patent Document 1: JP 2011-216020 A

For example, in the field of financial transactions, there is continuing demand to reduce losses caused by transaction delays, and latency has been reduced by various methods. In recent years, algorithmic trading, in which computers rather than people execute trades, has become common, and the demand for low latency has reached the microsecond and nanosecond order at which computers operate. Latency has conventionally been reduced through advances in software and computer system technology, but these approaches have limits at the microsecond and nanosecond order, and further latency reduction using dedicated circuits for financial transaction processing is desired.

Conventionally, latency was reduced with dedicated circuits by implementing the target processing as an ASIC (Application Specific Integrated Circuit). In recent years, however, advances in semiconductor technology have made it common to use FPGAs, which cost less than ASICs and whose hardware wiring and logic can be changed dynamically. Because FPGAs support a variety of I/O technologies and can be connected to general computer networks, all of the financial processing conventionally performed by general-purpose computers can also be implemented on an FPGA. In addition, using an FPGA makes it possible to eliminate processing that is redundant on a general-purpose computer and to implement only the minimum processing required for financial transactions, which enables latency reductions that general-purpose computers could not achieve.

However, using an FPGA introduces a new problem. So that the circuit configuration can be programmed freely, an FPGA stores its circuit configuration information in rewritable memory. This memory is called CRAM (Configuration RAM) or similar. To support recent gains in performance and functionality, CRAM is often implemented with SRAM (Static Random Access Memory), which is suited to high speed and large capacity. As a result, bit inversion caused by radiation such as alpha rays and neutron rays is likely to occur in CRAM. A transient failure caused by such bit inversion is called a soft error. The probability of soft errors is increasing as semiconductor manufacturing processes are miniaturized.
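The effect of a single inverted configuration bit can be illustrated with a toy model. The sketch below is not taken from the patent: it assumes a hypothetical 2-input lookup table whose truth table is held in four configuration bits, and the names `lut2`, `flip_bit`, and `AND_CONFIG` are invented for illustration.

```python
# Illustrative model only (assumption): a 2-input lookup table (LUT) whose
# truth table is held in four configuration bits, standing in for CRAM.

def lut2(config_bits: int, a: int, b: int) -> int:
    """Return the LUT output for inputs a and b (each 0 or 1)."""
    index = (a << 1) | b              # select one of the four truth-table bits
    return (config_bits >> index) & 1


def flip_bit(config_bits: int, position: int) -> int:
    """Model a soft error by inverting a single configuration bit."""
    return config_bits ^ (1 << position)


AND_CONFIG = 0b1000                   # output is 1 only when a = b = 1

# The bit for inputs (a=0, b=1) flips from 0 to 1: the circuit is no longer AND.
corrupted = flip_bit(AND_CONFIG, 1)

# Writing the known-good configuration back removes the transient fault.
restored = AND_CONFIG
```

In this model the logic function changes silently until the configuration is rewritten, which is why the error has to be caught at the outputs if it is to be caught quickly.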

To address this problem, some FPGAs provide a mechanism for detecting and correcting soft errors. In such FPGAs, however, soft error detection may incur a delay on the order of milliseconds. Therefore, even if this built-in soft error detection mechanism is used to improve reliability in financial transaction processing that operates on the microsecond or nanosecond order, by the time an error is detected an incorrect processing result may already have reached a user outside the system, who may then execute the next transaction based on it. Because an incorrect processing result provided by the system can cause the user substantial losses, it is difficult to rely on the FPGA's built-in soft error detection mechanism in financial transaction processing that requires microsecond or nanosecond latency.

To detect and correct errors almost in real time in such a low-latency environment, a majority voting scheme such as the one described in Patent Document 1 can be used. In this scheme, three or more processing systems that perform the same processing are prepared, the results from the systems are compared, and the most probable result is taken as the correct one. In general, the probability of a double failure is very small, so with triplication, even if one system fails, the error can be detected and corrected by a majority vote over the outputs as long as the other two systems output the correct result. In addition, when the error is caused by a soft error in the FPGA's CRAM, it is a transient phenomenon, so the system can recover by writing the normal circuit configuration information to the CRAM again, as in Patent Document 1.
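The majority voting scheme described above can be sketched in software as follows. This is a minimal illustration, not the circuit of Patent Document 1 or of the present invention; the function name `majority_vote` and the use of byte strings as processing results are assumptions.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def majority_vote(results: Sequence[bytes]) -> Tuple[Optional[bytes], List[int]]:
    """Vote over the outputs of redundant processing systems.

    Returns (majority_value, indices_that_disagree). majority_value is None
    when no value is produced by more than half of the systems, in which
    case every index is reported as suspect.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        return None, list(range(len(results)))
    faulty = [i for i, r in enumerate(results) if r != value]
    return value, faulty
```

With three systems, a single wrong output is both masked (the majority value is still returned) and located (its index is reported), which corresponds to error correction and error detection, respectively.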

However, with the technique of Patent Document 1, the range over which errors can be detected and corrected by majority voting may be only partial. Specifically, errors in the triplicated circuits inside the FPGA can be detected and corrected and the CRAM can be reconfigured, but if the majority decision logic unit, the FPGA itself, or the device that performs automatic recovery of the FPGA is a single point of failure and produces an error, that error is difficult to detect or correct with the technique of Patent Document 1, and reconfiguring the CRAM also becomes difficult. By the nature of financial transaction services, availability, meaning that the service continues after a single failure, is essential. To achieve availability, the system must be able to detect errors even after a single failure has occurred and, at worst, stop without leaking the error to the outside.

The present invention has been made in view of the above, and one of its objects is to provide an information processing system capable of improving reliability. The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

An outline of a representative embodiment of the inventions disclosed in the present application is briefly described as follows.

The information processing system according to the present embodiment includes first to third processing blocks that execute the same processing in response to a common external input and output first to third processing results, respectively; first to third diagnostic blocks provided in correspondence with the first to third processing blocks, respectively; and a fourth diagnostic block. The N-th processing block (N = 1, 2, 3) has an N-th configuration information storage unit that stores circuit configuration information, and an N-th reconfigurable circuit unit that builds a circuit according to that configuration information and outputs the N-th processing result through the operation of that circuit. The M-th diagnostic block (M = 1, 2, 3, 4) has an M-th majority decision unit that receives the first to third processing results and, when any of them contains an error, outputs an M-th error determination result identifying the processing result that contains the error; an M-th failure diagnosis unit that receives the M-th error determination result and diagnoses the failure; and an M-th communication unit for communicating with the other diagnostic blocks. The M-th failure diagnosis unit has an M-th sharing unit that transmits M-th diagnosis information reflecting the M-th error determination result to the other failure diagnosis units via the M-th communication unit and receives the diagnosis information from each of the other failure diagnosis units via the M-th communication unit.

The effect obtained by a representative embodiment of the inventions disclosed in the present application is, briefly, that the reliability of an information processing system can be improved.

A block diagram showing an example of the overall configuration of the information processing system according to Embodiment 1 of the present invention.
A block diagram showing a more detailed configuration example of the external transmission/reception block in FIG. 1A.
A block diagram showing a more detailed configuration example of the diagnostic block in FIG. 1A.
A flowchart showing an example of the overall processing in the information processing system of FIGS. 1A to 1C.
A flowchart showing an example of the outline of the failure recovery processing in FIG. 2.
A flowchart showing an example of the processing of the configuration-defect location diagnosis unit in the information processing system of FIGS. 1A to 1C.
A flowchart showing an example of the processing of the system failure diagnosis unit in the information processing system of FIGS. 1A to 1C.
A flowchart showing an example of the processing of the failure diagnosis information sharing unit in the information processing system of FIGS. 1A to 1C.
A flowchart showing an example of the processing of the recovery plan determination unit and the recovery plan agreement unit in the information processing system of FIGS. 1A to 1C.
A table showing an example of the criteria used to determine a recovery plan in the flowchart of FIG. 7.
A block diagram showing a detailed configuration example of the recovery plan execution unit in the external transmission/reception block of the information processing system of FIGS. 1A to 1C.
A block diagram showing a detailed configuration example of the recovery plan execution unit in the diagnostic block of the information processing system of FIGS. 1A to 1C.
A flowchart showing an example of the processing of the recovery plan execution unit of FIG. 10 when that unit belongs to the recovery-destination processing system.
A flowchart showing an example of the processing of the recovery plan execution unit of FIG. 10 when that unit belongs to the recovery-source processing system.
A flowchart showing an example of the processing of the recovery plan execution unit of FIG. 10 when that unit belongs to neither the recovery-destination nor the recovery-source processing system.
A flowchart showing an example of the processing of the recovery plan execution unit of FIG. 9.
A flowchart showing an example of the manual recovery processing performed by the recovery plan execution unit in the information processing system of FIGS. 1A to 1C.
A state transition diagram showing an example of the modes stored in the mode information, and of the relationships among them, in the information processing system of FIGS. 1A to 1C.
A diagram showing an example of the contents stored in the system configuration information in the information processing system of FIGS. 1A to 1C.
A flowchart showing an example of processing, different from that of FIG. 6, performed by the failure diagnosis information sharing unit in the information processing system of FIGS. 1A to 1C.
A sequence diagram showing an example of the method of exchanging failure diagnosis information in the flowchart of FIG. 6.
A sequence diagram showing an example of the method of exchanging recovery plans based on a Byzantine agreement protocol in the flowchart of FIG. 7.
A sequence diagram showing an example of the method of exchanging recovery plans based on a Byzantine agreement protocol in the flowchart of FIG. 7.
A block diagram showing a configuration example of the processing block in the information processing system according to Embodiment 2 of the present invention.

In the following embodiments, the description is divided into multiple sections or embodiments where convenient. Unless explicitly stated otherwise, these are not unrelated to one another; one is a modification, a detail, or a supplementary explanation of part or all of another. Also, in the following embodiments, when the number of elements or the like (including counts, numerical values, quantities, and ranges) is mentioned, it is not limited to that specific number, and may be greater or less than it, except where explicitly stated or where it is clearly limited to a specific number in principle.

Furthermore, in the following embodiments, it goes without saying that the constituent elements (including element steps) are not necessarily essential, except where explicitly stated or where they are clearly essential in principle. Similarly, in the following embodiments, references to the shapes, positional relationships, and the like of constituent elements include those that are substantially approximate or similar to those shapes and the like, except where explicitly stated or where this is clearly not the case in principle. The same applies to the numerical values and ranges mentioned above.

<<Overview of the Embodiment>>
The present embodiment assumes that the information processing system contains at least one single point of failure. For example, if, as in Patent Document 1, a triplicated set of processing systems and a majority decision unit that takes a majority vote over their processing results are provided in a reconfigurable circuit unit (typically an FPGA), error detection and correction become difficult when the single point of failure lies in the majority decision unit rather than in the processing systems.

To improve system reliability, it is also desirable to provide a diagnostic block that detects a failure in a processing system built from the reconfigurable circuit unit and automatically recovers the failed processing system. However, as with the majority decision unit, the single point of failure may lie in the diagnostic block rather than in a processing system. The present embodiment therefore builds an information processing system in which not only the processing systems but also the majority decision unit and the diagnostic block are at least triplicated. Because there are various ways to achieve at least triplication, it is necessary to consider which form is appropriate.

As a further requirement, financial transaction processing and similar applications use processing that holds an internal state, such as the input data and the previous processing result, and whose next processing result depends on that internal state (hereinafter referred to as stateful processing). The technique of Patent Document 1, for example, assumes processing systems with no internal state (that is, stateless processing systems). Consequently, when a processing system fails, even if the circuit configuration of that processing system (that is, its reconfigurable circuit unit) can be reconfigured, it is difficult to recover the internal state. The appropriate form of the information processing system described above must therefore be considered together with a method for recovering the internal state.
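The difference between reconfiguring a circuit and recovering its internal state can be shown with a small model. The sketch below is an assumption made for illustration: the class `OrderCounter` is hypothetical, and copying the state from a healthy replica is only one possible recovery approach, not necessarily the method this patent adopts.

```python
class OrderCounter:
    """Hypothetical stateful processing block: its output depends on history."""

    def __init__(self) -> None:
        self.total = 0                # internal state

    def process(self, quantity: int) -> int:
        self.total += quantity
        return self.total


healthy = OrderCounter()
for q in (5, 7):
    healthy.process(q)

# Reconfiguration alone is modeled as building a fresh instance: the logic is
# correct again, but the accumulated state is lost, so the outputs diverge.
reconfigured = OrderCounter()
out_without_state = (healthy.process(1), reconfigured.process(1))

# Copying the state from a healthy replica (an assumed approach) brings the
# two blocks back in step.
reconfigured.total = healthy.total
out_with_state = (healthy.process(1), reconfigured.process(1))
```

In this model the reconfigured block would keep losing every vote after reconfiguration until its state is brought in line with the healthy blocks.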

In view of the above, the information processing system according to the present embodiment typically has the following configuration and operation, described with reference to, for example, FIGS. 1A to 1C, 9, and 10.

(1) The information processing system according to the present embodiment has first to third processing blocks (310A to 310C) that execute the same processing in response to a common external input and output first to third processing results, respectively; first to third diagnostic blocks (300A to 300C) provided in correspondence with the first to third processing blocks, respectively; and a fourth diagnostic block (10α). The N-th processing block (N = 1, 2, 3) has an N-th configuration information storage unit (318A to 318C) that stores circuit configuration information, and an N-th reconfigurable circuit unit (319A to 319C) that builds a circuit according to that configuration information and outputs the N-th processing result through the operation of that circuit. The M-th diagnostic block (M = 1, 2, 3, 4) has an M-th majority decision unit (14b to 14d, 14aα); an M-th failure diagnosis unit (23b to 23d, 23aα) that includes an M-th sharing unit (18b to 18d, 18aα); and an M-th communication unit (301A to 301C, 13α) for communicating with the other diagnostic blocks.

For example, when a single failure occurs in one of the first to third processing blocks, one of the first to third processing results output from the reconfigurable circuit units in the three processing blocks differs from the others. The M-th majority decision unit in the M-th diagnostic block receives the first to third processing results and, when any of them contains an error, outputs to the M-th failure diagnosis unit an M-th error determination result that includes information identifying the processing result with the error. Assuming that the first to fourth diagnostic blocks are all normal, the first to fourth error determination results output from the first to fourth majority decision units are all the same. Based on the M-th error determination result, the M-th failure diagnosis unit can determine which processing block contains the reconfigurable circuit unit with the single failure.

However, the single failure may occur in a majority decision unit rather than in a processing block. For example, when a single failure occurs in the first majority decision unit, the first failure diagnosis unit may receive from it a first error determination result that makes it appear as though one of the first to third processing blocks had a single failure. That is, a situation can arise in which only one of the first to fourth failure diagnosis units determines that there is a failure while the remaining failure diagnosis units determine that there is none.

Therefore, the M-th failure diagnosis unit generates M-th diagnosis information reflecting the M-th error determination result received from the M-th majority decision unit, transmits the M-th diagnosis information to the other failure diagnosis units via the M-th communication unit, and receives the diagnosis information from each of the other failure diagnosis units via the M-th communication unit. In other words, the M-th failure diagnosis unit has an M-th sharing unit for sharing the first to fourth diagnosis information among the first to fourth failure diagnosis units. Through this sharing operation, each failure diagnosis unit can detect the case in which the single failure has occurred in a majority decision unit.

Specifically, when the first to fourth diagnosis information has been shared normally, for example, each failure diagnosis unit can determine that no single failure has occurred in a majority decision unit. On the other hand, when a failure diagnosis unit has received only one of the first to fourth pieces of diagnosis information, it can determine that a single failure may have occurred in the majority decision unit of the diagnostic block that transmitted that diagnosis information. Note that even when the single failure is located in the communication unit of a diagnostic block, the input of the first to third processing results to the majority decision unit in that diagnostic block becomes unstable, so the outcome can be the same as when a single failure occurs in the majority decision unit. Thus, unlike Patent Document 1, providing the M-th sharing unit makes it possible to detect a failure in a majority decision unit, which improves the reliability of the information processing system.
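The check that sharing makes possible can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `suspect_diagnostic_block` is invented, the patent's reference numerals are reused as dictionary keys purely for readability, and each report is reduced to the identifier of the processing block that a voter flagged (or `None` when it saw no error).

```python
from typing import Dict, Optional

# Report of one diagnostic block: the processing block its own majority
# decision unit flagged ("310A", "310B", "310C"), or None if it saw no error.
Report = Optional[str]


def suspect_diagnostic_block(shared: Dict[str, Report]) -> Optional[str]:
    """Return the ID of a diagnostic block whose error report stands alone.

    `shared` maps diagnostic block ID -> report, as every block would see it
    after the sharing step. If exactly one block reports an error while all
    the others report none, that block's own voter is the likely fault.
    """
    reporting = [blk for blk, rep in shared.items() if rep is not None]
    if len(reporting) == 1 and len(shared) > 2:
        return reporting[0]
    return None
```

For example, if only diagnostic block 300A reports that processing block 310B is wrong while 10α, 300B, and 300C report nothing, the function returns `"300A"`, pointing at the diagnostic block rather than at 310B.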

(2) The M-th failure diagnosis unit (23b to 23d, 23aα) further has an M-th recovery plan determination unit (19b to 19d, 19aα), and the M-th diagnostic block (300A to 300C, 10α) further has an M-th recovery plan execution unit (21b to 21d, 21aα). Based on the first to fourth diagnosis information shared by the M-th sharing unit, the M-th recovery plan determination unit determines whether any one of the first to third processing blocks has a failure and, if so, generates an M-th recovery plan. The M-th recovery plan execution unit executes automatic recovery processing for the recovery-destination processing block based on the M-th recovery plan.

Specifically, when, for example, the first to fourth diagnosis information has been shared normally and the first to fourth error determination results contained in it all match, the M-th recovery plan determination unit determines that a single failure has occurred in one of the first to third processing blocks. It then defines a recovery plan (automatic recovery plan) for automatically recovering from the single failure of that processing block (that is, for reconfiguring its reconfigurable circuit unit). The automatic recovery plan includes, for example, information identifying the recovery-destination processing block (the processing block with the single failure) and the recovery-source processing block (a normal processing block), both of which are determined from the matching first to fourth error determination results.

On the other hand, when, for example, only one of the first to fourth pieces of diagnosis information is generated, the single failure may have occurred inside a diagnostic block. It may have occurred in the majority decision unit or communication unit described above, or in the failure diagnosis unit. In such cases automatic recovery can be difficult, so each failure diagnosis unit defines a recovery plan indicating that manual recovery is required (manual recovery plan). The manual recovery plan includes, for example, information on the location of the failure estimated from the first to fourth failure information (that is, which diagnostic block has failed). For example, when only one of the first to fourth pieces of diagnosis information is generated, the diagnostic block that generated it is presumed to have failed. When the M-th failure diagnosis unit sends an automatic recovery plan, the M-th recovery plan execution unit executes automatic recovery processing based on it; when a manual recovery plan is sent, the M-th recovery plan execution unit issues, for example, an external notification that manual recovery is required.
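The choice between an automatic and a manual recovery plan can be sketched as a small decision function. This is a simplified illustration, not the patent's actual determination criteria: the `RecoveryPlan` fields, the function name `decide_recovery_plan`, and the rule of picking the first healthy block as the recovery source are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PROCESSING_BLOCKS = ("310A", "310B", "310C")


@dataclass(frozen=True)
class RecoveryPlan:
    kind: str                           # "none", "automatic" or "manual"
    destination: Optional[str] = None   # processing block to be recovered
    source: Optional[str] = None        # healthy block used as the reference
    suspect: Optional[str] = None       # diagnostic block presumed faulty


def decide_recovery_plan(shared: Dict[str, Optional[str]]) -> RecoveryPlan:
    """Derive a plan from shared reports (diagnostic block ID -> flagged block)."""
    reports = list(shared.values())
    flagged = {r for r in reports if r is not None}
    if not flagged:
        return RecoveryPlan("none")
    if len(flagged) == 1 and None not in reports:
        # All diagnostic blocks agree: a processing block has a single failure.
        destination = next(iter(flagged))
        # Assumption: use the first healthy processing block as the source.
        source = next(b for b in PROCESSING_BLOCKS if b != destination)
        return RecoveryPlan("automatic", destination=destination, source=source)
    # Reports disagree: the fault is presumed to lie in a diagnostic block.
    reporting = [blk for blk, r in shared.items() if r is not None]
    suspect = reporting[0] if len(reporting) == 1 else None
    return RecoveryPlan("manual", suspect=suspect)
```

A matching set of four reports yields an automatic plan naming the destination and source, while a lone report yields a manual plan naming the diagnostic block that produced it.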

(3) While the recovery plan execution unit is performing automatic recovery processing, the two processing blocks other than the recovery-destination processing block (one of which is also the recovery-source processing block) continue the same processing in response to the common external input. During this period, the majority decision determination unit receives the two processing results from these two processing blocks and, if the two results are identical, outputs that result to the outside. If the two results differ, the majority decision determination unit, for example, stops the system.

Likewise, during the period from the notification of manual recovery by the recovery plan execution unit until the corresponding manual recovery is completed, the processing blocks corresponding to the normal diagnostic blocks continue the same processing in response to the common external input. As described above, the normal diagnostic blocks can be identified from the manual recovery plan supplied by the failure diagnosis unit. In this way, even while one processing block has failed, the information processing system performs degraded operation using the remaining two processing blocks. During degraded operation it is difficult for the information processing system to correct errors using the majority decision determination unit, but it can still detect them. The system can therefore keep operating while preventing erroneous processing results from leaking to the outside, which improves its reliability.
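The difference between error correction and error detection during degraded operation can be shown with a short Python sketch. The function name `degraded_compare` and the returned labels are hypothetical; "halt" stands in for the system stop or similar action mentioned above.

```python
from typing import Optional, Tuple


def degraded_compare(result_1: bytes, result_2: bytes) -> Tuple[str, Optional[bytes]]:
    """Compare the two processing results available during degraded operation.

    With only two results, a mismatch can be detected but not corrected,
    because there is no third result to break the tie.
    """
    if result_1 == result_2:
        return ("output", result_1)  # identical results are released to the outside
    return ("halt", None)            # mismatch: nothing is released
```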

（４）第Ｍ障害診断部（２３ｂ〜２３ｄ，２３ａα）は、さらに、第Ｍ回復プラン合意部（２０ｂ〜２０ｄ，２０ａα）を有する。第Ｍ回復プラン合意部は、第Ｍ回復プラン判定部によって生成された第Ｍ回復プランを、第Ｍ通信部を介して、ビザンチン合意プロトコルを用いて他の回復プラン判定部との間で交換し、当該交換した結果に基づいて確定した第Ｍ回復プランを第Ｍ回復プラン実行部に出力する。例えば、回復プラン判定部で単一障害が生じた場合や、あるいは通信部等でビザンチン障害が生じた場合等では、各障害診断部によって誤った回復プランが定められる可能性が有る。そこで、各障害診断部は、ビザンチン合意プロトコルを用いて回復プランを合意する。 (4) The M-th failure diagnosis unit (23b-23d, 23aα) further includes an M-th recovery plan agreement unit (20b-20d, 20aα). The Mth recovery plan agreement unit exchanges the Mth recovery plan generated by the Mth recovery plan determination unit with another recovery plan determination unit using the Byzantine agreement protocol via the Mth communication unit. The M-th recovery plan determined based on the exchange result is output to the M-th recovery plan execution unit. For example, when a single failure occurs in the recovery plan determination unit, or when a Byzantine failure occurs in the communication unit or the like, there is a possibility that each failure diagnosis unit may determine an incorrect recovery plan. Therefore, each failure diagnosis unit agrees on a recovery plan using a Byzantine agreement protocol.

Specifically, the M-th recovery plan agreement unit first exchanges the M-th recovery plan generated by the M-th recovery plan determination unit with the other failure diagnosis units using the Byzantine agreement protocol. Next, if the first to fourth recovery plans obtained through this exchange all match, the M-th recovery plan agreement unit sends that plan to the M-th recovery plan execution unit as the matched recovery plan. If, on the other hand, the first to fourth recovery plans do not all match, a failure, possibly a Byzantine failure, may exist within a diagnostic block. In this case, the M-th recovery plan agreement unit sends, for example, the recovery plan obtained by agreement among the first to fourth recovery plans (for example, the three recovery plans that match among the four) to the M-th recovery plan execution unit as a manual recovery plan.

When the matched recovery plan sent from the M-th recovery plan agreement unit is an automatic recovery plan, the M-th recovery plan execution unit executes automatic recovery processing based on it. When a manual recovery plan is sent from the M-th recovery plan agreement unit, the M-th recovery plan execution unit issues, for example, a notification to the outside indicating that manual recovery is required. Performing automatic recovery processing only after the Byzantine agreement protocol has been completed prevents erroneous automatic recovery processing and thus improves reliability. For example, if a diagnostic block defines an incorrect recovery plan because of its own single failure, or if a Byzantine failure occurs in a diagnostic block, the Byzantine agreement protocol does not yield a matched recovery plan, so the information processing system does not execute erroneous automatic recovery processing.

Furthermore, even when the first to fourth recovery plans do not all match, agreement through the Byzantine agreement protocol allows the three diagnostic blocks belonging to the non-failed systems to settle on the same manual recovery plan. Based on this manual recovery plan, the normal diagnostic blocks can be identified among the diagnostic blocks, and the degraded operation described above can be carried out using them.
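The decision that each recovery plan agreement unit makes once the four plans have been exchanged can be sketched as follows. This Python sketch covers only the post-exchange decision and does not implement the Byzantine agreement protocol itself. The function name `settle_plan`, the tuple encoding of plans, and the handling of the case in which fewer than three plans match are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def settle_plan(exchanged_plans: Sequence[Hashable]) -> Tuple[str, Optional[Hashable]]:
    """Decide what to hand to the recovery plan execution unit.

    exchanged_plans holds the first to fourth recovery plans as seen by one
    agreement unit after the exchange has finished.
    """
    plan, votes = Counter(exchanged_plans).most_common(1)[0]

    if votes == len(exchanged_plans):
        # All four plans match: forward the plan as the matched recovery plan.
        return ("matched", plan)
    if votes >= 3:
        # Three of four match: forward the agreed plan as a manual recovery plan.
        return ("manual", plan)
    # Assumption: with no three-way agreement, report manual recovery without a plan.
    return ("manual", None)
```

Only a result labeled "matched" would be eligible for automatic recovery, and then only if the plan itself is an automatic recovery plan.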

（５）回復プラン実行部が、自動回復プランに基づいて自動回復処理を実行する際に、各処理ブロックでの処理内容がステートフル処理であることを前提とすると、再構成可能回路部の自動回復処理（回路の再構成）に加えて、そのステートも回復する必要がある。そこで、第Ｎ処理ブロック（３１０Ａ〜３１０Ｃ）は、さらに、第Ｎ再構成可能回路部の処理の実行過程で得られるステートを保持する第Ｎステート記憶部（３１４Ａ〜３１４Ｃ）を有する。そして、第Ｍ回復プラン実行部（２１ｂ〜２１ｄ，２１ａα）は、回復先の処理ブロックの自動回復処理を実行する際に、回復先の処理ブロック内の構成情報記憶部の記憶情報に加えて、ステート記憶部の記憶情報を回復させる。これによって、特許文献１と異なり、ステートフル処理を含む金融機関向け等の情報処理システムにおいても自動回復処理を実現することが可能になる。 (5) When the recovery plan execution unit executes automatic recovery processing based on the automatic recovery plan, assuming that the processing content in each processing block is stateful processing, automatic recovery of the reconfigurable circuit unit In addition to processing (circuit reconfiguration), the state must also be restored. Therefore, the Nth processing block (310A to 310C) further includes an Nth state storage unit (314A to 314C) that holds a state obtained in the process of executing the process of the Nth reconfigurable circuit unit. Then, the M-th recovery plan execution unit (21b to 21d, 21aα), when executing the automatic recovery processing of the recovery destination processing block, in addition to the storage information of the configuration information storage unit in the recovery destination processing block, The stored information in the state storage unit is recovered. Thus, unlike Patent Document 1, it is possible to realize automatic recovery processing even in an information processing system for a financial institution including stateful processing.

(6) The method of recovering the information stored in the configuration information storage unit of the recovery-destination processing block is not particularly limited. For example, the information stored in the configuration information storage unit of the recovery-source processing block may be copied, or the circuit configuration information may be stored separately in a nonvolatile memory or the like and copied from there. To recover the information stored in the state storage unit of the recovery-destination processing block, on the other hand, a configuration such as the following may be provided.

The N-th state storage unit (314A to 314C) has an N-th state saving unit (315A1 to 315C1) that holds the state as of a first time point at which state recovery starts. The fourth recovery plan execution unit (21aα) has a data holding unit (218aα) that sequentially holds data input from the outside after the first time point. Each of the first to third recovery plan execution units (21b to 21d) has a configuration recovery unit (214b to 214d) that recovers the information stored in the configuration information storage unit of the recovery-destination processing block and a state recovery unit (215b to 215d) that recovers the information stored in its state storage unit. The state recovery unit first sets, in the state storage unit of the recovery-destination processing block, the state held in the state saving unit of the recovery-source processing block. Then, once the information stored in the configuration information storage unit of the recovery-destination processing block has been recovered, the data held in the data holding unit of the fourth recovery plan execution unit is output sequentially to the recovery-destination processing block. The processing block recovering from the failure thereby gradually approaches the same condition as the normal processing blocks, state included, and at the point where it reaches that condition it has fully recovered from the failure, state included.
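The snapshot-and-replay idea behind this state recovery can be sketched in Python. The class `StatefulBlock`, its running-total state, and the functions `recover_state` and `demo` are hypothetical stand-ins chosen for brevity; they are not the circuits of the specification, and circuit reconfiguration is not modeled.

```python
from typing import Iterable, Tuple


class StatefulBlock:
    """Toy processing block whose state is a running total of its inputs."""

    def __init__(self) -> None:
        self.state = 0

    def process(self, value: int) -> int:
        self.state += value
        return self.state


def recover_state(saved_state: int, held_inputs: Iterable[int], target: StatefulBlock) -> None:
    """Set the saved state on the recovery destination, then replay held inputs."""
    target.state = saved_state   # state saved at the first time point
    for value in held_inputs:    # data held since the first time point
        target.process(value)


def demo() -> Tuple[int, int]:
    source, target = StatefulBlock(), StatefulBlock()
    for value in (1, 2, 3):
        source.process(value)
    target.state = -999          # the recovery destination holds a corrupted state

    saved_state = source.state   # first time point: save the recovery-source state
    held_inputs = []
    for value in (4, 5):         # inputs arriving during recovery
        held_inputs.append(value)
        source.process(value)    # the normal block keeps processing

    recover_state(saved_state, held_inputs, target)
    return source.state, target.state
```

In this toy run both blocks end with the same state, mirroring the point at which the recovered block has caught up with the normal one.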

(7) The fourth diagnostic block (10α) may be provided, for example, within the external transmission/reception block (10α) that serves as the interface with the outside. The fourth majority decision determination unit (14aα) in the fourth diagnostic block determines the processing result to be output to the outside by performing a majority decision on the first to third processing results. That is, the final processing result output to the outside is determined by the fourth majority decision determination unit. In this case, however, a failure in the fourth majority decision determination unit makes it difficult to output a correct processing result to the outside. It is therefore desirable to make the external transmission/reception block redundant, with a normal-use block (10α) and a backup block (10β). Whether the fourth majority decision determination unit has failed can be determined, for example, from the result of sharing the first to fourth diagnosis information in the M-th sharing unit, as described above. Switching between the normal-use and backup blocks can then be performed based on that determination.

As a provision against failure of a majority decision determination unit, one conceivable scheme would be for the first to fourth majority decision determination units to reach Byzantine agreement on the processing results obtained from their respective majority decisions, and to determine the final processing result for external output in that way. In that case, however, the Byzantine agreement would increase the latency of outputting the processing result to the outside and could impair the speed of the information processing system. Determining the final processing result with a single majority decision determination unit, and making that unit (that is, the external transmission/reception block containing it) redundant, makes it possible to achieve both high speed and reliability.

以上のように、本実施の形態の情報処理システムを用いることで、代表的には、信頼性の向上が実現可能になる。例えば、ＦＰＧＡ等の再構成可能回路を含むコンポーネントが少なくとも三重化された情報処理システムにおいて、任意のコンポーネントに単一障害が発生した後も、情報処理を停止することなく、障害前と同じステートフル処理を継続して実行することが可能になる。 As described above, by using the information processing system according to the present embodiment, typically, improvement in reliability can be realized. For example, in an information processing system in which components including a reconfigurable circuit such as an FPGA are at least tripled, even after a single failure occurs in any component, the same stateful processing as before the failure without stopping the information processing Can be executed continuously.

Embodiments of the present invention are described in detail below with reference to the drawings. In all the drawings used to describe the embodiments, identical members are in principle given identical reference signs, and repeated description of them is omitted.

(Embodiment 1)
<< Overall configuration of information processing system >>
FIG. 1A is a block diagram showing an example of the overall configuration of the information processing system according to Embodiment 1 of the present invention. FIG. 1B is a block diagram showing a more detailed configuration example of the external transmission/reception block in FIG. 1A, and FIG. 1C is a block diagram showing a more detailed configuration example of the diagnostic block in FIG. 1A. The information processing system of FIG. 1A includes external transmission/reception blocks 10α and 10β, diagnostic blocks 300A, 300B, and 300C, processing blocks 310A, 310B, and 310C, and communication paths L1 and L2.

外部送受信ブロック１０α，１０βは、同一構成のブロックからなるＨＡ（High Availability）システム（言い換えれば冗長システム）であり、外部送受信ブロック１０αは実行系（通常用）、外部送受信ブロック１０βは待機系（バックアップ用）である。外部送受信ブロック１０α，１０βは、同一構成からなるＨＡシステムであるが、外部送受信ブロックが単一障害後も正常に稼働できる機能を備えていれば、本実施の形態はこれに限定されない。 The external transmission / reception blocks 10α and 10β are HA (High Availability) systems (in other words, redundant systems) composed of blocks having the same configuration, the external transmission / reception block 10α is an execution system (for normal use), and the external transmission / reception block 10β is a standby system (backup). For). Although the external transmission / reception blocks 10α and 10β are HA systems having the same configuration, the present embodiment is not limited to this as long as the external transmission / reception block has a function capable of operating normally even after a single failure.

The diagnostic blocks 300A, 300B, and 300C have the same configuration and serve to diagnose and recover the triplicated processing blocks 310A, 310B, and 310C, respectively. The diagnostic blocks 300A, 300B, and 300C can be implemented, for example, as reconfigurable circuits such as FPGAs, as computer systems that execute program processing on a CPU (Central Processing Unit) or the like, or as a combination of these. Although the diagnostic blocks 300A, 300B, and 300C are triplicated blocks of identical configuration, the present embodiment is not limited to this as long as the single points of failure in the diagnostic blocks are triplicated.

The processing blocks 310A, 310B, and 310C have the same configuration, are triplicated in the same way as the diagnostic blocks 300A, 300B, and 300C, and process the input data sent in common from the external transmission/reception blocks 10α and 10β. Although the processing blocks 310A, 310B, and 310C are triplicated blocks of identical configuration, the present embodiment is not limited to this as long as the single points of failure in the processing blocks are triplicated.

通信路Ｌ１は、外部送受信ブロック１０α，１０βと、図１Ａの情報処理システムを利用する外部の情報処理装置との間で通信を行うための通信路である。通信路Ｌ２は、外部送受信ブロック１０α，１０βと、診断ブロック３００Ａ，３００Ｂ，３００Ｃと、処理ブロック３１０Ａ，３１０Ｂ，３１０Ｃとの間で内部通信を行うための通信路である。例えば、通信路Ｌ１，Ｌ２を構成する各部品等は冗長化されており、高信頼な通信が行えることを前提とする。ただし、通信路Ｌ１，Ｌ２が十分に高信頼であれば、本実施の形態はこれに限定されない。 The communication path L1 is a communication path for performing communication between the external transmission / reception blocks 10α and 10β and an external information processing apparatus using the information processing system of FIG. 1A. The communication path L2 is a communication path for performing internal communication among the external transmission / reception blocks 10α and 10β, the diagnosis blocks 300A, 300B, and 300C and the processing blocks 310A, 310B, and 310C. For example, it is assumed that the components constituting the communication paths L1 and L2 are made redundant so that highly reliable communication can be performed. However, the present embodiment is not limited to this as long as the communication paths L1 and L2 are sufficiently reliable.

外部送受信ブロック１０αは、外部通信部１１αと、入力データ投入部１２αと、第一通信部１３αと、障害診断回復系統ａαを備える。障害診断回復系統ａαは、多数決判定部１４ａαと、障害診断部２３ａαと、回復プラン実行部２１ａαを備える。障害診断部２３ａαは、より詳細には、図１Ｂに示すように、システム障害診断部１６ａαと、障害診断情報共有部１８ａαと、回復プラン判定部１９ａαと、回復プラン合意部２０ａαを有する。また、障害診断回復系統ａαは、より詳細には、図１Ｂに示すように、システム構成情報１７ａαと、モード情報２２ａαを更に有する。 The external transmission / reception block 10α includes an external communication unit 11α, an input data input unit 12α, a first communication unit 13α, and a failure diagnosis recovery system aα. The failure diagnosis recovery system aα includes a majority decision determination unit 14aα, a failure diagnosis unit 23aα, and a recovery plan execution unit 21aα. More specifically, the failure diagnosis unit 23aα includes a system failure diagnosis unit 16aα, a failure diagnosis information sharing unit 18aα, a recovery plan determination unit 19aα, and a recovery plan agreement unit 20aα as illustrated in FIG. 1B. More specifically, the fault diagnosis recovery system aα further includes system configuration information 17aα and mode information 22aα as shown in FIG. 1B.

また、図示していないが、外部送受信ブロック１０βも同様に、外部通信部１１βと、入力データ投入部１２βと、第一通信部１３βと、障害診断回復系統ａβを備える。障害診断回復系統ａβは、多数決判定部１４ａβと、障害診断部２３ａβと、回復プラン実行部２１ａβを備える。障害診断部２３ａβは、より詳細には、図１Ｂと同様に、システム障害診断部１６ａβと、障害診断情報共有部１８ａβと、回復プラン判定部１９ａβと、回復プラン合意部２０ａβを有する。また、障害診断回復系統ａβは、より詳細には、図１Ｂと同様に、システム構成情報１７ａβと、モード情報２２ａβを更に有する。 Although not shown, the external transmission / reception block 10β similarly includes an external communication unit 11β, an input data input unit 12β, a first communication unit 13β, and a failure diagnosis recovery system aβ. The failure diagnosis recovery system aβ includes a majority decision determination unit 14aβ, a failure diagnosis unit 23aβ, and a recovery plan execution unit 21aβ. More specifically, the failure diagnosis unit 23aβ includes a system failure diagnosis unit 16aβ, a failure diagnosis information sharing unit 18aβ, a recovery plan determination unit 19aβ, and a recovery plan agreement unit 20aβ as in FIG. 1B. More specifically, the fault diagnosis recovery system aβ further includes system configuration information 17aβ and mode information 22aβ as in FIG. 1B.

診断ブロック３００Ａは、第一通信部３０１Ａと、第二通信部３０３Ａと、障害診断回復系統ｂを備える。障害診断回復系統ｂは、多数決判定部１４ｂと、障害診断部２３ｂと、回復プラン実行部２１ｂを有する。障害診断部２３ｂは、より詳細には、図１Ｃに示すように、構成不良箇所診断部３０２Ａと、システム障害診断部１６ｂと、障害診断情報共有部１８ｂと、回復プラン判定部１９ｂと、回復プラン合意部２０ｂを有する。また、障害診断回復系統ｂは、より詳細には、図１Ｃに示すように、システム構成情報１７ｂと、モード情報２２ｂを更に有する。 The diagnosis block 300A includes a first communication unit 301A, a second communication unit 303A, and a failure diagnosis recovery system b. The failure diagnosis recovery system b includes a majority decision determination unit 14b, a failure diagnosis unit 23b, and a recovery plan execution unit 21b. More specifically, as shown in FIG. 1C, the failure diagnosis unit 23b includes a configuration failure location diagnosis unit 302A, a system failure diagnosis unit 16b, a failure diagnosis information sharing unit 18b, a recovery plan determination unit 19b, and a recovery plan. It has an agreement part 20b. More specifically, the failure diagnosis recovery system b further includes system configuration information 17b and mode information 22b as shown in FIG. 1C.

診断ブロック３００Ｂ，３００Ｃも、詳細な図示は省略するが、診断ブロック３００Ａと同様の構成を有する。すなわち、診断ブロック３００Ｂは、診断ブロック３００Ａにおける各構成要素（３０１Ａ〜３０３Ａ，ｂ，１４ｂ，１６ｂ〜２１ｂ，２３ｂ）にそれぞれ対応する各構成要素（３０１Ｂ〜３０３Ｂ，ｃ，１４ｃ，１６ｃ〜２１ｃ，２３ｃ）を備える。同様に、診断ブロック３００Ｃは、診断ブロック３００Ａにおける各構成要素（３０１Ａ〜３０３Ａ，ｂ，１４ｂ，１６ｂ〜２１ｂ，２３ｂ）にそれぞれ対応する各構成要素（３０１Ｃ〜３０３Ｃ，ｄ，１４ｄ，１６ｄ〜２１ｄ，２３ｄ）を備える。 The diagnostic blocks 300B and 300C also have the same configuration as the diagnostic block 300A, although detailed illustration is omitted. That is, the diagnostic block 300B includes the respective constituent elements (301B to 303B, c, 14c, 16c to 21c, 23c) corresponding to the respective constituent elements (301A to 303A, b, 14b, 16b to 21b, 23b) in the diagnostic block 300A. ). Similarly, the diagnosis block 300C includes each component (301C to 303C, d, 14d, 16d to 21d, corresponding to each component (301A to 303A, b, 14b, 16b to 21b, 23b) in the diagnosis block 300A, respectively. 23d).

処理ブロック３１０Ａは、第一通信部３１１Ａと、ステートフル処理部３１２Ａと、ステート記憶部３１４Ａと、ステート記憶管理部３１３Ａと、第二通信部３１６Ａと、構成情報記憶部３１８Ａと、構成情報記憶管理部３１７Ａを有する。処理ブロック３１０Ａは、例えば、ＦＰＧＡ等によって構成される。ここでは、ステートフル処理部３１２Ａ、第一通信部３１１Ａおよびステート記憶管理部３１３Ａは、再構成可能回路部３１９Ａによって構成され、当該再構成可能回路部３１９Ａの回路構成は、構成情報記憶部３１８Ａの記憶情報に応じて任意に設定可能となっている。構成情報記憶部３１８Ａは、一般的に、ＣＲＡＭ（Configuration RAM）等と呼ばれ、例えばＳＲＡＭ等で構成される。 The processing block 310A includes a first communication unit 311A, a stateful processing unit 312A, a state storage unit 314A, a state storage management unit 313A, a second communication unit 316A, a configuration information storage unit 318A, and a configuration information storage management unit. 317A. The processing block 310A is configured by, for example, an FPGA. Here, the stateful processing unit 312A, the first communication unit 311A, and the state storage management unit 313A are configured by a reconfigurable circuit unit 319A, and the circuit configuration of the reconfigurable circuit unit 319A is stored in the configuration information storage unit 318A. It can be set arbitrarily according to the information. The configuration information storage unit 318A is generally called a CRAM (Configuration RAM) or the like, and is configured by an SRAM or the like, for example.

構成情報記憶管理部３１７Ａは、診断ブロック３００Ａが第二通信部３０３Ａ，３１６Ａを介して構成情報記憶部３１８Ａにアクセス等を行う際の制御を行う。すなわち、構成情報記憶管理部３１７Ａは、診断ブロック３００Ａが再構成可能回路部３１９Ａの回路の再構成等を行う際の制御を行う。ステート記憶部３１４Ａには、ステートフル処理部３１２Ａでの処理の過程で生じるステートが適宜記憶される。ステート記憶管理部３１３Ａは、このステートを記憶する際の制御を行う。また、ステート記憶管理部３１３Ａは、診断ブロック３００Ａが第二通信部３０３Ａ，３１６Ａを介してステート記憶部３１４Ａにアクセス等を行う際の制御も行う。ステート記憶部３１４Ａは、詳細は後述するが、ステートの回復を実現するために、第一バンク３１５Ａ１と第二バンク３１５Ａ２を備える。 The configuration information storage management unit 317A performs control when the diagnostic block 300A accesses the configuration information storage unit 318A via the second communication units 303A and 316A. That is, the configuration information storage management unit 317A performs control when the diagnostic block 300A performs reconfiguration of the circuit of the reconfigurable circuit unit 319A. The state storage unit 314A appropriately stores states generated in the process of the stateful processing unit 312A. The state storage management unit 313A performs control when storing this state. The state storage management unit 313A also performs control when the diagnostic block 300A accesses the state storage unit 314A via the second communication units 303A and 316A. Although details will be described later, the state storage unit 314A includes a first bank 315A1 and a second bank 315A2 in order to realize state recovery.

処理ブロック３１０Ｂ，３１０Ｃも、詳細な図示は省略するが、処理ブロック３１０Ａと同様の構成を有する。すなわち、処理ブロック３１０Ｂは、処理ブロック３１０Ａにおける各構成要素（３１１Ａ〜３１４Ａ，３１５Ａ１，３１５Ａ２，３１６Ａ〜３１９Ａ）にそれぞれ対応する各構成要素（３１１Ｂ〜３１４Ｂ，３１５Ｂ１，３１５Ｂ２，３１６Ｂ〜３１９Ｂ）を備える。同様に、処理ブロック３１０Ｃは、処理ブロック３１０Ａにおける各構成要素（３１１Ａ〜３１４Ａ，３１５Ａ１，３１５Ａ２，３１６Ａ〜３１９Ａ）にそれぞれ対応する各構成要素（３１１Ｃ〜３１４Ｃ，３１５Ｃ１，３１５Ｃ２，３１６Ｃ〜３１９Ｃ）を備える。 The processing blocks 310B and 310C also have the same configuration as the processing block 310A, although detailed illustration is omitted. That is, the processing block 310B includes constituent elements (311B to 314B, 315B1, 315B2, 316B to 319B) respectively corresponding to the constituent elements (311A to 314A, 315A1, 315A2, 316A to 319A) in the processing block 310A. Similarly, the processing block 310C includes constituent elements (311C to 314C, 315C1, 315C2, 316C to 319C) respectively corresponding to the constituent elements (311A to 314A, 315A1, 315A2, 316A to 319A) in the processing block 310A. .

このように、本実施の形態１の情報処理システムは、図１Ａに示すように、診断ブロック３００Ａおよび処理ブロック３１０Ａを処理系統ＡＡ、診断ブロック３００Ｂおよび処理ブロック３１０Ｂを処理系統ＢＢ、診断ブロック３００Ｃおよび処理ブロック３１０Ｃを処理系統ＣＣとして、三重化された処理系統を備えている。三重化を行うことで、単一障害発生後もエラー検知とエラー訂正を行うための処理結果が３つ得られる。すなわち、処理系統ＡＡ〜処理系統ＣＣ内のステートフル処理部３１２Ａ〜３１２Ｃは、同一の回路構成が設定されることで同一の処理を行い、その各処理結果を第一通信部３１１Ａ〜３１１Ｃを介して出力する。また、本実施の形態１の情報処理システムは、障害の診断および回復を行うための四重化された障害診断回復系統ａα（又はａβ），ｂ〜ｄを備えている。四重化を行うことで、障害診断回復系統自身がビザンチン障害を含めて単一障害を有する場合にも対応できる。 Thus, as shown in FIG. 1A, the information processing system according to the first embodiment includes the diagnosis block 300A and the processing block 310A as the processing system AA, the diagnosis block 300B and the processing block 310B as the processing system BB, and the diagnosis block 300C. A triple processing system is provided with the processing block 310C as the processing system CC. By performing triple processing, three processing results for error detection and error correction can be obtained even after a single failure occurs. That is, the stateful processing units 312A to 312C in the processing system AA to the processing system CC perform the same processing by setting the same circuit configuration, and the processing results are transmitted via the first communication units 311A to 311C. Output. The information processing system according to the first embodiment includes quadruple fault diagnosis / recovery systems aα (or aβ) and b to d for performing fault diagnosis and recovery. By performing quadruplexing, it is possible to cope with a case where the failure diagnosis recovery system itself has a single failure including a Byzantine failure.

<< Overall operation of information processing system >>
FIG. 2 is a flowchart showing an example of the overall processing in the information processing system of FIGS. 1A to 1C. In FIG. 2, the external communication unit 11α first receives input data from the communication path L1 and sends it to the input data input unit 12α. Based on the information stored in the system configuration information 17aα, the input data input unit 12α broadcasts the input data to the first communication units 311A to 311C of the processing blocks 310A to 310C of the respective processing systems (step S201).

次に、ステートフル処理部３１２Ａ〜３１２Ｃは、第一通信部３１１Ａ〜３１１Ｃを介して入力データを受信し、それぞれ同一の処理を行う（ステップＳ２０２）。続いて、ステートフル処理部３１２Ａ〜３１２Ｃは、ステップＳ２０２の結果、自身の内部状態（ステート）が変化するため、当該ステートをステート記憶管理部３１３Ａ〜３１３Ｃにそれぞれ送信する。ステート記憶管理部３１３Ａ〜３１３Ｃは、受信したステート情報をステート記憶部３１４Ａ〜３１４Ｃへ保存する（ステップＳ２０３）。 Next, the stateful processing units 312A to 312C receive the input data via the first communication units 311A to 311C and perform the same processing (step S202). Subsequently, since the internal state (state) of the stateful processing units 312A to 312C changes as a result of step S202, the stateful processing units 312A to 312C transmit the states to the state storage management units 313A to 313C, respectively. The state storage management units 313A to 313C store the received state information in the state storage units 314A to 314C (step S203).

Next, the stateful processing units 312A to 312C send the output data, which is their processing result, to the first communication units 311A to 311C. The first communication units 311A to 311C broadcast the output data to the first communication unit 13α of the external transmission/reception block 10α and to the first communication units 301A to 301C of the diagnostic blocks 300A to 300C. The first communication unit 13α and the first communication units 301A to 301C receive the three processing results (output data) from the first communication units 311A to 311C and send them to the majority decision determination unit 14aα and the majority decision determination units 14b to 14d, respectively (step S204).

The majority decision determination units 14aα and 14b to 14d compare the three received processing results (output data) and determine whether all of them are free of differences (step S205). If no result differs, that is, if all three output data are judged identical, only the majority decision determination unit 14aα sends the identical output data to the output data input unit 15α. The output data input unit 15α sets as the destination the party that sent the input data corresponding to the output data, and sends the output data to the external communication unit 11α. The external communication unit 11α transmits the output data over the communication path L1. The majority decision determination units 14b to 14d, meanwhile, discard the output data (step S206). After step S206 ends, processing repeats from step S201, so that normal processing continues.

このように、図１Ａ〜図１Ｃの情報処理システムでは、多数決判定部１４ａαで多数決判定した処理結果（出力データ）が外部に向けて出力される。したがって、仮に多数決判定部１４ａαに障害が生じた際には、外部に向けて正しい処理結果を出力することが困難となる。そこで、ここでは、二重化された多数決判定部１４ａα，１４ａβ（外部送受信ブロック１０α，１０β）を備え、一方に障害が生じた場合には他方に切り替える構成を用いている。なお、詳細は後述するが、多数決判定部１４ａα等の障害の有無は、障害診断部２３ａα，２３ｂ〜２３ｄによって判別することが可能である。 As described above, in the information processing system of FIGS. 1A to 1C, the processing result (output data) decided by majority vote in the majority decision determination unit 14aα is output to the outside. Therefore, if a failure occurs in the majority decision determination unit 14aα, it becomes difficult to output a correct processing result to the outside. For this reason, the system here is provided with duplicated majority decision determination units 14aα and 14aβ (external transmission / reception blocks 10α and 10β) and switches to the other unit when a failure occurs in one of them. Although details will be described later, the presence or absence of a failure in the majority decision determination unit 14aα or the like can be determined by the failure diagnosis units 23aα and 23b to 23d.

図２のステップＳ２０５において、１つでも異なるデータが存在する、つまり、３つの処理結果（出力データ）中にエラーが存在すると判定された場合、多数決判定部１４ａα，１４ｂ〜１４ｄは、３つの出力データのうち、２つ以上の同一データが存在するか否かを判定する（ステップＳ２０７）。ステップＳ２０７で２つ以上の同一データが存在すると判定された場合、多数決判定部１４ａα，１４ｂ〜１４ｄは、エラーが有る処理結果を特定するエラー判定結果を障害診断部２３ａα，２３ｂ〜２３ｄに出力し、障害回復処理の実行を開始させる（ステップＳ２０８）。 When it is determined in step S205 of FIG. 2 that at least one piece of differing data exists, that is, that an error exists among the three processing results (output data), the majority decision determination units 14aα and 14b to 14d determine whether two or more identical pieces of data exist among the three pieces of output data (step S207). When it is determined in step S207 that two or more identical pieces of data exist, the majority decision determination units 14aα and 14b to 14d output an error determination result identifying the processing result that contains the error to the failure diagnosis units 23aα and 23b to 23d, which starts execution of the failure recovery processing (step S208).

障害回復処理の詳細は、図３以降の説明で述べるが、多数決判定部１４ａα，１４ｂ〜１４ｄのエラー判定結果および出力データは、システム障害診断部１６ａα，１６ｂ〜１６ｄ、及び、構成不良箇所診断部３０２Ａ〜３０２Ｃへ送信され、障害状況の診断が行われる。その後、当該診断の結果から得られる障害診断情報の共有、回復プランの判定、回復プランの合意と進み、最終的に、回復プランの実行が行われる。これらの処理は、図２のフローチャートと並行して実行されるため、２つ以上の同一データが存在し、エラーが訂正可能な状態であれば、通常の処理を継続したまま、障害回復を実行する事が可能である。 Details of the failure recovery processing will be described in the description of FIG. 3 and subsequent figures, but the error determination results and output data of the majority decision determination units 14aα and 14b to 14d are the system failure diagnosis units 16aα and 16b to 16d, and the configuration failure location diagnosis unit. It is transmitted to 302A-302C, and the failure status is diagnosed. Thereafter, sharing of failure diagnosis information obtained from the diagnosis result, determination of the recovery plan, and agreement of the recovery plan proceed, and finally the recovery plan is executed. Since these processes are executed in parallel with the flowchart of FIG. 2, if two or more identical data exist and an error can be corrected, failure recovery is performed while continuing normal processing. It is possible to do.

図２のステップＳ２０８の処理と並行して、多数決判定部１４ａα，１４ｂ〜１４ｄは、エラーを訂正する。具体的には、３つの出力データのうち、同一である２つを正常な出力データとみなし、多数決判定部１４ａαのみ、当該正常な出力データを出力データ投入部１５αへ送信する。出力データ投入部１５αは、出力データに対応する入力データを送信してきた相手を送信先に設定し、出力データを外部通信部１１αおよび通信路Ｌ１を介して送信する（ステップＳ２０９）。また、図２のステップＳ２０７で、２つ以上の同一データが存在しない、つまり、全ての出力データが異なると判定された場合、多数決判定部１４ａα，１４ｂ〜１４ｄは、情報処理システムを停止する（ステップＳ２１０）。 In parallel with the process of step S208 in FIG. 2, the majority decision determination units 14aα and 14b to 14d correct errors. Specifically, two of the three output data that are the same are regarded as normal output data, and only the majority decision unit 14aα transmits the normal output data to the output data input unit 15α. The output data input unit 15α sets the other party that has transmitted the input data corresponding to the output data as the transmission destination, and transmits the output data via the external communication unit 11α and the communication path L1 (step S209). If it is determined in step S207 in FIG. 2 that two or more identical data do not exist, that is, all the output data are different, the majority decision determining units 14aα and 14b to 14d stop the information processing system ( Step S210).
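
The decision made in steps S205 to S210 can be sketched as follows. This is a minimal illustration, not the circuit described in the patent: the function name `majority_vote`, the `Verdict` enum, and the use of block names such as "310A" as dictionary keys are all assumptions introduced for the example.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    ALL_MATCH = "all_match"  # step S205: no differing data, forward as-is (S206)
    CORRECTED = "corrected"  # steps S207 to S209: two of three agree
    HALT = "halt"            # step S210: all three outputs differ


def majority_vote(results):
    """Vote over the outputs of the three processing blocks.

    results maps a (hypothetical) block name to its hashable output data.
    Returns (verdict, agreed_output, names_of_disagreeing_blocks).
    """
    counts = Counter(results.values())
    value, n = counts.most_common(1)[0]
    if n == len(results):
        return Verdict.ALL_MATCH, value, []
    if n >= 2:
        # The minority output is treated as the erroneous processing result.
        faulty = [name for name, out in results.items() if out != value]
        return Verdict.CORRECTED, value, faulty
    # No two outputs agree, so the error cannot be corrected.
    return Verdict.HALT, None, list(results)
```

In the flow of FIG. 2, only the unit 14aα would forward the agreed output to the outside; the units 14b to 14d would use the same verdict solely to report the error determination result.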

《障害回復処理の概略》 《Outline of failure recovery processing》
図３は、図２において、その障害回復処理の概略的な処理内容の一例を示すフローチャートである。図２のステップＳ２０７で、２つ以上同一のデータが存在し、エラーが訂正可能な状態と判定された場合、ステップＳ２０８での障害回復処理として、図３のフローチャートが実行される。図３において、まず、障害診断部２３ａα，２３ｂ〜２３ｄは、自身の障害診断回復系統ａα，ｂ〜ｄが、障害回復モードであるか否かを判定する（ステップＳ３０１）。詳細なモードについては、図１６で後述する。 FIG. 3 is a flowchart showing an example of the schematic processing contents of the failure recovery processing in FIG. 2. When it is determined in step S207 of FIG. 2 that two or more identical pieces of data exist and the error is correctable, the flowchart of FIG. 3 is executed as the failure recovery processing of step S208. In FIG. 3, first, the failure diagnosis units 23aα and 23b to 23d determine whether their own failure diagnosis recovery systems aα and b to d are in the failure recovery mode (step S301). The modes will be described in detail later with reference to FIG. 16.

障害診断部２３ａα，２３ｂ〜２３ｄは、モード情報２２ａα，２２ｂ〜２２ｄが、障害回復モードであった場合は処理を終了し、通常モードであった場合は障害回復モードへ移行する（ステップＳ３０２）。具体的には、障害診断部２３ａα，２３ｂ〜２３ｄは、モード情報２２ａα，２２ｂ〜２２ｄを、障害回復モードへ書き変える。 The failure diagnosis unit 23aα, 23b-23d terminates the process when the mode information 22aα, 22b-22d is in the failure recovery mode, and shifts to the failure recovery mode when in the normal mode (step S302). Specifically, the failure diagnosis units 23aα and 23b to 23d rewrite the mode information 22aα and 22b to 22d to the failure recovery mode.
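
The mode check of steps S301 and S302 acts as a guard that keeps a second error report from restarting a recovery that is already in progress. A minimal sketch, assuming a hypothetical `FaultDiagnosisUnit` class whose `mode` attribute stands in for the mode information 22aα, 22b to 22d:

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    RECOVERY = "failure_recovery"


class FaultDiagnosisUnit:
    """Hypothetical stand-in for one failure diagnosis unit (23aα, 23b to 23d)."""

    def __init__(self):
        # Plays the role of the mode information held by this system.
        self.mode = Mode.NORMAL

    def try_enter_recovery(self):
        """Return True if recovery should proceed, False if it is already running."""
        if self.mode is Mode.RECOVERY:  # step S301: already recovering, so stop here
            return False
        self.mode = Mode.RECOVERY       # step S302: rewrite the mode information
        return True
```

Only the caller that receives `True` would go on to the diagnosis of step S303.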

次に、多数決判定部１４ａα，１４ｂ〜１４ｄのエラー判定結果および出力データは、システム障害診断部１６ａα，１６ｂ〜１６ｄ、及び、構成不良箇所診断部３０２Ａ〜３０２Ｃへ送信され、障害の診断が行われる（ステップＳ３０３）。システム障害診断部１６ａα，１６ｂ〜１６ｄで行う診断の詳細は図５で、構成不良箇所診断部３０２Ａ〜３０２Ｃで行う診断の詳細は図４で後述する。概略を説明すると、これらの診断部は、処理ブロックやシステム（具体的には第一通信部）における障害の発生箇所、障害の状態を診断し、例えば、どの処理ブロックに障害が生じたのか、または、どの診断ブロックもしくは外部送受信ブロックに障害が生じたのかを示す障害診断情報を作成する。そして、これらの診断部は、作成した障害診断情報を障害診断情報共有部１８ａα，１８ｂ〜１８ｄへ渡す。 Next, the error determination results and output data of the majority decision determination units 14aα and 14b to 14d are transmitted to the system failure diagnosis units 16aα and 16b to 16d and the configuration failure location diagnosis units 302A to 302C, and the failure diagnosis is performed. (Step S303). Details of the diagnosis performed by the system failure diagnosis units 16aα and 16b to 16d will be described later with reference to FIG. 5, and details of the diagnosis performed by the configuration failure point diagnosis units 302A to 302C will be described later with reference to FIG. To explain the outline, these diagnosis units diagnose failure locations and failure states in processing blocks and systems (specifically, the first communication unit), for example, which processing block has a failure, Alternatively, fault diagnosis information indicating which diagnostic block or external transmission / reception block has a fault is created. Then, these diagnosis units pass the created failure diagnosis information to the failure diagnosis information sharing units 18aα and 18b to 18d.

次に、障害診断情報共有部１８ａαは、システム障害診断部１６ａαの診断結果に基づいて作成した障害診断情報を他の障害診断情報共有部１８ｂ〜１８ｄに送信する。障害診断情報共有部１８ｂは、システム障害診断部１６ｂ及び構成不良箇所診断部３０２Ａの診断結果に基づいて作成した障害診断情報を他の障害診断情報共有部１８ａα，１８ｃ，１８ｄに送信する。同様にして、障害診断情報共有部１８ｃ，１８ｄも障害診断情報を送信する。すなわち、障害診断情報共有部１８ａα，１８ｂ〜１８ｄのそれぞれは、自身の障害診断情報を他の障害診断情報共有部に向けて送信し、また、他の障害診断情報共有部の障害診断情報を受信することで障害診断情報の交換および共有を行う（ステップＳ３０４）。 Next, the fault diagnosis information sharing unit 18aα transmits the fault diagnosis information created based on the diagnosis result of the system fault diagnosis unit 16aα to the other fault diagnosis information sharing units 18b to 18d. The failure diagnosis information sharing unit 18b transmits the failure diagnosis information created based on the diagnosis results of the system failure diagnosis unit 16b and the configuration failure location diagnosis unit 302A to the other failure diagnosis information sharing units 18aα, 18c, and 18d. Similarly, the fault diagnosis information sharing units 18c and 18d also transmit fault diagnosis information. That is, each of the failure diagnosis information sharing units 18aα and 18b to 18d transmits its own failure diagnosis information to other failure diagnosis information sharing units, and receives failure diagnosis information of other failure diagnosis information sharing units. Thus, fault diagnosis information is exchanged and shared (step S304).

具体的な処理は、図６で後述するが、各障害診断回復系統ａα，ｂ〜ｄは、図３のステップＳ３０３の処理を完了した時点では、個々の障害診断回復系統において障害を検知したのみであり、実際の障害の発生箇所によっては、誤検知の可能性がある。当該誤検知は、例えば、多数決判定部等に障害が生じた場合に生じ得る。そのため、障害診断情報共有部１８ａα，１８ｂ〜１８ｄは、他の障害診断回復系統の状況を把握するため、障害診断情報をお互いに共有し、共有した各障害診断情報を、回復プラン判定部１９ａα，１９ｂ〜１９ｄへ渡す。 Although specific processing will be described later with reference to FIG. 6, each failure diagnosis recovery system aα, b to d only detects a failure in each failure diagnosis recovery system when the processing of step S303 in FIG. 3 is completed. There is a possibility of erroneous detection depending on the location of the actual failure. The erroneous detection can occur, for example, when a failure occurs in the majority decision determination unit or the like. Therefore, the fault diagnosis information sharing units 18aα, 18b to 18d share fault diagnosis information with each other in order to grasp the status of other fault diagnosis recovery systems, and share the respective fault diagnosis information with the recovery plan determination unit 19aα, It passes to 19b-19d.

ここで、例えば、処理ブロック３１０Ａ〜３１０Ｃの中に障害が無い場合、各障害診断回復系統内の多数決判定部１４ａα，１４ｂ〜１４ｄはエラー判定結果の出力を行わず、当該エラー判定結果等を反映した障害診断情報の生成は行われない。しかしながら、ある障害診断回復系統のみが障害を検知し、これに伴う障害診断情報が、当該障害を検知した障害診断回復系統以外の障害診断回復系統内の障害診断情報共有部へ送信される事態が起こり得る。このような事態は、具体的には、例えば、当該障害を検知した障害診断回復系統内の多数決判定部や、システム障害診断部や、構成不良箇所診断部や、第一通信部に障害が生じた場合に起こり得る。 Here, for example, when there is no failure in the processing blocks 310A to 310C, the majority decision determination units 14aα and 14b to 14d in the respective failure diagnosis recovery systems do not output an error determination result, and no failure diagnosis information reflecting such an error determination result is generated. However, a situation can occur in which only one failure diagnosis recovery system detects a failure, and the resulting failure diagnosis information is transmitted to the failure diagnosis information sharing units in the failure diagnosis recovery systems other than the one that detected the failure. Specifically, such a situation can occur, for example, when a failure has occurred in the majority decision determination unit, the system failure diagnosis unit, the configuration failure location diagnosis unit, or the first communication unit of the failure diagnosis recovery system that detected the failure.

そこで、詳細は図１８で後述するが、各障害診断回復系統の障害診断情報共有部１８ａα，１８ｂ〜１８ｄは、障害回復モード中のステップＳ３０４だけでなく、通常モードの場合も、常に他の障害診断回復系統からの障害診断情報を待ちうけ、障害に対応する。これにより、例えば、障害診断回復系統ａαが、他の障害診断回復系統ｂ〜ｄでは障害を検知していないのにも関わらず障害を検知し、他の障害診断回復系統ｂ〜ｄの障害診断情報共有部１８ｂ〜１８ｄが、当該障害に対する障害診断情報を受け取った場合にも適切に障害に対応できる。具体的には、このような場合、診断ブロック３００Ａ〜３００Ｃは、障害診断回復系統ａαに障害が生じていると判断し、例えば、「障害診断回復系統ａαに障害が有る」旨を表す信号を外部送受信ブロック１０βに送信する。これを受けて、外部送受信ブロック１０βは、外部送受信ブロック１０αの全処理を引き継ぐ。 Therefore, although details will be described later with reference to FIG. 18, the failure diagnosis information sharing units 18aα and 18b to 18d of the respective failure diagnosis recovery systems always wait for failure diagnosis information from the other failure diagnosis recovery systems and respond to failures, not only in step S304 during the failure recovery mode but also in the normal mode. As a result, a failure can be handled appropriately even when, for example, the failure diagnosis recovery system aα detects a failure although the other failure diagnosis recovery systems b to d have not detected one, and the failure diagnosis information sharing units 18b to 18d of the other failure diagnosis recovery systems b to d receive failure diagnosis information for that failure. Specifically, in such a case, the diagnosis blocks 300A to 300C determine that a failure has occurred in the failure diagnosis recovery system aα and, for example, transmit a signal indicating that "the failure diagnosis recovery system aα has a failure" to the external transmission / reception block 10β. In response, the external transmission / reception block 10β takes over all processing of the external transmission / reception block 10α.

次に、図３において、回復プラン判定部１９ａα，１９ｂ〜１９ｄは、障害診断情報共有部で共有した各障害診断情報から、実行するべき回復プランを判定する（ステップＳ３０５）。具体的な処理は、図７および図８で後述するが、自動的に回復可能と診断した場合、回復元の処理ブロックと回復先の処理ブロックを定め、この情報等を含んだ自動回復プランを、回復プラン合意部２０ａα，２０ｂ〜２０ｄへ渡す。また、自動的な回復が不可能と診断した場合、障害診断情報（すなわち障害が生じた診断ブロックを特定する情報等）を含めた手動回復プランを、回復プラン合意部２０ａα，２０ｂ〜２０ｄへ渡す。 Next, in FIG. 3, the recovery plan determination units 19aα and 19b to 19d determine the recovery plan to be executed from the failure diagnosis information shared by the failure diagnosis information sharing units (step S305). The specific processing will be described later with reference to FIGS. 7 and 8. When the failure is diagnosed as automatically recoverable, a recovery source processing block and a recovery destination processing block are determined, and an automatic recovery plan containing this information is passed to the recovery plan agreement units 20aα and 20b to 20d. When automatic recovery is diagnosed as impossible, a manual recovery plan containing failure diagnosis information (that is, information identifying the diagnosis block in which the failure occurred, and the like) is passed to the recovery plan agreement units 20aα and 20b to 20d.

続いて、回復プラン合意部２０ａα，２０ｂ〜２０ｄは、回復プラン判定部１９ａα，１９ｂ〜１９ｄから渡された回復プランを実行するための合意を回復プラン合意部の間で行う（ステップＳ３０６）。具体的な処理は、図７および図８で後述するが、この合意プロセスによって、例えば、回復プラン判定部の障害や第一通信部等のビザンチン障害等によって、誤った回復プランが生成され、それが実行されるような事態を防止することが可能になる。 Subsequently, the recovery plan agreement units 20aα and 20b to 20d make an agreement between the recovery plan agreement units to execute the recovery plan passed from the recovery plan determination units 19aα and 19b to 19d (step S306). The specific processing will be described later with reference to FIGS. 7 and 8. This consensus process generates an incorrect recovery plan due to, for example, a failure in the recovery plan determination unit or a Byzantine failure in the first communication unit. Can be prevented.

次に、回復プラン実行部２１ａα，２１ｂ〜２１ｄは、合意した回復プランを実行する（ステップＳ３０７）。具体的な処理は、図１１〜図１５で後述するが、回復が正常に行えた場合は通常モードへ移行し、もとの三重系へ復帰する。回復が正常に行えなかった場合は、障害回復処理を中断し、手動での回復を促す。 Next, the recovery plan execution units 21aα and 21b to 21d execute the agreed recovery plan (step S307). Specific processing will be described later with reference to FIGS. 11 to 15, but when the recovery can be normally performed, the mode is changed to the normal mode and the original triple system is restored. If recovery cannot be performed normally, the failure recovery process is interrupted and manual recovery is encouraged.

《構成不良箇所診断部の処理内容》 《Processing contents of configuration failure location diagnosis unit》
図４は、図１Ａ〜図１Ｃの情報処理システムにおいて、その構成不良箇所診断部の処理内容の一例を示すフローチャートである。図４のフローチャートは、図３のステップＳ３０３における構成不良箇所診断部３０２Ａ〜３０２Ｃの処理内容を表すものである。図４において、まず、構成不良箇所診断部３０２Ａ〜３０２Ｃは、多数決判定部１４ｂ〜１４ｄから渡されるエラー判定結果から、自身が障害系統かどうか（言い換えれば自身の処理ブロックに障害が生じている可能性が有るか否か）を判定する（ステップＳ４０１）。そして、構成不良箇所診断部３０２Ａ〜３０２Ｃは、障害系統が自身の管理下にあるか否かを判定する（ステップＳ４０２）。 FIG. 4 is a flowchart showing an example of the processing contents of the configuration failure location diagnosis unit in the information processing system of FIGS. 1A to 1C. The flowchart of FIG. 4 represents the processing contents of the configuration failure location diagnosis units 302A to 302C in step S303 of FIG. 3. In FIG. 4, first, the configuration failure location diagnosis units 302A to 302C determine, from the error determination results passed from the majority decision determination units 14b to 14d, whether their own system is the faulty system (in other words, whether a failure may have occurred in their own processing block) (step S401). Then, the configuration failure location diagnosis units 302A to 302C determine whether the faulty system is under their own management (step S402).

具体的には、例えば、多数決判定部１４ｂ〜１４ｄは、処理系統ＡＡからの処理結果（出力データ）のみ処理系統ＢＢ，ＣＣと異なっている場合は、処理系統ＡＡを障害系統とするエラー判定結果を出力する。この場合、構成不良箇所診断部３０２Ａは、自身が障害系統であり、障害系統が自身の管理下にあると判定し、構成不良箇所診断部３０２Ｂ，３０２Ｃは、処理系統ＡＡが障害系統であり、障害系統は自身の管理下に無いと判定する。 Specifically, for example, when only the processing result (output data) from the processing system AA differs from those of the processing systems BB and CC, the majority decision determination units 14b to 14d output an error determination result that designates the processing system AA as the faulty system. In this case, the configuration failure location diagnosis unit 302A determines that its own system is the faulty system and that the faulty system is under its own management, while the configuration failure location diagnosis units 302B and 302C determine that the processing system AA is the faulty system and that the faulty system is not under their own management.

構成不良箇所診断部３０２Ａ〜３０２Ｃは、障害系統が自身の管理下にないと判定した場合、処理を終了する。一方、構成不良箇所診断部３０２Ａ〜３０２Ｃは、障害系統が自身の管理下にあると判定した場合、第二通信部３０３Ａ〜３０３Ｃおよび第二通信部３１６Ａ〜３１６Ｃを経由して、構成情報記憶管理部３１７Ａ〜３１７Ｃへ障害箇所の情報を要求する（ステップＳ４０３）。なお、実際には、構成不良箇所診断部３０２Ａ〜３０２Ｃの中のいずれか一つ（例えば３０２Ａ）が自身に対応する構成情報記憶管理部（例えば３１７Ａ）へ障害箇所の情報を要求することになる。 If the configuration failure location diagnosis units 302A to 302C determine that the faulty system is not under their control, the processing ends. On the other hand, if the configuration failure location diagnosis units 302A to 302C determine that the faulty system is under their own management, the configuration information storage management is performed via the second communication units 303A to 303C and the second communication units 316A to 316C. Information on the fault location is requested to the units 317A to 317C (step S403). Actually, any one of the configuration failure location diagnosis units 302A to 302C (for example, 302A) requests the configuration information storage management unit (for example, 317A) corresponding to itself for information on the failure location. .

続いて、構成情報記憶管理部３１７Ａ〜３１７Ｃは、構成情報記憶部３１８Ａ〜３１８Ｃを検査し、障害箇所の情報を収集し、構成不良箇所診断部３０２Ａ〜３０２Ｃへ応答する（ステップＳ４０４）。なお、当該処理は、実際には、処理系統ＡＡ〜ＣＣのいずれか一つで行われる。また、具体的な処理内容としては、例えば、ソフトエラーの診断回路を用いる方式や、または、回復プランに含まれる回復元の処理ブロック内の構成情報記憶部の記憶情報を期待値として比較判定する方式や、あるいは、予め正しい記憶情報を不揮発性メモリ等に格納しておき、その記憶情報を期待値として比較判定する方式等が挙げられる。 Subsequently, the configuration information storage management units 317A to 317C inspect the configuration information storage units 318A to 318C, collect information on the failure location, and respond to the configuration failure location diagnosis units 302A to 302C (step S404). In practice, this processing is performed in only one of the processing systems AA to CC. Examples of the specific processing include a method that uses a soft error diagnosis circuit, a method that uses the information stored in the configuration information storage unit of the recovery source processing block included in the recovery plan as the expected value for comparison, and a method in which correct stored information is stored in advance in a nonvolatile memory or the like and used as the expected value for comparison.

次に、構成不良箇所診断部３０２Ａ〜３０２Ｃは、一定時間以内に、構成情報記憶管理部３１７Ａ〜３１７Ｃから応答があるか否かを判定する（ステップＳ４０５）。構成不良箇所診断部３０２Ａ〜３０２Ｃは、一定時間内に構成情報記憶管理部３１７Ａ〜３１７Ｃから障害箇所に関する情報の応答があった場合、当該情報を障害診断情報共有部１８ｂ〜１８ｄへ渡す（ステップＳ４０６）。一方、構成不良箇所診断部３０２Ａ〜３０２Ｃは、一定時間内に構成情報記憶管理部３１７Ａ〜３１７Ｃからの応答がなかった場合、再構成不能という情報を障害診断情報共有部１８ｂ〜１８ｄへ渡す（ステップＳ４０７）。 Next, the configuration failure location diagnosis units 302A to 302C determine whether or not there is a response from the configuration information storage management units 317A to 317C within a certain time (step S405). If there is a response from the configuration information storage management units 317A to 317C for information regarding the fault location within a certain time, the configuration failure location diagnosis units 302A to 302C pass the information to the fault diagnosis information sharing units 18b to 18d (step S406). ). On the other hand, if there is no response from the configuration information storage management units 317A to 317C within a certain period of time, the configuration failure location diagnosis units 302A to 302C pass information that the reconfiguration is impossible to the fault diagnosis information sharing units 18b to 18d (step) S407).
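
Steps S401 to S407 amount to a request with a bounded wait. The sketch below is an illustration under stated assumptions: `diagnose_configuration`, the `request_fault_location` callback, and the `UNRECONFIGURABLE` marker are hypothetical names, and a `queue.Queue` stands in for the path through the second communication units.

```python
import queue

UNRECONFIGURABLE = "unreconfigurable"  # hypothetical marker for the step S407 result


def diagnose_configuration(own_block, faulty_block, request_fault_location, timeout_s=1.0):
    """Sketch of one configuration failure location diagnosis unit.

    request_fault_location(reply_box) is assumed to ask the configuration
    information storage management unit to put its answer into reply_box.
    Returns None when the faulty block is not managed by this unit.
    """
    if faulty_block != own_block:        # steps S401/S402: not under own management
        return None
    reply_box = queue.Queue(maxsize=1)
    request_fault_location(reply_box)    # step S403: request failure location info
    try:
        # steps S405/S406: a reply arrived within the allowed time
        return reply_box.get(timeout=timeout_s)
    except queue.Empty:
        # step S407: no reply in time, report that reconfiguration is impossible
        return UNRECONFIGURABLE
```

Treating a missing reply as "cannot be reconfigured" means that a processing block whose configuration memory manager is itself unresponsive is never selected for automatic recovery.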

《システム障害診断部の処理内容》 《Processing contents of system failure diagnosis unit》
図５は、図１Ａ〜図１Ｃの情報処理システムにおいて、そのシステム障害診断部の処理内容の一例を示すフローチャートである。図５のフローチャートは、図３のステップＳ３０３におけるシステム障害診断部１６ａα，１６ｂ〜１６ｄの処理内容を表すものである。図５において、まず、システム障害診断部１６ａα，１６ｂ〜１６ｄは、他の障害診断回復系統が有する第一通信部１３α，３０１Ａ〜３０１Ｃが応答するか否かをチェックする（ステップＳ５０１）。具体的には、システム障害診断部１６ａαを例とすると、システム障害診断部１６ａαは、例えば、第一通信部１３αを介して第一通信部３０１Ａ〜３０１Ｃに向けてＰｉｎｇ等のツールを用いて応答があるかどうかをチェックする。 FIG. 5 is a flowchart showing an example of the processing contents of the system failure diagnosis unit in the information processing system of FIGS. 1A to 1C. The flowchart of FIG. 5 represents the processing contents of the system failure diagnosis units 16aα and 16b to 16d in step S303 of FIG. 3. In FIG. 5, first, the system failure diagnosis units 16aα and 16b to 16d check whether the first communication units 13α and 301A to 301C of the other failure diagnosis recovery systems respond (step S501). Specifically, taking the system failure diagnosis unit 16aα as an example, the system failure diagnosis unit 16aα uses a tool such as Ping to check, via the first communication unit 13α, whether the first communication units 301A to 301C respond.

次いで、システム障害診断部１６ａα，１６ｂ〜１６ｄは、第一通信部から応答がない場合、通信不可との情報を障害診断情報共有部１８ａα，１８ｂ〜１８ｄへ渡し、処理を終了する（ステップＳ５０２）。一方、全ての第一通信部から応答がある場合、通信可能との情報を障害診断情報共有部１８ａα，１８ｂ〜１８ｄへ渡す（ステップＳ５０３）。これによって、多数決判定部からのエラー判定結果によって示される障害の原因が、第一通信部１３α，３０１Ａ〜３０１Ｃのいずれかにあるのか否かが判別される。 Next, when there is no response from the first communication unit, the system failure diagnosis unit 16aα, 16b to 16d passes information indicating that communication is not possible to the failure diagnosis information sharing unit 18aα, 18b to 18d, and ends the process (step S502). . On the other hand, when there is a response from all the first communication units, information indicating that communication is possible is passed to the fault diagnosis information sharing units 18aα, 18b to 18d (step S503). Thus, it is determined whether or not the cause of the failure indicated by the error determination result from the majority determination unit is in any of the first communication units 13α and 301A to 301C.
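
The reachability check of steps S501 to S503 can be sketched as below. The names `diagnose_communication` and `probe` are assumptions; `probe` stands in for whatever Ping-like check is used. The list of unreachable peers goes beyond what the text states (it only mentions passing "communication possible" or "communication not possible") and is included purely for illustration.

```python
def diagnose_communication(peers, probe):
    """Sketch of one system failure diagnosis unit.

    peers: names of the first communication units of the other systems.
    probe: hypothetical callable returning True when the named peer answers.
    """
    unreachable = [p for p in peers if not probe(p)]  # step S501
    if unreachable:
        # step S502: at least one peer did not respond
        return {"communicable": False, "unreachable": unreachable}
    # step S503: every peer responded
    return {"communicable": True, "unreachable": []}
```

For the system aα, `peers` would be the first communication units 301A to 301C.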

《障害診断情報共有部の処理内容》 《Processing contents of failure diagnosis information sharing unit》
図６は、図１Ａ〜図１Ｃの情報処理システムにおいて、その障害診断情報共有部の処理内容の一例を示すフローチャートである。図６のフローチャートは、図３のステップＳ３０４において、他の障害診断回復系統の状況を把握するために障害診断情報共有部１８ａα，１８ｂ〜１８ｄで行われる処理内容を表すものである。図６において、まず、障害診断情報共有部１８ａα，１８ｂ〜１８ｄは、システム障害診断部１６ａα，１６ｂ〜１６ｄから受け取った診断結果から、通信が可能かどうかをチェックする（ステップＳ６０１）。具体的には、図５のステップＳ５０３に示したように、通信可能との情報を受け取った場合のみ、通信可能と判断する。 FIG. 6 is a flowchart showing an example of the processing contents of the failure diagnosis information sharing unit in the information processing system of FIGS. 1A to 1C. The flowchart of FIG. 6 represents the processing performed by the failure diagnosis information sharing units 18aα and 18b to 18d in step S304 of FIG. 3 in order to grasp the status of the other failure diagnosis recovery systems. In FIG. 6, first, the failure diagnosis information sharing units 18aα and 18b to 18d check, from the diagnosis results received from the system failure diagnosis units 16aα and 16b to 16d, whether communication is possible (step S601). Specifically, communication is judged to be possible only when information indicating that communication is possible has been received, as shown in step S503 of FIG. 5.

次いで、通信可能と判定された場合、障害診断情報共有部１８ａα，１８ｂ〜１８ｄは、自身の障害診断情報をブロードキャストで送信することで、各障害診断情報共有部の間で障害診断情報を交換する（ステップＳ６０２）。詳細な交換方法は図１９で後述する。続いて、障害診断情報共有部１８ａα，１８ｂ〜１８ｄは、交換が正常に終了したか否かをチェックする（ステップＳ６０３）。具体的には、タイムアウトや、データのエラー等が無いかをチェックする。 Next, when it is determined that communication is possible, the fault diagnosis information sharing units 18aα and 18b to 18d exchange their fault diagnosis information between the respective fault diagnosis information sharing units by broadcasting their own fault diagnosis information. (Step S602). A detailed exchange method will be described later with reference to FIG. Subsequently, the failure diagnosis information sharing units 18aα and 18b to 18d check whether or not the exchange has been completed normally (step S603). Specifically, it is checked whether there is a timeout or a data error.

交換が正常に終了した場合、障害診断情報共有部１８ａα，１８ｂ〜１８ｄは、交換によって取得した各障害診断情報を回復プラン判定部１９ａα，１９ｂ〜１９ｄへ渡し処理を終了する（ステップＳ６０４）。一方、交換が正常に終了しなかった場合、もしくは、ステップＳ６０１において通信不可だと判定された場合、障害診断情報共有部１８ａα，１８ｂ〜１８ｄは、手動回復プランを作成し、回復プラン実行部２１ａα，２１ｂ〜２１ｄへ渡し、処理を終了する（ステップＳ６０５）。 When the exchange ends normally, the fault diagnosis information sharing units 18aα, 18b to 18d pass the fault diagnosis information acquired by the exchange to the recovery plan determination units 19aα, 19b to 19d, and the processing is ended (step S604). On the other hand, if the exchange has not been completed normally, or if it is determined in step S601 that communication is not possible, the failure diagnosis information sharing units 18aα and 18b to 18d create a manual recovery plan, and the recovery plan execution unit 21aα. , 21b to 21d, and the process is terminated (step S605).
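
The branching of steps S601 to S605 can be sketched as follows. Everything named here is an assumption for illustration: `share_diagnosis`, the `exchange` callback (which is supposed to broadcast the local information and return what the peers sent, or raise `TimeoutError`), and the tuple-based return value. The actual exchange method is the one the patent describes with reference to FIG. 19.

```python
def share_diagnosis(own_id, own_info, communicable, exchange, peer_ids):
    """Sketch of one failure diagnosis information sharing unit.

    Returns ("to_plan_determination", infos) on success, where infos maps
    every system id to its diagnosis information, or
    ("manual_recovery", reason) when sharing is not possible.
    """
    if not communicable:                       # step S601: communication not possible
        return ("manual_recovery", "communication unavailable")
    try:
        received = exchange(own_id, own_info)  # step S602: broadcast and collect
    except TimeoutError:                       # step S603: exchange timed out
        return ("manual_recovery", "exchange timed out")
    if set(received) != set(peer_ids):         # step S603: a peer's data is missing
        return ("manual_recovery", "incomplete exchange")
    infos = dict(received)
    infos[own_id] = own_info
    return ("to_plan_determination", infos)    # step S604
```

Both failure paths correspond to step S605, where a manual recovery plan is handed directly to the recovery plan execution unit.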

例えば、多数決判定部に障害が生じた場合には、当該障害が生じた多数決判定部に対応する障害診断情報共有部は、他の障害診断情報共有部から障害診断情報を受信しないため、ステップＳ６０３の処理でタイムアウトとなり、ステップＳ６０５の処理へ移行する。この際に、その他の障害診断情報共有部は、後述する図１８のフローによって、当該障害が生じた多数決判定部の存在を認識する。 For example, when a failure occurs in the majority decision determination unit, the failure diagnosis information sharing unit corresponding to the majority decision determination unit in which the failure has occurred does not receive the failure diagnosis information from the other failure diagnosis information sharing unit. The process timed out, and the process proceeds to step S605. At this time, the other failure diagnosis information sharing unit recognizes the existence of the majority decision determination unit in which the failure has occurred according to the flow of FIG.

《回復プラン判定部および回復プラン合意部の処理内容》 《Processing contents of recovery plan determination unit and recovery plan agreement unit》
図７は、図１Ａ〜図１Ｃの情報処理システムにおいて、その回復プラン判定部および回復プラン合意部の処理内容の一例を示すフローチャートである。図８は、図７のフローチャートにおいて、回復プランを判定する際の判定基準の一例を示す表である。図７および図８は、図３のステップＳ３０５およびＳ３０６において、回復プランの判定と合意を行うために回復プラン判定部１９ａα，１９ｂ〜１９ｄおよび回復プラン合意部２０ａα，２０ｂ〜２０ｄで行われる処理内容を表すものである。 FIG. 7 is a flowchart showing an example of the processing contents of the recovery plan determination unit and the recovery plan agreement unit in the information processing system of FIGS. 1A to 1C. FIG. 8 is a table showing an example of the criteria used to determine a recovery plan in the flowchart of FIG. 7. FIGS. 7 and 8 represent the processing performed by the recovery plan determination units 19aα and 19b to 19d and the recovery plan agreement units 20aα and 20b to 20d in steps S305 and S306 of FIG. 3 in order to determine and agree on a recovery plan.

図７において、まず、回復プラン判定部１９ａα，１９ｂ〜１９ｄは、回復プランを判定する（ステップＳ７０１）。具体的に説明すると、まず、その前段階で、各障害診断回復系統ａα，ｂ〜ｄ毎に、自身の多数決判定部からのエラー判定結果や、自身の構成不良箇所診断部での診断結果や、自身のシステム障害診断部での診断結果に基づいて別個独立に障害診断情報が生成される。各障害診断情報は、それぞれ、どの処理ブロックに障害が生じたのか、または、どの診断ブロックもしくは外部送受信ブロックに障害が生じたのかを各障害診断回復系統ａα，ｂ〜ｄ毎の判断で定めたものとなる。その後、当該各障害診断情報は、障害診断情報共有部１８ａα，１８ｂ〜１８ｄで共有される。回復プラン判定部１９ａα，１９ｂ〜１９ｄのそれぞれは、この共有された各障害診断情報を参照し、図８の表８００に基づいて、障害が自動回復可能であるのか、自動回復不可能であるのかを判定する。 In FIG. 7, first, the recovery plan determination units 19aα and 19b to 19d determine a recovery plan (step S701). More specifically, in the preceding stage, each of the failure diagnosis recovery systems aα and b to d independently generates failure diagnosis information based on the error determination result from its own majority decision determination unit, the diagnosis result of its own configuration failure location diagnosis unit, and the diagnosis result of its own system failure diagnosis unit. Each piece of failure diagnosis information records that system's own judgment as to which processing block has failed, or which diagnosis block or external transmission / reception block has failed. The pieces of failure diagnosis information are then shared among the failure diagnosis information sharing units 18aα and 18b to 18d. Each of the recovery plan determination units 19aα and 19b to 19d refers to the shared failure diagnosis information and determines, based on table 800 of FIG. 8, whether or not the failure is automatically recoverable.

図８の表８００では、３個の処理ブロック３１０Ａ〜３１０Ｃ（再構成可能回路部３１９Ａ〜３１９Ｃ）の中のいずれか１個に障害が有る場合に自動回復が可能と判定され、それ以外の場合には、自動回復が不可能と判定される。例えば、共有した各障害診断情報に含まれる故障した処理ブロックを特定する情報（言い換えれば各多数決判定部からの各エラー判定結果）が全て一致し、かつ当該処理ブロックの障害の存在が対応する構成不良箇所診断部で検証され、かつ各システム障害診断部で通信障害が検出されなかった場合には、自動回復が可能と判定される。回復プラン判定部１９ａα，１９ｂ〜１９ｄは、自動回復が可能と判定した場合、自動回復プランを生成し、自動回復が不可能と判定した場合、手動回復プランを生成する。 In the table 800 of FIG. 8, it is determined that automatic recovery is possible if any one of the three processing blocks 310A to 310C (reconfigurable circuit units 319A to 319C) has a failure, and otherwise. It is determined that automatic recovery is impossible. For example, a configuration in which all pieces of information (in other words, each error determination result from each majority decision unit) that specify a failed processing block included in each shared failure diagnosis information match and the presence of a failure in the processing block corresponds When it is verified by the defective part diagnosis unit and no communication failure is detected by each system failure diagnosis unit, it is determined that automatic recovery is possible. The recovery plan determination units 19aα and 19b to 19d generate an automatic recovery plan when it is determined that automatic recovery is possible, and generate a manual recovery plan when it is determined that automatic recovery is impossible.

自動回復プランの中には、障害が有る処理ブロックを表す回復先の処理ブロックと、障害が無い処理ブロックの中から選択した回復元の処理ブロックの情報等が含まれる。この回復元の処理ブロックは、回復プラン判定部１９ａα，１９ｂ〜１９ｄの中で予め共通に設定されている優先順位等に基づいて選択される。また、手動回復プランの中には、各障害診断情報から推定される障害の発生箇所（すなわちどの診断ブロックで障害が発生しているか）の情報等が含まれる。例えば、共有された４個の障害診断情報の中のいずれか１個が他と矛盾するような場合には、この１個の障害診断情報に対応する診断ブロックに障害が発生していると推定される。 The automatic recovery plan includes information such as the recovery destination processing block, which is the processing block that has the failure, and the recovery source processing block, which is selected from among the processing blocks that have no failure. The recovery source processing block is selected based on, for example, a priority order set in advance in common across the recovery plan determination units 19aα and 19b to 19d. The manual recovery plan includes information such as the failure location estimated from the failure diagnosis information (that is, which diagnosis block has failed). For example, when any one of the four shared pieces of failure diagnosis information contradicts the others, it is estimated that a failure has occurred in the diagnosis block corresponding to that one piece of failure diagnosis information.
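
The criteria described for table 800 and the contents of the two plan types can be sketched as follows. This is one illustrative reading of the text, not the table itself: the record fields (`faulty_block`, `verified`, `communicable`), the `PRIORITY` tuple, and the function `determine_plan` are all assumptions.

```python
from collections import Counter

# Assumed priority order shared by all recovery plan determination units.
PRIORITY = ("310A", "310B", "310C")


def determine_plan(infos):
    """Sketch of step S701.

    infos maps a system id to a hypothetical record:
      faulty_block: block that system blames (or None)
      verified:     True if that system's configuration diagnosis confirmed the fault
      communicable: True if that system detected no communication failure
    """
    blocks = {i["faulty_block"] for i in infos.values()}
    unanimous = len(blocks) == 1 and None not in blocks
    verified = any(i["verified"] for i in infos.values())
    comm_ok = all(i["communicable"] for i in infos.values())
    if unanimous and verified and comm_ok:
        destination = next(iter(blocks))
        # Recovery source: highest-priority block that has no failure.
        source = next(b for b in PRIORITY if b != destination)
        return {"kind": "automatic", "source": source, "destination": destination}
    # Manual plan: if exactly one system contradicts the rest, suspect it.
    counts = Counter(i["faulty_block"] for i in infos.values())
    majority, n = counts.most_common(1)[0]
    suspects = []
    if n == len(infos) - 1:
        suspects = sorted(s for s, i in infos.items() if i["faulty_block"] != majority)
    return {"kind": "manual", "suspect_systems": suspects}
```

Because every system evaluates the same shared information with the same priority order, fault-free systems would be expected to arrive at identical plans.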

次いで、生成された回復プランは、回復プラン合意部２０ａα，２０ｂ〜２０ｄへ渡される。回復プラン合意部２０ａα，２０ｂ〜２０ｄは、ビザンチン合意プロトコルに従い、回復プランを交換する（ステップＳ７０２）。詳細な交換方法は図２０および図２１で後述するが、このようなビザンチン合意プロトコルを用いることによって、ビザンチン障害が１つ存在しても、ビザンチン障害が起きている系統を除く、残り３つの障害診断回復系統の回復プランを同一にすることができる。つまり、回復プラン決定後の状態遷移を、ビザンチン障害が起きている系統を除く、残り３つの障害診断回復系統で同一にすることができる。 Next, the generated recovery plan is passed to the recovery plan agreement units 20aα, 20b to 20d. The recovery plan agreement units 20aα and 20b to 20d exchange recovery plans according to the Byzantine agreement protocol (step S702). The detailed replacement method will be described later with reference to FIGS. 20 and 21. By using such a Byzantine agreement protocol, even if one Byzantine disorder exists, the remaining three faults excluding the line in which the Byzantine disorder occurs are excluded. The recovery plan of the diagnostic recovery system can be made the same. That is, the state transition after determining the recovery plan can be made the same in the remaining three fault diagnosis recovery systems except for the system in which the Byzantine failure occurs.

As a result, suppose for example that a system in which a Byzantine fault has occurred starts automatic failure recovery processing even though no failure has actually occurred. Because the other three fault diagnosis and recovery systems, which are needed for the automatic failure recovery processing, are not executing it, the processing ends in an error partway through. In this case, appropriate error handling makes it possible, for example, to perform degraded operation using the remaining three fault diagnosis and recovery systems in which no Byzantine fault has occurred and their corresponding processing systems.

Subsequently, the recovery plan agreement units 20aα, 20b to 20d check whether the exchange of recovery plans has ended normally (step S703). If it has, the recovery plan agreement units 20aα, 20b to 20d compare the exchanged recovery plans and check whether the recovery plans of all the fault diagnosis and recovery systems match (step S704). If the recovery plans do not match, a Byzantine fault has occurred somewhere. Checking that the recovery plans match therefore prevents failure recovery processing from being performed in a system environment made unstable by a Byzantine fault.

If the recovery plans of all the fault diagnosis and recovery systems match, the recovery plan agreement units 20aα, 20b to 20d pass the matching recovery plan to the recovery plan execution units 21aα, 21b to 21d (step S705). Otherwise, the recovery plan agreement units 20aα, 20b to 20d change the recovery plan to a manual recovery plan and pass it to the recovery plan execution units 21aα, 21b to 21d (step S707). Likewise, if the exchange of recovery plans did not end normally in step S703, the recovery plan agreement units 20aα, 20b to 20d change the recovery plan to a manual recovery plan and pass it to the recovery plan execution units 21aα, 21b to 21d (step S706).
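The branching in steps S703 to S707 can be summarized with the following minimal sketch. It assumes, purely for illustration, that each recovery plan is represented as a string and that a failed exchange is signaled by `None`; the function name `agree_on_plan` and the constant `MANUAL` are hypothetical. The Byzantine agreement exchange itself (step S702) is not modeled here.

```python
from typing import Optional, Sequence

MANUAL = "manual"  # hypothetical marker for a manual recovery plan


def agree_on_plan(exchanged: Optional[Sequence[str]]) -> str:
    """Decide which plan to hand to the recovery plan execution unit.

    `exchanged` holds the plans collected from all fault diagnosis and
    recovery systems, or None if the exchange did not end normally.
    """
    if exchanged is None:          # S703: exchange failed -> S706
        return MANUAL
    if len(set(exchanged)) == 1:   # S704: all plans match -> S705
        return exchanged[0]
    return MANUAL                  # S704: mismatch -> S707
```

Any disagreement, including one caused by a single faulty system, falls back to the manual recovery plan, which matches the conservative behavior described above.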

<< Detailed configuration of the recovery plan execution unit >>
FIG. 9 is a block diagram showing a detailed configuration example of the recovery plan execution unit in the external transmission/reception block of the information processing system of FIGS. 1A to 1C. The external transmission/reception block 10α is described here as a representative, but the same applies to the external transmission/reception block 10β. The recovery plan execution unit 21aα shown in FIG. 9 comprises a recovery control unit 211aα, a manual recovery plan notification unit 216aα, an input data re-input unit 217aα, and an input data re-input queue 218aα. As described later in detail with reference to FIGS. 11 to 15, the recovery control unit 211aα controls the manual recovery plan notification unit 216aα and the input data re-input unit 217aα to execute the manual recovery plan and the automatic recovery plan.

The input data re-input unit 217aα saves the input data needed for state recovery in the input data re-input queue 218aα and, when the data is needed, inputs it in order to the recovery-destination processing block. When the received recovery plan is a manual recovery plan, the manual recovery plan notification unit 216aα notifies the outside of it as manual recovery plan information.
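The save-then-replay behavior of the input data re-input unit can be sketched as a first-in, first-out queue. The class name `InputReinjector` and its methods are hypothetical, and details such as when capturing stops are not stated in this passage and are therefore left out.

```python
from collections import deque
from typing import Any, Callable


class InputReinjector:
    """Illustrative stand-in for the re-input unit and its queue."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self.capturing = False

    def start_capture(self) -> None:
        """Begin saving incoming input data."""
        self.capturing = True

    def on_input(self, data: Any) -> None:
        """Save the data only while capturing is enabled."""
        if self.capturing:
            self._queue.append(data)

    def replay(self, deliver: Callable[[Any], None]) -> int:
        """Deliver saved data in arrival order; return how many items were sent."""
        count = 0
        while self._queue:
            deliver(self._queue.popleft())
            count += 1
        return count
```

Preserving arrival order matters because the recovery-destination processing block is stateful: replaying the same inputs in the same order is what allows it to reach the state of the normal processing systems.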

FIG. 10 is a block diagram showing a detailed configuration example of the recovery plan execution unit in a diagnosis block of the information processing system of FIGS. 1A to 1C. The diagnosis block 300A is described here as a representative, but the same applies to the diagnosis blocks 300B and 300C. The recovery plan execution unit 21b shown in FIG. 10 comprises a recovery control unit 211b, a reconfiguration data generation unit 212b, circuit data 213b, a configuration recovery unit 214b, a state recovery unit 215b, and a manual recovery plan notification unit 216b. As described later in detail with reference to FIGS. 11 to 15, the recovery control unit 211b controls the reconfiguration data generation unit 212b, the configuration recovery unit 214b, the state recovery unit 215b, and the manual recovery plan notification unit 216b.

The reconfiguration data generation unit 212b generates reconfiguration data for the failure location to be reconfigured from, for example, the failure location information obtained by the configuration recovery unit 214b and the circuit data 213b. The configuration recovery unit 214b transmits the reconfiguration data to the configuration information storage management unit 317A and performs automatic recovery (circuit reconfiguration) of the reconfigurable circuit unit 319A. After the circuit reconfiguration of the reconfigurable circuit unit 319A is complete, the state recovery unit 215b copies the state from the recovery-source processing block, which holds a normal state, and overwrites the first bank 315A1 with that state through the state storage management unit 313A.

When the received recovery plan is a manual recovery plan, the manual recovery plan notification unit 216b notifies the outside of it as manual recovery plan information. Although the circuit data 213b resides in the recovery plan execution unit 21b here, it may reside elsewhere in the processing block or the diagnosis block, provided that it is triplicated across the processing systems.

<< Detailed operation of the recovery plan execution unit (in the recovery-destination processing system) >>
FIG. 11 is a flowchart showing an example of the processing performed by the recovery plan execution unit of FIG. 10 when that unit belongs to the recovery-destination processing system. The flowchart of FIG. 11 is executed as the failure recovery processing in step S307 of FIG. 3. In FIG. 11, the recovery plans input to the recovery plan execution units 21b to 21d are first received by the recovery control units 211b to 211d, which check whether the recovery plan is a manual recovery plan (step S1101). If the recovery plan is not a manual recovery plan, the processing proceeds to step S1102, and automatic recovery processing based on the automatic recovery plan is performed. The following description takes as an example the case where the recovery-destination processing system indicated by the automatic recovery plan is the processing system AA and the recovery-source processing system is the processing system BB.

In step S1102, the recovery control unit 211b broadcasts a reconfiguration start notification to the other recovery control units 211aα, 211c, and 211d using the first communication unit 301A and the communication path L2. Next, the recovery control unit 211b reconfigures the reconfigurable circuit unit 319A (step S1103).

Specifically, the recovery control unit 211b first requests failure location information from the configuration recovery unit 214b. The configuration recovery unit 214b acquires the failure location information in the same manner as in steps S403 to S405 of the configuration failure location diagnosis unit 302A and, if the acquisition succeeds, passes the information to the reconfiguration data generation unit 212b. The reconfiguration data generation unit 212b uses the failure location information and the circuit data 213b to generate reconfiguration data for the location that needs to be reconfigured, and passes it to the configuration recovery unit 214b. The configuration recovery unit 214b sends the reconfiguration data to the configuration information storage management unit 317A using the second communication units 303A and 316A. Based on the reconfiguration data, the configuration information storage management unit 317A overwrites the configuration information storage unit 318A and restores the circuit to its normal configuration. When the restoration completes normally, the configuration information storage management unit 317A notifies the recovery control unit 211b to that effect via the configuration recovery unit 214b.

Subsequently, the recovery control unit 211b checks whether the reconfiguration has ended normally (step S1104). If it has, the recovery control unit 211b broadcasts a state recovery start notification to the other recovery control units 211aα, 211c, and 211d using the first communication unit 301A and the communication path L2. Next, the recovery control unit 211b passes control to the state recovery unit 215b, which receives the normal state and overwrites the existing state (step S1106).

Specifically, the state recovery unit 215b uses the first communication unit 301A and the communication path L2 to communicate with the state recovery unit 215c in the normal, recovery-source processing block and receives the normal state held in that processing block. The state recovery unit 215b transmits the received normal state to the state storage management unit 313A using the second communication units 303A and 316A. The state storage management unit 313A overwrites the state storage unit 314A with the received normal state. After the overwrite is complete, the state storage management unit 313A notifies the state recovery unit 215b to that effect using the second communication units 303A and 316A.

Next, the state recovery unit 215b checks whether the state recovery has ended normally (step S1107). If it has, the state recovery unit 215b notifies the recovery control unit 211b to that effect, and the recovery control unit 211b broadcasts a state recovery end notification to the other recovery control units 211aα, 211c, and 211d using the first communication unit 301A and the communication path L2 (step S1108). At this point, as described later in detail with reference to FIG. 14, the input data re-input unit 217aα of the external transmission/reception block 10α, having received the state recovery end notification, starts re-inputting the input data that was received during the automatic recovery processing and has not yet been processed. The stateful processing unit 312A processes the re-input data in order.

Subsequently, the recovery control unit 211b asks the majority decision determination unit 14b whether the results of processing the input data sent from the input data re-input unit 217aα have caught up with the results produced by the other, normal processing systems (step S1109). If they have not caught up, the recovery control unit 211b executes step S1109 again. If they have caught up, the recovery control unit 211b reaches agreement on recovery completion with the other recovery control units 211aα, 211c, and 211d by the same method as the recovery plan agreement described above (step S1110). Next, the recovery control unit 211b checks whether the agreement on recovery completion was reached normally (step S1111). If it was, the recovery control unit 211b regards the recovery as complete and shifts to the normal mode (step S1112). Specifically, the recovery control unit 211b overwrites the mode information 22b with the normal mode and ends the automatic recovery processing.

If the reconfiguration did not end normally in step S1104, the recovery control unit 211b creates a manual recovery plan that records that fact (step S1113). If the state recovery did not end normally in step S1107, the recovery control unit 211b creates a manual recovery plan that records that fact (step S1114). If the agreement on recovery completion was not reached normally in step S1111, the recovery control unit 211b creates a manual recovery plan that records that fact (step S1115). If the recovery plan was a manual recovery plan in step S1101, or if a manual recovery plan was created in steps S1113 to S1115, manual recovery processing based on that manual recovery plan is performed by the recovery control units 211aα, 211b to 211d and the manual recovery plan notification units 216aα, 216b to 216c, as described later in detail with reference to FIG. 15 (step S1116).
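The control flow of FIG. 11 on the recovery-destination side, from step S1102 onward, can be outlined with the following sketch. Each hardware operation is replaced by a caller-supplied callback that returns whether the operation succeeded; the function name, the callback names, and the notification strings are all hypothetical and serve only to make the ordering of steps and fallbacks explicit.

```python
from typing import Callable


def run_destination_recovery(
    reconfigure: Callable[[], bool],
    restore_state: Callable[[], bool],
    caught_up: Callable[[], bool],
    agree_completion: Callable[[], bool],
    broadcast: Callable[[str], None],
) -> str:
    """Return "normal" on success, or a "manual: ..." reason on failure."""
    broadcast("reconfiguration_start")              # S1102
    if not reconfigure():                           # S1103, S1104
        return "manual: reconfiguration failed"     # S1113
    broadcast("state_recovery_start")
    if not restore_state():                         # S1106, S1107
        return "manual: state recovery failed"      # S1114
    broadcast("state_recovery_end")                 # S1108
    while not caught_up():                          # S1109, repeated until true
        pass  # a real system would presumably pause between polls
    if not agree_completion():                      # S1110, S1111
        return "manual: completion agreement failed"  # S1115
    return "normal"                                 # S1112
```

Every failure path ends in a manual recovery plan, so the automatic path only completes when reconfiguration, state recovery, catch-up, and the completion agreement all succeed.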

<< Detailed operation of the recovery plan execution unit (in the recovery-source processing system) >>
FIG. 12 is a flowchart showing an example of the processing performed by the recovery plan execution unit of FIG. 10 when that unit belongs to the recovery-source processing system. The flowchart of FIG. 12 is executed as the failure recovery processing in step S307 of FIG. 3. In FIG. 12, the recovery plans input to the recovery plan execution units 21b to 21d are first received by the recovery control units 211b to 211d, which check whether the recovery plan is a manual recovery plan (step S1201). If the recovery plan is not a manual recovery plan, the processing proceeds to step S1202, and automatic recovery processing based on the automatic recovery plan is performed. The following description takes as an example the case where the recovery-destination processing system indicated by the automatic recovery plan is the processing system AA and the recovery-source processing system is the processing system BB.

In step S1202, the recovery control unit 211c waits, up to a timeout, for the recovery control unit 211b of the recovery-destination processing system to issue the reconfiguration start notification, and checks whether the wait has timed out (step S1202). If the reconfiguration start notification is received before the timeout, the recovery control unit 211c waits, up to a timeout, for the recovery-destination recovery control unit 211b to issue the state recovery start notification, and checks whether the wait has timed out (step S1203).

If the state recovery start notification is received before the timeout, the recovery control unit 211c passes control to the state recovery unit 215c. The state recovery unit 215c uses the second communication units 303B and 316B to request the state storage management unit 313B to switch the state write destination to the second bank 315B2 (step S1204). Next, the state recovery unit 215c transmits the state held in the first bank 315B1 to the recovery-destination state recovery unit 215b using the first communication unit 301B and the communication path L2 (step S1205).

Specifically, the state recovery unit 215c uses the second communication units 303B and 316B to request the state storage management unit 313B to read the state held in the first bank 315B1. The state storage management unit 313B reads the state from the first bank 315B1 and transmits it to the state recovery unit 215c using the second communication units 316B and 303B. The state recovery unit 215c transmits this normal state to the recovery-destination state recovery unit 215b using the first communication unit 301B and the communication path L2, and returns control to the recovery control unit 211c.

Next, the recovery control unit 211c waits, up to a timeout, for the recovery-destination recovery control unit 211b to issue the state recovery end notification, and checks whether the wait has timed out (step S1206). If the state recovery end notification is received before the timeout, the recovery control unit 211c passes control to the state recovery unit 215c. The state recovery unit 215c uses the second communication units 303B and 316B to request the state storage management unit 313B to return the state write destination to the first bank 315B1 (step S1207). At this point, as described later in detail with reference to FIG. 14, the input data re-input unit 217aα of the external transmission/reception block 10α, having received the state recovery end notification, starts re-inputting the input data that was received during the automatic recovery processing and has not yet been processed.
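The bank switching in steps S1204 to S1207 can be illustrated with a small two-bank store. The class `BankedStateStore` and its methods are hypothetical. The sketch only shows that redirecting writes to the second bank keeps the first bank unchanged while it is read out and transferred; how the contents of the second bank are treated after the write destination returns to the first bank is not described in this passage and is deliberately not modeled.

```python
from typing import Any, Dict


class BankedStateStore:
    """Illustrative state store with two banks and a switchable write target."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.banks: Dict[int, Dict[str, Any]] = {1: dict(initial), 2: {}}
        self.write_target = 1

    def switch_write_target(self, bank: int) -> None:
        """Redirect subsequent writes to the given bank (S1204, S1207)."""
        if bank not in self.banks:
            raise ValueError(f"unknown bank: {bank}")
        self.write_target = bank

    def write(self, key: str, value: Any) -> None:
        """Write to whichever bank is the current write target."""
        self.banks[self.write_target][key] = value

    def read_bank(self, bank: int) -> Dict[str, Any]:
        """Return a copy of one bank, e.g. for transfer to the destination (S1205)."""
        return dict(self.banks[bank])
```

With this arrangement the recovery-source processing block can keep processing inputs during the transfer without altering the state that is being copied to the recovery destination.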

Subsequently, the recovery control unit 211c asks the majority decision determination unit 14c whether the processing results of the normal processing systems, including its own, have been caught up with by the processing results of the recovery-destination processing system, which is processing the input data sent from the input data re-input unit 217aα (step S1208). If they have not been caught up with, the recovery control unit 211c executes step S1208 again. If they have, the recovery control unit 211c reaches agreement on recovery completion with the other recovery control units 211aα, 211b, and 211d by the same method as the recovery plan agreement described above (step S1209). Next, the recovery control unit 211c checks whether the agreement on recovery completion was reached normally (step S1210). If it was, the recovery control unit 211c regards the recovery as complete and shifts to the normal mode (step S1211). Specifically, the recovery control unit 211c overwrites the mode information 22c with the normal mode and ends the automatic recovery processing.

If the wait in step S1202 times out, the recovery control unit 211c creates a manual recovery plan that records that fact (step S1212). If the wait in step S1203 times out, the recovery control unit 211c creates a manual recovery plan that records that fact (step S1213). If the wait in step S1206 times out, the recovery control unit 211c creates a manual recovery plan that records that fact (step S1214). If the agreement on recovery completion was not reached normally in step S1210, the recovery control unit 211c creates a manual recovery plan that records that fact (step S1215). If the recovery plan was a manual recovery plan in step S1201, or if a manual recovery plan was created in steps S1212 to S1215, manual recovery processing based on that manual recovery plan is performed by the recovery control units 211aα, 211b to 211d and the manual recovery plan notification units 216aα, 216b to 216c, as described later in detail with reference to FIG. 15 (step S1216).

<< Detailed operation of the recovery plan execution unit (in a processing system that is neither the recovery destination nor the recovery source) >>
FIG. 13 is a flowchart showing an example of the processing performed by the recovery plan execution unit of FIG. 10 when that unit belongs to neither the recovery-destination nor the recovery-source processing system. The flowchart of FIG. 13 is executed as the failure recovery processing in step S307 of FIG. 3. In FIG. 13, the recovery plans input to the recovery plan execution units 21b to 21d are first received by the recovery control units 211b to 211d, which check whether the recovery plan is a manual recovery plan (step S1301). If the recovery plan is not a manual recovery plan, the processing proceeds to step S1302, and automatic recovery processing based on the automatic recovery plan is performed. The following description takes as an example the case where the recovery-destination processing system indicated by the automatic recovery plan is the processing system AA and the recovery-source processing system is the processing system BB.

In step S1302, the recovery control unit 211d waits, up to a timeout, for the recovery-destination recovery control unit 211b to issue the reconfiguration start notification, and checks whether the wait has timed out (step S1302). If the reconfiguration start notification is received before the timeout, the recovery control unit 211d waits, up to a timeout, for the recovery-destination recovery control unit 211b to issue the state recovery start notification, and checks whether the wait has timed out (step S1303). If the state recovery start notification is received before the timeout, the recovery control unit 211d waits, up to a timeout, for the recovery-destination recovery control unit 211b to issue the state recovery end notification, and checks whether the wait has timed out (step S1304).

If the state recovery end notification is received before the timeout, the input data re-input unit 217aα of the external transmission/reception block 10α, having received the state recovery end notification, starts re-inputting the input data that was received during the automatic recovery processing and has not yet been processed, as described later in detail with reference to FIG. 14. The recovery control unit 211d asks the majority decision determination unit 14d whether the processing results of the normal processing systems, including its own, have been caught up with by the processing results of the recovery-destination processing system, which is processing the input data sent from the input data re-input unit 217aα (step S1305). If they have not been caught up with, the recovery control unit 211d executes step S1305 again. If they have, the recovery control unit 211d reaches agreement on recovery completion with the other recovery control units 211aα, 211b, and 211c by the same method as the recovery plan agreement described above (step S1306). Next, the recovery control unit 211d checks whether the agreement on recovery completion was reached normally (step S1307). If it was, the recovery control unit 211d regards the recovery as complete and shifts to the normal mode (step S1308). Specifically, the recovery control unit 211d overwrites the mode information 22d with the normal mode and ends the automatic recovery processing.

If the wait in step S1302 times out, the recovery control unit 211d creates a manual recovery plan that records that fact (step S1309). If the wait in step S1303 times out, the recovery control unit 211d creates a manual recovery plan that records that fact (step S1310). If the wait in step S1304 times out, the recovery control unit 211d creates a manual recovery plan that records that fact (step S1311). If the agreement on recovery completion was not reached normally in step S1307, the recovery control unit 211d creates a manual recovery plan that records that fact (step S1312). If the recovery plan was a manual recovery plan in step S1301, or if a manual recovery plan was created in steps S1309 to S1312, manual recovery processing based on that manual recovery plan is performed by the recovery control units 211aα, 211b to 211d and the manual recovery plan notification units 216aα, 216b to 216c, as described later in detail with reference to FIG. 15 (step S1313).
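The three consecutive timed waits of steps S1302 to S1304, each falling back to a manual recovery plan on timeout, can be sketched with a standard-library queue standing in for the communication path. The notification names, the function `wait_for_notifications`, and the handling of an out-of-order notification are assumptions for illustration; the specification does not state what happens when an unexpected notification arrives.

```python
import queue

# Hypothetical notification names, in the order the recovery destination sends them.
EXPECTED = ["reconfiguration_start", "state_recovery_start", "state_recovery_end"]


def wait_for_notifications(inbox: "queue.Queue[str]", timeout_s: float) -> str:
    """Wait for each expected notification in turn.

    Returns "ok" if all three arrive in order before their timeouts,
    otherwise a "manual: ..." reason for the manual recovery plan.
    """
    for name in EXPECTED:
        try:
            received = inbox.get(timeout=timeout_s)  # blocks for up to timeout_s
        except queue.Empty:
            return f"manual: timed out waiting for {name}"
        if received != name:
            # Assumption: an out-of-order notification is treated as a failure.
            return f"manual: unexpected notification {received!r}"
    return "ok"
```

The timeouts are what allow an observing system to leave automatic recovery when the recovery destination stalls, rather than waiting indefinitely.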

<< Detailed operation of the recovery plan execution unit (in the external transmission/reception block) >>
FIG. 14 is a flowchart showing an example of the processing performed by the recovery plan execution unit of FIG. 9. The flowchart of FIG. 14 is executed as the failure recovery processing in step S307 of FIG. 3. In FIG. 14, the recovery plan input to the recovery plan execution unit 21aα is first received by the recovery control unit 211aα, which checks whether the recovery plan is a manual recovery plan (step S1401). If the recovery plan is not a manual recovery plan, the processing proceeds to step S1402, and automatic recovery processing based on the automatic recovery plan is performed. The following description takes as an example the case where the recovery-destination processing system indicated by the automatic recovery plan is the processing system AA and the recovery-source processing system is the processing system BB.

In step S1402, the recovery control unit 211aα stops transmitting input data to the first communication unit 311A in the recovery destination processing system. Next, the recovery control unit 211aα waits, up to a timeout, for the recovery control unit 211b of the recovery destination to send a reconfiguration start notification, and checks whether the timeout has occurred (step S1403). If the reconfiguration start notification is received before the timeout, the recovery control unit 211aα then waits, up to a timeout, for the recovery control unit 211b to send a state recovery start notification, and checks whether the timeout has occurred (step S1404).

If the state recovery start notification is received before the timeout, the recovery control unit 211aα requests the input data re-input unit 217aα to capture the input data from the input data input unit 12α and to keep storing it in the input data re-input queue 218aα (step S1405). The recovery control unit 211aα then waits, up to a timeout, for the recovery control unit 211b of the recovery destination to send a state recovery end notification, and checks whether the timeout has occurred (step S1406). If the state recovery end notification is received before the timeout, the recovery control unit 211aα requests the input data input unit 12α to reduce the speed at which input data is fed to the normal processing systems (step S1407).

Next, the recovery control unit 211aα requests the input data re-input unit 217aα to take one item of input data from the input data re-input queue 218aα and send it to the recovery destination processing system (step S1408). The recovery control unit 211aα then asks the majority decision determination unit 14aα whether the processing results of the recovery destination processing system, which is processing the input data sent from the input data re-input unit 217aα, have caught up with the processing results from the other, normal processing systems (step S1409).

If the check shows that the results have not caught up, the recovery control unit 211aα returns to step S1408. If they have caught up, the recovery control unit 211aα requests the input data re-input unit 217aα to stop capturing the input data from the input data input unit 12α and storing it in the input data re-input queue 218aα. The recovery control unit 211aα also requests the input data input unit 12α to restore the normal speed for feeding input data to the normal processing systems (step S1410).
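The replay loop of steps S1408 to S1410 can be sketched as follows. This is only an illustrative model, not the patented circuit: the function name, the callbacks, and the `max_rounds` safety bound are assumptions introduced here, and the queue stands in for the input data re-input queue 218aα.

```python
from collections import deque


def replay_until_caught_up(queue, send_to_recovering, has_caught_up, max_rounds=10_000):
    """Replay queued inputs one at a time (S1408) until the recovering
    system has caught up with the normal systems (S1409).

    Returns the number of inputs replayed. max_rounds is an added safety
    bound for this sketch and is not part of the flowchart.
    """
    replayed = 0
    for _ in range(max_rounds):
        if queue:
            send_to_recovering(queue.popleft())  # step S1408: send one item
            replayed += 1
        if has_caught_up():                      # step S1409: ask the voter
            return replayed                      # caller then performs step S1410
    raise TimeoutError("recovering system did not catch up")


# Hypothetical usage: three inputs were captured during state recovery.
captured = deque(["in-1", "in-2", "in-3"])
recovered_log = []
count = replay_until_caught_up(
    captured,
    recovered_log.append,
    lambda: len(recovered_log) == 3,
)
```

Because the normal systems are slowed down in step S1407 while the queue is drained one item per round, the recovering system can close the gap. In this sketch that effect is represented only by the `has_caught_up` callback.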

Next, the recovery control unit 211aα reaches agreement on recovery completion with the other recovery control units 211b to 211d, in the same manner as the recovery plan agreement described above (step S1411). The recovery control unit 211aα then checks whether the recovery completion agreement was reached normally (step S1412). If so, the recovery control unit 211aα regards recovery as complete and shifts to the normal mode (step S1413). Specifically, it overwrites the mode information 22aα with the normal mode and ends the automatic recovery processing.

If a timeout occurs in step S1403, the recovery control unit 211aα creates a manual recovery plan indicating that fact (step S1414). Likewise, if a timeout occurs in step S1404 or in step S1406, the recovery control unit 211aα creates a manual recovery plan indicating that fact (steps S1415 and S1416, respectively). If the recovery completion agreement is not reached normally in step S1412, the recovery control unit 211aα creates a manual recovery plan indicating that fact (step S1417). If the recovery plan is found to be a manual recovery plan in step S1401, or if a manual recovery plan is created in any of steps S1414 to S1417, the recovery control units 211aα, 211b to 211d and the manual recovery plan notification units 216aα, 216b to 216c perform manual recovery processing based on that manual recovery plan, as described later in detail with reference to FIG. 15 (step S1418).

《Detailed operation of the recovery plan execution unit (during manual recovery processing)》
FIG. 15 is a flowchart showing an example of the manual recovery processing performed by the recovery plan execution units in the information processing system of FIGS. 1A to 1C. The flowchart of FIG. 15 is executed by the recovery plan execution units 21aα, 21b to 21d as the manual recovery processing in steps S1116, S1216, S1313, and S1418 described above.

In FIG. 15, the manual recovery plan notification units 216aα, 216b to 216d first transmit the manual recovery plan received via the recovery control units 211aα, 211b to 211d, as manual recovery plan information, to, for example, a log recording unit (not shown) provided in each processing system or a monitoring device (not shown) connected to the communication path L2 (step S1501). Next, each of the recovery control units 211aα, 211b to 211d determines from the information in the manual recovery plan whether it belongs to the failed system (step S1502). The recovery control unit that determines it belongs to the failed system (one of 211aα, 211b to 211d) rewrites its mode information (one of 22aα, 22b to 22d) to the manual recovery wait mode (step S1503).

Subsequently, for example, an administrator who manages the monitoring device carries out the manual recovery processing based on the manual recovery plan information (step S1504). When the manual recovery processing is complete and the administrator or the like gives notice of this, the recovery control unit corresponding to the failed system (one of 211aα, 211b to 211d) rewrites its mode information (one of 22aα, 22b to 22d) to the manual recovery completed mode (step S1505). That recovery control unit then notifies the other recovery control units 211aα, 211b to 211d of its transition to the manual recovery completed mode (step S1506). After that, it rewrites its mode information to the failure recovery mode and proceeds to step S301 described above (step S1507).

If, on the other hand, a recovery control unit does not determine in step S1502 that it belongs to the failed system, the recovery control units 211aα, 211b to 211d corresponding to the non-failed systems rewrite their mode information 22aα, 22b to 22d to the degenerate operation mode (step S1508). They then wait for the notification of transition to the manual recovery completed mode from the failed system and check whether it has arrived (step S1509). If it has not arrived, step S1509 is executed again. When it arrives, they rewrite their mode information to the failure recovery mode and proceed to step S301 described above (step S1510).

《Details of mode information》
FIG. 16 is a state transition diagram showing an example of the modes stored in the mode information, and of the relationships among them, in the information processing system of FIGS. 1A to 1C. As shown in FIG. 16, the modes stored in the mode information 22aα, 22b to 22d are the normal mode 160, the failure recovery mode 161, the manual recovery wait mode 162, the manual recovery completed mode 163, and the degenerate operation mode 164.

The transition 160161 from the normal mode 160 to the failure recovery mode 161 is performed in step S302. The transition 161160 from the failure recovery mode 161 to the normal mode 160 is performed in step S1112, step S1211, step S1308, or step S1413. The transition 161162 from the failure recovery mode 161 to the manual recovery wait mode 162 is performed in step S1503. The transition 161164 from the failure recovery mode 161 to the degenerate operation mode 164 is performed in step S1508. The transition 162163 from the manual recovery wait mode 162 to the manual recovery completed mode 163 is performed in step S1505. The transition 163161 from the manual recovery completed mode 163 to the failure recovery mode 161 is performed in step S1507. The transition 164161 from the degenerate operation mode 164 to the failure recovery mode 161 is performed in step S1510.

For example, while automatic recovery processing is under way, all of the mode information 22aα, 22b to 22d is in the failure recovery mode 161, and when the automatic recovery processing completes it all becomes the normal mode 160. While manual recovery processing is under way, the mode information of the failed system that is the recovery destination is the manual recovery wait mode 162, and the other mode information is the degenerate operation mode 164. When the manual recovery processing of the failed system completes, its mode information becomes the manual recovery completed mode 163, and in response the other mode information becomes the failure recovery mode 161. After that, the mode information of the failed system passes through the failure recovery mode 161 to the normal mode 160, and the other mode information also becomes the normal mode 160.
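The state transitions of FIG. 16 can be written down as a small lookup table. The following is a minimal sketch: the enum member names, the `TRANSITIONS` table, and the `transition` helper are illustrative names chosen here, while the mode numerals and step numbers are those given in the description above.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = 160
    FAILURE_RECOVERY = 161
    MANUAL_RECOVERY_WAIT = 162
    MANUAL_RECOVERY_DONE = 163
    DEGENERATE = 164


# (from, to) -> steps that perform the transition, as listed for FIG. 16.
TRANSITIONS = {
    (Mode.NORMAL, Mode.FAILURE_RECOVERY): ("S302",),
    (Mode.FAILURE_RECOVERY, Mode.NORMAL): ("S1112", "S1211", "S1308", "S1413"),
    (Mode.FAILURE_RECOVERY, Mode.MANUAL_RECOVERY_WAIT): ("S1503",),
    (Mode.FAILURE_RECOVERY, Mode.DEGENERATE): ("S1508",),
    (Mode.MANUAL_RECOVERY_WAIT, Mode.MANUAL_RECOVERY_DONE): ("S1505",),
    (Mode.MANUAL_RECOVERY_DONE, Mode.FAILURE_RECOVERY): ("S1507",),
    (Mode.DEGENERATE, Mode.FAILURE_RECOVERY): ("S1510",),
}


def transition(current, target):
    """Return the target mode if FIG. 16 allows the move, else raise."""
    if (current, target) not in TRANSITIONS:
        raise ValueError(f"transition {current.name} -> {target.name} is not in FIG. 16")
    return target


# Path of a failed system through manual recovery and back to normal.
mode = Mode.NORMAL
for nxt in (Mode.FAILURE_RECOVERY, Mode.MANUAL_RECOVERY_WAIT,
            Mode.MANUAL_RECOVERY_DONE, Mode.FAILURE_RECOVERY, Mode.NORMAL):
    mode = transition(mode, nxt)
```

Note that the table has no direct edge from the degenerate operation mode to the normal mode: a non-failed system always passes through the failure recovery mode again before returning to normal operation.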

Each processing system whose mode information is the degenerate operation mode 164 performs degenerate operation. For example, when a failure requiring manual recovery processing occurs in processing system AA, the mode information of processing system BB, processing system CC, and the external transmission/reception block 10α all becomes the degenerate operation mode 164, and processing systems BB and CC simply continue processing according to the input data from the external transmission/reception block 10α. During this time, the majority decision determination unit 14aα of the external transmission/reception block 10α receives the two processing results from these two processing systems and, if they are identical, outputs that result to the outside. If the two results differ, the majority decision determination unit 14aα stops the system or takes similar action.

Also, while the mode information is the failure recovery mode 161 and automatic recovery processing is under way in the processing systems, the two processing systems other than the recovery destination and the external transmission/reception block perform degenerate operation in the same way as in the degenerate operation mode 164 described above. During degenerate operation it is difficult for the information processing system to correct errors with the majority decision determination unit, but it can still detect them. This prevents erroneous processing results from leaking to the outside while the information processing system keeps running, and improves the reliability of the system from the fail-safe and fail-soft standpoints.
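The difference between normal three-way voting and degenerate two-way comparison can be illustrated with a short sketch. The names `decide` and `SystemStop` are hypothetical, and raising an exception when three results show no majority is an assumption made here; the description only specifies the stop behavior for the two-result case.

```python
from collections import Counter


class SystemStop(Exception):
    """Stands in for 'stop the system or take similar action'."""


def decide(results):
    """Return (output, error_detected) for the results of the active systems.

    With three results an error in one system is outvoted (corrected).
    With two results (degenerate operation) an error can only be detected.
    """
    if len(results) < 2:
        raise ValueError("need at least two processing results")
    if len(results) == 2:
        if results[0] != results[1]:
            raise SystemStop("results differ during degenerate operation")
        return results[0], False
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise SystemStop("no majority among processing results")  # assumption
    return value, count < len(results)


normal_vote = decide(["ok", "ok", "bad"])   # one system outvoted
degenerate_vote = decide(["ok", "ok"])      # two systems agree
```

In the three-result case the wrong value is masked and flagged; in the two-result case a mismatch cannot be resolved, so the only safe reaction is to withhold the output.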

《Details of system configuration information》
FIG. 17 is a diagram showing an example of the contents stored in the system configuration information in the information processing system of FIGS. 1A to 1C. As shown in FIG. 17, each of the system configuration information 17aα, 17b to 17d holds, among other things, the connection relationships among the external transmission/reception blocks 10α and 10β, the diagnostic blocks 300A to 300C, and the processing blocks 310A to 310C shown in FIG. 1. The system configuration information 17aα, 17b to 17d is used, for example, when communicating or when resolving dependencies between systems at the time of a failure.

《Other processing of the failure diagnosis information sharing unit》
FIG. 18 is a flowchart showing an example of processing, different from that of FIG. 6, performed by the failure diagnosis information sharing units in the information processing system of FIGS. 1A to 1C. FIG. 6 is executed by failure diagnosis information sharing units 18aα, 18b to 18d that are in the failure recovery mode, whereas FIG. 18 is executed by those that are in the normal mode (that is, that have not received an error determination result from the majority decision determination unit), in parallel with the flowchart of FIG. 2.

In FIG. 18, the failure diagnosis information sharing units 18aα, 18b to 18d wait for failure diagnosis information from the other failure diagnosis recovery systems (step S1801). When failure diagnosis information is sent from another failure diagnosis recovery system, they execute the flowchart of FIG. 3 described above (step S1802). Suppose, for example, that a failure occurs in the majority decision determination unit 14aα in the failure diagnosis recovery system aα and that failure diagnosis information is sent from the failure diagnosis recovery system aα to the failure diagnosis information sharing units 18b to 18d.

In this case, each of the failure diagnosis recovery systems b to d shifts to the failure recovery mode in step S302 of FIG. 3 and generates failure diagnosis information in step S303. This failure diagnosis information includes, for example, information that its own majority decision determination unit detected no failure, or information that failure diagnosis information was received only from the failure diagnosis recovery system aα. Then, in step S304, the failure diagnosis information sharing units 18aα, 18b to 18d share their failure diagnosis information. At this point the failure diagnosis information sharing units 18b to 18d agree with one another (no failure detected, or failure diagnosis information received only from the failure diagnosis recovery system aα), and only the failure diagnosis information of the failure diagnosis information sharing unit 18aα differs. The failure diagnosis information sharing units 18b to 18d (or the recovery plan determination units 19b to 19d) therefore determine that the failure diagnosis recovery system aα has a failure, send a signal to that effect to the external transmission/reception block 10β, and cause switching from the external transmission/reception block 10α to the external transmission/reception block 10β.

As another example, suppose that a failure occurs in the majority decision determination unit 14b in the failure diagnosis recovery system b and that failure diagnosis information is sent from the failure diagnosis recovery system b to the failure diagnosis information sharing units 18aα, 18c, and 18d. In this case, each of the failure diagnosis recovery systems aα, c, and d shifts to the failure recovery mode in step S302 of FIG. 3 and generates failure diagnosis information in step S303. This failure diagnosis information states, for example, that its own majority decision determination unit detected no failure, or that failure diagnosis information was received only from the failure diagnosis recovery system b.

Then, in step S304 of FIG. 3, the failure diagnosis information sharing units 18aα, 18b to 18d share their failure diagnosis information. At this point the failure diagnosis information sharing units 18aα, 18c, and 18d agree with one another (no failure detected, or failure diagnosis information received only from the failure diagnosis recovery system b), and only the failure diagnosis information of the failure diagnosis information sharing unit 18b differs. In this case, in step S305 of FIG. 3, the recovery plan determination units 19aα, 19c, and 19d determine that the diagnostic block 300A has a failure and create a manual recovery plan indicating that fact. The processing of steps S306 and S307 is then executed.

The description here used an example in which the failure diagnosis information sharing units 18b to 18d (or the recovery plan determination units 19b to 19d) instruct the switching, because it is desirable to switch the external transmission/reception block early when the failure diagnosis recovery system aα has a failure. Depending on the case, however, it is also possible for the recovery plan execution unit to issue the switching instruction after steps S305 to S307 of FIG. 3 have been performed. It is further possible to have the failure diagnosis information sharing units 18b to 18d issue the switching instruction as soon as they have received failure diagnosis information only from the failure diagnosis recovery system aα. In any case, sharing failure diagnosis information among the failure diagnosis information sharing units 18aα, 18b to 18d makes it possible to detect a situation in which failure diagnosis information is generated by only one of them (that is, a situation in which a failure has occurred in a majority decision determination unit or the like).

The flowchart of FIG. 3 may be started in parallel from two steps, step S208 and step S1802. Steps S301 and S302 therefore need to be executed as a single indivisible unit, so that the two invocations running in parallel do not both transition to the failure recovery mode at the same time.
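In software terms, this requirement corresponds to an atomic check-and-set. The sketch below assumes that step S301 checks the current mode and step S302 writes the failure recovery mode; the `ModeInfo` class and its method are hypothetical names, and a mutex is only one way to obtain the indivisibility (a hardware design would use its own arbitration).

```python
import threading


class ModeInfo:
    """Holds one mode value and guards the S301/S302 pair with a lock."""

    def __init__(self):
        self.mode = "normal"
        self._lock = threading.Lock()

    def try_enter_failure_recovery(self):
        """Run the check (assumed S301) and the write (S302) as one unit.

        Returns True only for the single caller that performed the transition.
        """
        with self._lock:
            if self.mode == "failure_recovery":  # assumed content of step S301
                return False
            self.mode = "failure_recovery"       # step S302
            return True


# Several triggers (standing in for S208 and S1802) racing for the transition.
mode_info = ModeInfo()
outcomes = []
workers = [
    threading.Thread(target=lambda: outcomes.append(mode_info.try_enter_failure_recovery()))
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

However the threads interleave, exactly one caller sees `True`, so the recovery sequence is started once.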

《Method of exchanging failure diagnosis information in the failure diagnosis information sharing unit》
FIG. 19 is a sequence diagram showing an example of the method of exchanging failure diagnosis information in the flowchart of FIG. 6. As shown in FIG. 19, in step S602 of FIG. 6 each of the failure diagnosis recovery systems aα, b to d exchanges data by broadcasting the data it generated (here, failure diagnosis information) A190, B190 to D190 to each of the other failure diagnosis recovery systems. Then, in step S603 of FIG. 6, each of the failure diagnosis recovery systems aα, b to d combines the data 191 obtained through the exchange with the data it generated itself (one of A190, B190 to D190), and judges the exchange successful if all of A190 to D190 are present, or failed if some error has left any of them missing.
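The completeness check of step S603 amounts to merging the received items with the locally generated one and testing that nothing is missing. A minimal sketch follows; the function name and the dictionary representation are assumptions for illustration, and the labels A190 to D190 are used simply as keys.

```python
EXPECTED = ("A190", "B190", "C190", "D190")


def exchange_succeeded(own_label, own_data, received):
    """Merge own data with the data received by broadcast (data 191).

    Returns (ok, merged): ok is True only when every item A190 to D190
    is present after merging.
    """
    merged = dict(received)
    merged[own_label] = own_data
    ok = all(label in merged for label in EXPECTED)
    return ok, merged


# System b received everything from the three other systems.
ok_full, _ = exchange_succeeded(
    "B190", "diag-b", {"A190": "diag-a", "C190": "diag-c", "D190": "diag-d"}
)
# System b never received D190, for example because of a communication error.
ok_partial, _ = exchange_succeeded(
    "B190", "diag-b", {"A190": "diag-a", "C190": "diag-c"}
)
```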

《Method of exchanging recovery plans in the recovery plan agreement unit》
FIGS. 20 and 21 are sequence diagrams showing an example of a recovery plan exchange method based on the Byzantine agreement protocol in the flowchart of FIG. 7. FIG. 20 shows the processing for sharing, for example, data generated by system a (here, a recovery plan) using the Byzantine agreement protocol. The Byzantine agreement protocol is a protocol for agreement among n tasks, that is, for sharing the same data: even when m of the tasks are faulty and it is not known which ones, it allows the tasks other than the faulty ones to share identical data and reach a correct agreement. To share data, one task sends the data to the other (n−1) tasks, and agreement is established when the following three conditions hold.

First, all non-faulty tasks agree on the received data. Second, if the sending task is normal, the data agreed on under the first condition matches the data actually sent. Third, when m of the tasks are faulty, Byzantine agreement under the first and second conditions is established only if the relation n ≥ 3m + 1 holds.
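The third condition can be checked with simple arithmetic. The helper names below are illustrative. Applied to the four failure diagnosis recovery systems of this embodiment (aα, b, c, d), n = 4 satisfies the bound for m = 1, that is, for a single Byzantine failure.

```python
def agreement_possible(n, m):
    """Third condition: n tasks can tolerate m faulty ones only if n >= 3m + 1."""
    return n >= 3 * m + 1


def max_tolerable_faults(n):
    """Largest m that satisfies n >= 3m + 1 for a given number of tasks n."""
    return (n - 1) // 3


four_systems_single_fault = agreement_possible(4, 1)   # 4 >= 3*1 + 1
four_systems_double_fault = agreement_possible(4, 2)   # 4 >= 3*2 + 1 fails
```

With only three participants the bound fails even for m = 1 (3 < 4), so this agreement scheme needs at least four participants to tolerate a single Byzantine failure.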

The Byzantine agreement protocol is executed in step S702 of FIG. 7 so that the recovery plan is shared reliably. In FIG. 20, the data created by system a (here, a recovery plan) is denoted A (200). The description here assumes that system b has a Byzantine failure of the type that sends wrong data when transmitting. First, as the first communication, system a broadcasts data A (200) to the other systems b to d. The data A (200) as received by systems b to d is written Ai (i = 1, 2, 3), meaning the i-th received data for A (200); since this is the first received data, it is A1 (201). When received correctly, A1, A2, and A3 are all the same data.

Next, as the second communication, each of systems b to d, having received data A1 (201), transmits A1 (201) to the remaining two systems other than itself and system a. Systems c and d operate normally, so the correct data A2 (202) and A3 (203) can be received. System b, however, has a Byzantine failure of the type that sends wrong data (here, A′ and A″) when transmitting, so systems c and d receive the erroneous data A′3 (204) and A″3 (205) from it.

Next, each of systems b to d takes a majority vote over the three items of data it received via different paths, namely A1 (201), A2 (202), and one of A3 (203), A′3 (204), or A″3 (205), and obtains A (206) to A (208) as the majority decision results. By using the Byzantine agreement protocol in this way, systems c and d obtain the correct results A (207) and A (208) even though they received the erroneous data A′3 (204) and A″3 (205) from system b.
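The two communication rounds and the final vote described for FIG. 20 can be simulated in a few lines. This is a simplified sketch under stated assumptions: the sender (system a) is taken to be correct, a faulty relay corrupts every message it forwards, and the function and variable names are invented for illustration.

```python
from collections import Counter


def share_from_sender(value, receivers, faulty_relay=None):
    """Simulate FIG. 20: one correct sender, at most one faulty relay.

    Round 1: the sender broadcasts `value` to all receivers.
    Round 2: each receiver relays what it got to the other receivers.
    Each receiver then votes over its three copies.
    """
    direct = {r: value for r in receivers}            # round 1 (A1)
    relayed = {r: [] for r in receivers}
    for src in receivers:                             # round 2 (A2, A3)
        for dst in receivers:
            if src == dst:
                continue
            copy = direct[src]
            if src == faulty_relay:
                copy = f"{copy}-corrupted-for-{dst}"  # stands in for A', A''
            relayed[dst].append(copy)
    decided = {}
    for r in receivers:
        votes = [direct[r]] + relayed[r]              # three copies per receiver
        decided[r] = Counter(votes).most_common(1)[0][0]
    return decided


# System b relays wrong data; systems c and d still decide on "A".
decided = share_from_sender("A", ["b", "c", "d"], faulty_relay="b")
```

Each correct receiver holds two correct copies against one corrupted copy, so the vote masks the faulty relay. This matches the four-system configuration tolerating a single Byzantine failure.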

FIG. 21 shows, in the same way as the processing of FIG. 20, how each of systems a, b, c, and d shares its own data (here, a recovery plan) A (210), B (211), C (212), and D (213) using the Byzantine agreement protocol. FIG. 21 is described on the assumption that system a has a Byzantine failure of the type that sends wrong data when transmitting. That is, assuming a single failure, this corresponds to a case where, for example, the first communication unit 13α or the failure diagnosis information sharing unit 18aα has fallen into a Byzantine failure.

As described above for FIG. 20, each of systems a to d receives, in accordance with the Byzantine agreement protocol, three copies of the data of each system other than itself. In this example, system a sends erroneous data A″ to system b and erroneous data A′ to systems b and c. In addition, system a, on receiving data B from system b, sends erroneous data B′ to systems c and d; on receiving data C from system c, sends erroneous data C′ to systems b and d; and on receiving data D from system d, sends erroneous data D′ to systems b and c.

As a result, the data received by systems b to d includes some erroneous data. However, once the data exchange is complete and each of systems a to d has taken a majority vote over the three copies received via different paths, system a obtains A, B, C, D and systems b to d obtain A′, B, C, D. In this way, the Byzantine agreement protocol makes the data exchanged among all systems identical at least across the systems other than the one with the Byzantine failure. That is, when a Byzantine failure occurs in system a, systems b to d may not obtain the data A of system a correctly, but they can obtain the data B to D of systems b to d correctly.

Consequently, the three systems b to d other than the system with the Byzantine failure can, based on the agreement of the data (here, recovery plans) B to D, correctly carry out the post-failure operation (manual recovery processing, degraded operation, and so on) that treats system a as the failure location. Note that correct automatic recovery processing may not be possible when a Byzantine failure is present. Therefore, when a majority decision actually becomes necessary, as in FIG. 21 where, for example, three copies of data C are not obtained and the decision must be made from two copies of data C and one copy of data C', it is desirable to treat the recovery plans as a mismatch in step S704 of FIG. 7.
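The conservative rule suggested above (treat any non-unanimous set of copies as a mismatch rather than accepting the majority value) could be sketched as follows. The function name and the use of `None` to signal a mismatch are hypothetical; how step S704 is actually implemented is not specified in this passage.

```python
def plan_if_unanimous(copies):
    """Return the recovery plan only when every received copy is identical.

    A split such as two copies of C and one copy of C' returns None, which
    the caller would treat as a recovery-plan mismatch (no automatic recovery).
    """
    if not copies:
        return None
    return copies[0] if len(set(copies)) == 1 else None


print(plan_if_unanimous(["C", "C", "C"]))
print(plan_if_unanimous(["C", "C", "C'"]))
```

The design choice here is to give up automatic recovery whenever the copies disagree at all, because disagreement itself indicates that a Byzantine failure may be present.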

As described above, using the information processing system of the first embodiment typically makes it possible to improve reliability.

(Embodiment 2)
<<Configuration of Processing Block (Modification)>>
FIG. 22 is a block diagram showing a configuration example of a processing block in the information processing system according to the second embodiment of the present invention. The processing block 310A shown in FIG. 22 is a modification of the processing block 310A in the information processing system of FIG. 1 described above. The processing block 310A of FIG. 1 is described here as a representative, but the configuration example of FIG. 22 also applies to the processing blocks 310B and 310C of FIG. 1.

The processing block 310A shown in FIG. 22 differs from the processing block of FIG. 1 in the configuration inside the reconfigurable circuit unit 319A; the rest of the configuration is the same as in FIG. 1. The reconfigurable circuit unit 319A shown in FIG. 22 includes, in place of the stateful processing unit 312A in the reconfigurable circuit unit of FIG. 1, triplicated stateful processing units [1] 312A1 to [3] 312A3 and a majority decision unit 320A.

The stateful processing units [1] 312A1 to [3] 312A3 perform the same processing using the input data received via the first communication unit 311A and the state received from the state storage unit 314A via the state storage management unit 313A, and each outputs the resulting processing result (output data) and state to the majority decision unit 320A. The majority decision unit 320A takes a majority decision over the three processing results (output data) and outputs the resulting processing result (output data) via the first communication unit 311A. It also takes a majority decision over the three states and writes the resulting state to the state storage unit 314A via the state storage management unit 313A.
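The data flow through the triplicated stateful processing units and the majority decision unit 320A can be sketched in Python as below. This is only a software analogy of the circuit under stated assumptions: each replica is modeled as a function from (input, state) to (output, new state), values are assumed hashable, and all names are hypothetical.

```python
from collections import Counter


def vote(values):
    """Majority decision over three values; fails if no two of them agree."""
    value, count = Counter(values).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among replicas")
    return value


def step(replicas, input_data, stored_state):
    """Run all replicas on the same input and state, then vote twice."""
    results = [replica(input_data, stored_state) for replica in replicas]
    output = vote([out for out, _ in results])        # voted output data
    new_state = vote([state for _, state in results])  # voted state to write back
    return output, new_state


def healthy(x, state):
    # Example processing: a running total kept as the state.
    total = state + x
    return total, total


def broken(x, state):
    # A replica with a fault returns wrong output and wrong state.
    return 0, -1


output, new_state = step([healthy, broken, healthy], 5, 10)
print(output, new_state)
```

With one faulty replica out of three, the voted output and the voted state both come from the two healthy replicas, which is why processing can continue without recovery.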

In this way, the information processing system of the second embodiment performs majority decisions in two stages: one inside the reconfigurable circuit unit, and another inside the external transmission/reception block 10α and the diagnostic blocks 300A to 300C of FIG. 1. With such a configuration, even if a failure occurs in any one of the stateful processing units [1] 312A1 to [3] 312A3, for example, the information processing system can continue processing without performing recovery processing, degraded operation, or the like. In other words, the apparent fault tolerance of the processing block can be improved.

However, recovery processing is required if a failure occurs in the majority decision unit 320A inside the reconfigurable circuit unit. In addition, compared with the information processing system of FIG. 1 in the first embodiment, the information processing system of the second embodiment may incur a larger circuit scale and higher latency in the reconfigurable circuit unit. Therefore, from the standpoint of balancing circuit scale, latency, and reliability, the information processing system of FIG. 1 in the first embodiment is considered preferable to that of the second embodiment.

The invention made by the present inventors has been described above in concrete terms based on the embodiments. However, the present invention is not limited to those embodiments, and various modifications are possible without departing from its gist. For example, the embodiments above were described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Other configurations can also be added to, deleted from, or substituted for part of the configuration of each embodiment.

For example, three processing blocks 310A to 310C and four failure diagnosis recovery systems aα and b to d are provided here in order to handle a single failure, including a Byzantine failure. To handle two failures, it suffices to provide, for example, five processing blocks to make the majority decision possible and seven failure diagnosis recovery systems to reach Byzantine agreement when two failures occur.
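The two cases given (one failure: 3 processing blocks and 4 systems; two failures: 5 and 7) match the usual bounds of 2f + 1 replicas for majority voting and 3f + 1 participants for Byzantine agreement. Assuming that generalization applies here, the required counts for f failures can be computed with a small Python sketch. The function name and the dictionary keys are hypothetical.

```python
def required_counts(faults):
    """Counts needed to tolerate the given number of simultaneous failures.

    Assumes 2f + 1 processing blocks for the majority decision and
    3f + 1 failure diagnosis recovery systems for Byzantine agreement.
    """
    if faults < 1:
        raise ValueError("faults must be at least 1")
    return {
        "processing_blocks": 2 * faults + 1,
        "diagnosis_recovery_systems": 3 * faults + 1,
    }


for f in (1, 2):
    print(f, required_counts(f))
```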

10α, 10β External transmission/reception block
11α, 11β External communication unit
12α, 12β Input data input unit
13α, 13β, 301A to 301C First communication unit
14aα, 14aβ, 14b to 14d Majority decision unit
15α, 15β Output data input unit
16aα, 16aβ, 16b to 16d System failure diagnosis unit
17aα, 17aβ, 17b to 17d System configuration information
18aα, 18aβ, 18b to 18d Failure diagnosis information sharing unit
19aα, 19aβ, 19b to 19d Recovery plan determination unit
20aα, 20aβ, 20b to 20d Recovery plan agreement unit
211aα Recovery control unit
211b Recovery control unit
212b Reconfiguration data generation unit
213b Circuit data
214b Configuration recovery unit
215b State recovery unit
216aα Manual recovery plan notification unit
216b Manual recovery plan notification unit
217aα Input data re-input unit
218aα Input data re-input queue
21aα, 21aβ, 21b to 21d Recovery plan execution unit
22aα, 22aβ, 22b to 22d Mode information
23aα, 23aβ, 23b to 23d Failure diagnosis unit
300A to 300C Diagnostic block
302A to 302C Configuration fault location diagnosis unit
303A to 303C Second communication unit
310A to 310C Processing block
311A to 311C First communication unit
312A, 312A1 to 312A3 Stateful processing unit
313A to 313C State storage management unit
314A to 314C State storage unit
315A1 to 315C1 First bank
315A2 to 315C2 Second bank
316A to 316C Second communication unit
317A to 317C Configuration information storage management unit
318A to 318C Configuration information storage unit
320A Majority decision unit
AA to CC Processing system
L1, L2 Communication path
aα, aβ, b to d Failure diagnosis recovery system

Claims

An information processing system comprising:
first to third processing blocks that execute the same processing in response to a common external input and output first to third processing results, respectively;
first to third diagnostic blocks provided in correspondence with the first to third processing blocks, respectively; and
a fourth diagnostic block, wherein
the N-th processing block (N = 1, 2, 3) includes:
an N-th configuration information storage unit that stores circuit configuration information; and
an N-th reconfigurable circuit unit that constructs a circuit according to the circuit configuration information and outputs the N-th processing result through the operation of the circuit,
the M-th diagnostic block (M = 1, 2, 3, 4) includes:
an M-th majority decision unit that receives the first to third processing results and, when any of them contains an error, outputs an M-th error determination result identifying the erroneous processing result;
an M-th failure diagnosis unit that receives the M-th error determination result and diagnoses the failure; and
an M-th communication unit for communicating with the other diagnostic blocks, and
the M-th failure diagnosis unit includes an M-th sharing unit that transmits M-th diagnosis information reflecting the M-th error determination result to the other failure diagnosis units via the M-th communication unit and receives diagnosis information from each of the other failure diagnosis units via the M-th communication unit.

The information processing system according to claim 1, wherein
the M-th failure diagnosis unit further includes an M-th recovery plan determination unit that determines, based on the first to fourth diagnosis information shared by the M-th sharing unit, whether any of the first to third processing blocks has a failure and, if so, generates an M-th recovery plan containing information on the recovery-destination and recovery-source processing blocks selected from among the first to third processing blocks, and
the M-th diagnostic block further includes an M-th recovery plan execution unit that executes automatic recovery processing of the recovery-destination processing block based on the M-th recovery plan.

The information processing system according to claim 2, wherein
while the M-th recovery plan execution unit is executing the automatic recovery processing of the recovery-destination processing block, the processing blocks other than the recovery-destination processing block execute processing in response to the common external input.

The information processing system according to claim 3, wherein
the M-th failure diagnosis unit further includes an M-th recovery plan agreement unit that exchanges the M-th recovery plan generated by the M-th recovery plan determination unit with the other recovery plan determination units via the M-th communication unit using the Byzantine agreement protocol, and outputs the M-th recovery plan determined from the result of the exchange to the M-th recovery plan execution unit.

The information processing system according to claim 4, wherein
the N-th processing block further includes an N-th state storage unit that holds state obtained in the course of the processing executed by the N-th reconfigurable circuit unit, and
when executing the automatic recovery processing of the recovery-destination processing block, the M-th recovery plan execution unit recovers the storage information of the state storage unit in addition to the storage information of the configuration information storage unit in the recovery-destination processing block.

The information processing system according to claim 5, wherein
the N-th state storage unit includes an N-th state saving unit that holds the state at a first time point,
the fourth recovery plan execution unit includes a data holding unit that sequentially holds data input from the outside after the first time point, and
each of the first to third recovery plan execution units includes:
a configuration recovery unit that recovers the storage information of the configuration information storage unit in the recovery-destination processing block; and
a state recovery unit that sets the state saved in the state saving unit of the recovery-source processing block into the state storage unit of the recovery-destination processing block and, after the storage information of the configuration information storage unit in the recovery-destination processing block has been recovered, sequentially outputs the data held in the data holding unit of the fourth recovery plan execution unit toward the recovery-destination processing block.

The information processing system according to claim 4, wherein
the fourth diagnostic block is provided in an external transmission/reception block that serves as an interface with the outside,
the fourth majority decision unit in the fourth diagnostic block determines the processing result to be output to the outside by taking a majority decision over the first to third processing results,
the external transmission/reception block is made redundant, with one for normal use and one for backup use, and
switching between normal use and backup use is performed based on the first to fourth diagnosis information shared by the M-th sharing unit.

An information processing system comprising:
first to third processing blocks that execute the same processing in response to a common external input and output first to third processing results, respectively;
first to third diagnostic blocks provided in correspondence with the first to third processing blocks, respectively; and
a fourth diagnostic block, wherein
the N-th processing block (N = 1, 2, 3) includes:
an N-th configuration information storage unit that stores circuit configuration information;
an N-th reconfigurable circuit unit that constructs a circuit according to the circuit configuration information and outputs the N-th processing result through the operation of the circuit; and
an N-th state storage unit that holds state obtained in the course of the processing executed by the N-th reconfigurable circuit unit,
the M-th diagnostic block (M = 1, 2, 3, 4) includes:
an M-th majority decision unit that receives the first to third processing results and, when any of them contains an error, outputs an M-th error determination result identifying the erroneous processing result;
an M-th failure diagnosis unit that receives the M-th error determination result, diagnoses the failure, and, when any of the first to third processing blocks is diagnosed as having a failure, generates an M-th recovery plan containing information on the recovery-destination and recovery-source processing blocks;
an M-th communication unit for communicating with the other diagnostic blocks; and
an M-th recovery plan execution unit that executes automatic recovery processing of the recovery-destination processing block based on the M-th recovery plan generated by the M-th failure diagnosis unit,
the M-th failure diagnosis unit includes:
an M-th sharing unit that transmits M-th diagnosis information reflecting the M-th error determination result to the other failure diagnosis units via the M-th communication unit and receives diagnosis information from each of the other failure diagnosis units via the M-th communication unit; and
an M-th recovery plan determination unit that determines the M-th recovery plan based on the first to fourth diagnosis information shared by the M-th sharing unit, and
when executing the automatic recovery processing of the recovery-destination processing block, the M-th recovery plan execution unit recovers the storage information of the state storage unit in addition to the storage information of the configuration information storage unit in the recovery-destination processing block.

The information processing system according to claim 8, wherein
the N-th state storage unit includes an N-th state saving unit that holds the state at a first time point,
the fourth recovery plan execution unit includes a data holding unit that sequentially holds data input from the outside after the first time point, and
each of the first to third recovery plan execution units includes:
a configuration recovery unit that recovers the storage information of the configuration information storage unit in the recovery-destination processing block; and
a state recovery unit that sets the state saved in the state saving unit of the recovery-source processing block into the state storage unit of the recovery-destination processing block and, after the storage information of the configuration information storage unit in the recovery-destination processing block has been recovered, sequentially outputs the data held in the data holding unit of the fourth recovery plan execution unit toward the recovery-destination processing block.

The information processing system according to claim 9, wherein
the M-th failure diagnosis unit further includes an M-th recovery plan agreement unit that exchanges the M-th recovery plan generated by the M-th recovery plan determination unit with the other recovery plan determination units via the M-th communication unit using the Byzantine agreement protocol, and outputs the M-th recovery plan determined from the result of the exchange to the M-th recovery plan execution unit.

The information processing system according to claim 9, wherein
the fourth diagnostic block is provided in an external transmission/reception block that serves as an interface with the outside,
the fourth majority decision unit in the fourth diagnostic block determines the processing result to be output to the outside by taking a majority decision over the first to third processing results,
the external transmission/reception block is made redundant, with one for normal use and one for backup use, and
switching between normal use and backup use is performed based on the content of, and the match or mismatch among, the first to fourth diagnosis information shared by the M-th sharing unit.