WO2015045589A1

WO2015045589A1 - Fault-tolerant system and fault-tolerant system control method

Info

Publication number: WO2015045589A1
Application number: PCT/JP2014/069305
Authority: WO
Inventors: 拓下沢; 雅徳吉田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2013-09-26
Filing date: 2014-07-22
Publication date: 2015-04-02
Anticipated expiration: 2016-03-26
Also published as: JP6100135B2; JP2015064833A

Abstract

Provided is a fault-tolerant system with a plurality of computers which form a majority voting redundancy configuration, each of which is provided with: a task communication unit (140) for acquiring process content from a program which operates in that computer (110B, 110C); and a message processing unit (142) which receives, from any of the other computers (110A) which form the majority voting redundancy configuration, a prescribed message which indicates the process content of a program which operates on the other computer (110A) and returns a replication permissibility message to the other computer (110A) according to the result of a match determination between the process content indicated by the prescribed message and the process content obtained by the task communication unit (140). The message processing unit (142) of the computer which is the other computer (110A) further executes a process for changing the number of computers (110B, 110C), aside from the other computer, for which the other computer is to await receipt of replication permissibility messages, depending on the process content obtained by the task communication unit (140) of the other computer (110A).

Description

Fault tolerant system and fault tolerant system control method

　本発明は、フォールトトレラントシステム及びフォールトトレラントシステム制御方法に関するものであり、具体的には、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性向上を可能とする技術に関する。 The present invention relates to a fault-tolerant system and a fault-tolerant system control method, and more specifically, in a fault-tolerant system composed of a plurality of computer nodes having different architectures and infrastructure software, etc., the performance difference between the computer nodes. The present invention relates to a technology that can improve the availability of a system by reducing the influence of the overall performance due to the fluctuations and preventing excessive failure determination due to such influence.

　ミッションクリティカルなコンピュータシステムにおいて、所定の冗長性を持たせることで高い可用性を確保するフォールトトレランスの概念が導入されている。そうした概念に対応する技術としては、該当システムを複数の計算機から構成し、同期動作する各計算機の出力信号を多数決チェックにて比較、監視して、不具合のある計算機をシステムから切り離すといったものが提案されている。例えば、ユーザが「インタフェースと対話して、モニタされる変数、すなわち多数決される変数と、多数決が行われるブレークポイントと、多数決および回復パラメータなどの情報と」を指定し、「指定されたブレークポイントで相異なるコピーによって生成されたユーザ指定変数のうちの１つ以上の値を比較することによって、コピーのうちの１つにフォールトが検出される」といった従来技術（特許文献１参照）がある。 In the mission critical computer system, the concept of fault tolerance has been introduced to ensure high availability by giving a predetermined redundancy. As a technology corresponding to such a concept, a system is proposed in which the corresponding system is composed of a plurality of computers, the output signals of the computers operating in synchronization are compared and monitored by majority vote, and the defective computer is separated from the system. Has been. For example, the user may specify “interacting with the interface, monitored variables, ie variables to be voted on, breakpoints on which the majority will be taken, and information such as majority and recovery parameters” and “specified breakpoints” There is a conventional technique (see Patent Document 1) in which a fault is detected in one of the copies by comparing one or more values of user-specified variables generated by different copies.

特許第３６２４１１９号公報Japanese Patent No. 3624119

　ところで、同じハードウェアで同じ基盤ソフトウェアを使用する複数の計算機ノードで構成したフォールトトレラントシステム（以下、同種ＦＴシステムという）であっても、各計算機ノードにて同じタスクを実行した場合の実行時間は、計算機ノード間でばらつきを生じる。このばらつきは、中央演算装置のアーキテクチャ、性能、基盤ソフトウェアの種類の一つあるいは複数が異なる計算機ノードを組み合わせたフォールトトレラントシステム（以下、異種ＦＴシステムという）において、計算機ノード間の性能差に由来し更に顕著なものとなる傾向にある。 By the way, even in a fault tolerant system (hereinafter referred to as the same kind of FT system) configured by a plurality of computer nodes using the same hardware and the same hardware, the execution time when the same task is executed in each computer node is Variations occur between computer nodes. This variation stems from performance differences between computer nodes in a fault-tolerant system (hereinafter referred to as a heterogeneous FT system) in which computer nodes differing in one or more of the architecture, performance, and basic software of the central processing unit. There is a tendency to become more prominent.

　あるプログラムを実行するのにかかる時間は、計算機ノードにおけるハードウェアの性能だけでなく様々な外的要因により変化するため、一般に予測が困難である。このため、同種ＦＴシステムおよび異種ＦＴシステムのいずれの場合においても、この計算機ノード間の実行時間の差を許容し、かつ、無応答あるいは大幅な処理遅延の事象に基づく障害ノードの判定とその排除を、可用性を損なわずに行う必要がある。 Since the time required to execute a certain program varies depending on various external factors as well as the hardware performance of the computer node, it is generally difficult to predict. For this reason, in both cases of the homogeneous FT system and the heterogeneous FT system, the difference in execution time between the computer nodes is allowed, and the determination of the failed node based on the event of no response or a large processing delay and its elimination Must be done without sacrificing availability.

　一方、従来技術においては、出力間で多数決を行うにあたり、ある時点での変数内容の比較を行う際に、同期的に各コピー（計算機ノード）から変数内容を収集し比較するために、必ず全ノードが該当時点で同期する必要がある。このことは、各コピーの実行時間のばらつきを該当時点で吸収する必要があるため、全体の処理効率低下を招くことにつながりかねない。また、そのようなばらつきを許容する障害判定時間の設定も、実行時間の予測困難性により困難である。 On the other hand, in the prior art, when making a majority decision between outputs, when comparing variable contents at a certain point in time, it is necessary to collect and compare variable contents from each copy (computer node) synchronously. Nodes need to be synchronized at the appropriate time. This is necessary to absorb the variation in the execution time of each copy at the corresponding time, which may lead to a decrease in the overall processing efficiency. Also, it is difficult to set a failure determination time that allows such variations due to difficulty in predicting the execution time.

　そこで本発明の目的は、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性向上を可能とする技術を提供することにある。 Accordingly, an object of the present invention is to reduce the influence of the overall performance due to the difference in performance and variation between the computer nodes in a fault tolerant system composed of a plurality of computer nodes having different architectures and basic software, etc. It is an object of the present invention to provide a technology capable of improving availability in a system by preventing excessive failure determination.

　上記課題を解決する本発明のフォールトトレラントシステムは、多数決冗長構成を形成する複数の計算機各々において、当該計算機の内部で動作するプログラムからその処理内容を取得するタスク通信部と、前記多数決冗長構成を形成するいずれかの他計算機より、前記他計算機で動作するプログラムの処理内容を示す所定メッセージを受信し、前記所定メッセージが示す処理内容と前記タスク通信部が得ている前記処理内容との一致判定の結果に応じ、前記他計算機に追従可否のメッセージを返信するメッセージ処理部とを備えており、前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容に応じて、前記追従可否のメッセージの受信待ち対象とする、当該他計算機以外の計算機の数を変更する処理を更に実行するものである、ことを特徴とする。 The fault-tolerant system of the present invention that solves the above-described problem includes, in each of a plurality of computers forming a majority redundant configuration, a task communication unit that acquires the processing contents from a program operating inside the computer, and the majority redundant configuration. Receiving a predetermined message indicating the processing content of the program operating on the other computer from any other computer to be formed, and determining whether the processing content indicated by the predetermined message and the processing content obtained by the task communication unit are consistent A message processing unit that returns a message indicating whether or not follow-up is possible to the other computer according to the result, and the message processing unit of the computer that is the other computer is a process obtained by the task communication unit of the other computer. Depending on the content, change the number of computers other than the other computers that are subject to reception of the followability message. It is to a process for further execution, and wherein the.

　また、本発明のフォールトトレラント制御方法は、多数決冗長構成を形成する複数の計算機各々において、当該計算機の内部で動作するプログラムからその処理内容を取得する処理と、前記多数決冗長構成を形成するいずれかの他計算機より、前記他計算機で動作するプログラムの処理内容を示す所定メッセージを受信し、前記所定メッセージが示す処理内容と前記処理で得ている前記処理内容との一致判定の結果に応じ、前記他計算機に追従可否のメッセージを返信する処理とを実行し、更に、前記他計算機においては、当該他計算機で動作するプログラムから得ている処理内容に応じて、前記追従可否のメッセージの受信待ち対象とする、当該他計算機以外の計算機の数を変更する処理を更に実行する、ことを特徴とする。 Further, the fault tolerant control method of the present invention is a method in which, in each of a plurality of computers forming a majority voting redundant configuration, a process for obtaining the processing contents from a program operating inside the computer, and forming the majority voting redundant configuration From another computer, a predetermined message indicating the processing content of the program operating on the other computer is received, and according to a result of the matching determination between the processing content indicated by the predetermined message and the processing content obtained by the processing, Processing for returning a followability message to another computer, and, in the other computer, in accordance with the processing content obtained from the program operating on the other computer, the target of waiting for reception of the followability message Further, a process of changing the number of computers other than the other computer is further executed.

　本発明によれば、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性が向上する。 According to the present invention, in a fault-tolerant system composed of a plurality of computer nodes having different architectures, infrastructure software, and the like, the influence of the overall performance due to the performance difference and variation between the computer nodes is reduced, and this influence is also caused. By preventing excessive fault determination, availability in the system is improved.

第１の実施形態に係るフォールトトレラントシステムの構成例を示す図である。It is a figure which shows the structural example of the fault tolerant system which concerns on 1st Embodiment. 第１の実施形態に係るフォールトトレラントシステムを構成する複数の計算機ノード間の関係例を示す図である。It is a figure which shows the example of a relationship between the some computer nodes which comprise the fault tolerant system which concerns on 1st Embodiment. 第１の実施形態に係るフォールトトレラントシステムを構成する計算機ノードが保持する承認メッセージリストの構成例を示す図である。It is a figure which shows the structural example of the approval message list which the computer node which comprises the fault tolerant system which concerns on 1st Embodiment hold | maintains. 第１の実施形態に係るフォールトトレラントシステムを構成する計算機ノードが保持する通知メッセージリストの構成例を示す図である。It is a figure which shows the structural example of the notification message list which the computer node which comprises the fault tolerant system which concerns on 1st Embodiment hold | maintains. 第１の実施形態に係るフォールトトレラントシステムを構成する計算機ノードが保持するメッセージ種類テーブルの構成例を示す図である。It is a figure which shows the structural example of the message type table which the computer node which comprises the fault tolerant system which concerns on 1st Embodiment hold | maintains. 第１の実施形態に係るフォールトトレラントシステムを構成する複数の計算機ノード間で授受されるデータの構造例を示す図である。It is a figure which shows the structural example of the data transferred between the some computer nodes which comprise the fault tolerant system which concerns on 1st Embodiment. 第１の実施形態に係るフォールトトレラントシステムにおける計算機ノードの各処理部の動作フロー例を示す図である。It is a figure which shows the example of an operation | movement flow of each process part of the computer node in the fault tolerant system which concerns on 1st Embodiment. 第１の実施形態に係るフォールトトレラントシステムにおけるリーダーノードのメッセージ処理部での動作フロー例１を示す図である。It is a figure which shows the example 1 of an operation | movement flow in the message processing part of the leader node in the fault tolerant system which concerns on 1st Embodiment. 第１の実施形態に係るフォールトトレラントシステムにおけるリーダーノードのメッセージ処理部での動作フロー例２を示す図である。It is a figure which shows the example 2 of an operation | movement flow in the message processing part of the leader node in the fault tolerant system which concerns on 1st Embodiment. 第１の実施形態に係るフォールトトレラントシステムにおけるフォロワーノードのメッセージ処理部での動作フロー例を示す図である。It is a figure which shows the example of an operation | movement flow in the message processing part of a follower node in the fault tolerant system which concerns on 1st Embodiment. 第２の実施形態におけるフォールトトレラントシステムのフォロワーノードのメッセージ処理部での動作フロー例１を示す図である。It is a figure which shows the example 1 of an operation | movement flow in the message processing part of the follower node of the fault tolerant system in 2nd Embodiment. 第２の実施形態におけるフォールトトレラントシステムのフォロワーノードのメッセージ処理部での動作フロー例２を示す図である。It is a figure which shows the example 2 of an operation | movement flow in the message processing part of the follower node of the fault tolerant system in 2nd Embodiment. 第３の実施形態におけるフォールトトレラントシステムのリーダーノードのメッセージ処理部での動作フロー例を示す図である。It is a figure which shows the example of an operation | movement flow in the message processing part of the leader node of the fault tolerant system in 3rd Embodiment. 第４の実施形態におけるフォールトトレラントシステムのリーダーノードのメッセージ処理部での動作フロー例を示す図である。It is a figure which shows the example of an operation | movement flow in the message processing part of the leader node of the fault tolerant system in 4th Embodiment.

　関連出願の相互参照
　この出願は、２０１３年９月２６日に出願された日本特許出願、特願２０１３－１９９６１４に基づく優先権を主張し、その内容を援用する。 This application claims priority based on Japanese Patent Application No. 2013-199614 filed on Sep. 26, 2013, the contents of which are incorporated herein by reference.

－－－第１の実施形態－－－
　以下に本発明における第１の実施形態について図面を用いて詳細に説明する。第１の実施形態のフォールトトレラントシステムにおいては、多数決冗長構成におけるリーダーたる計算機およびフォロワーたる各計算機で、同一のタスクを実行する。リーダーはタスクの処理状体が含まれた通知メッセージをフォロワーに対して逐次送信し、フォロワーはそれに対して、リーダー同様に処理可能であるかどうかを承認／否認のメッセージ（追従可否のメッセージ）としてリーダーに応答する。この場合、リーダーはフォロワーからの応答すなわち上述のメッセージの内容、受信タイミングおよびメッセージの種類によって、該当タスクに関して多数決冗長構成に対応した動作（例：メッセージ待ち受け）の継続に関して判断する。 --- First embodiment ---
Hereinafter, a first embodiment of the present invention will be described in detail with reference to the drawings. In the fault-tolerant system of the first embodiment, the same task is executed by the computer that is the leader and the computer that is the follower in the majority redundant configuration. The leader sequentially sends a notification message including the task processing state to the follower, and the follower accepts or rejects it as an approval / denial message (follow-up / possibility message). Respond to the leader. In this case, the leader makes a determination on the continuation of the operation corresponding to the majority redundant configuration (for example, message waiting) for the corresponding task according to the response from the follower, that is, the content of the message, the reception timing, and the message type.

　図１は、第１の実施形態に係るフォールトトレラントシステムの構成例を示す図である。図１に示すフォールトトレラントシステム１は、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性を向上させるコンピュータシステムである。 FIG. 1 is a diagram illustrating a configuration example of a fault tolerant system according to the first embodiment. The fault-tolerant system 1 shown in FIG. 1 is a fault-tolerant system composed of a plurality of computer nodes having different architectures and basic software, etc., and reduces the influence of overall performance due to performance differences and variations between computer nodes. It is a computer system that improves availability in the system by preventing excessive failure determination due to influence.

　当該フォールトトレラントシステム１は、複数の計算機ノード１１０で構成される。図１のフォールトトレラントシステム１においては、計算機ノード１１０として３台の計算機ノード１１０Ａ、１１０Ｂ、１１０Ｃでシステム構成した例を示している。勿論、フォールトトレラントシステム１を構成する計算機ノード数は「３」台である場合に限定されず、例えば「２」台や「４」台以上であってもよい。 The fault tolerant system 1 includes a plurality of computer nodes 110. In the fault tolerant system 1 of FIG. 1, an example is shown in which the system configuration includes three computer nodes 110A, 110B, and 110C as the computer node 110. Of course, the number of computer nodes constituting the fault-tolerant system 1 is not limited to “3”, and may be, for example, “2” or “4” or more.

　多数決冗長構成に対応したフォールトトレラントシステム１においては、計算機ノード１１０のうち１台をリーダーノードとし、残りの計算機ノード１１０をフォロワーノードとする。リーダーノードとフォロワーノードの役割は、計算機ノード１１０の中で固定的である必要はない。例えば、あるタスク毎、すなわち当該タスクに関する上述の通知メッセージごとに、一番最初にその通知メッセージを送信した計算機ノード１１０をリーダーノードとするなどといったアルゴリズムを用いて、それぞれの役割を交代してもよい。ただし、フォールトトレラントシステム１の動作中の任意の時点において、リーダーノードは、高々一つが存在する。リーダーノードが存在しない場合、もしくは、リーダーノードに何らかの障害が発生して処理を停止しリーダーノードが存在しなくなった場合、フォールトトレラントシステム１は処理を中断し、フォロワーノードの中から、一つの計算機ノード１１０をリーダーノードとして選ぶこととなる。こうした多数決冗長構成を成す計算機ノード中からリーダーノードを選択する処理技術については既存のものを採用してよい。なお、第１の実施形態においては、３台の計算機ノード１１０のうち、計算機ノード１１０Ａをリーダーとし、計算機ノード１１０Ｂ及び計算機ノード１１０Ｃをフォロワーとしている。 In the fault tolerant system 1 corresponding to the majority redundant configuration, one of the computer nodes 110 is a leader node and the remaining computer nodes 110 are follower nodes. The roles of the leader node and the follower node need not be fixed in the computer node 110. For example, for each task, that is, for each of the above-described notification messages related to the task, the role of each computer may be changed using an algorithm such as the computer node 110 that first transmitted the notification message as a leader node. Good. However, at any time during the operation of the fault tolerant system 1, there is at most one leader node. When the leader node does not exist, or when a failure occurs in the leader node and the process stops and the leader node does not exist, the fault tolerant system 1 interrupts the process, and one computer is selected from the follower nodes. Node 110 will be selected as the leader node. As a processing technique for selecting a leader node from among computer nodes having such a majority redundant configuration, an existing one may be adopted. In the first embodiment, among the three computer nodes 110, the computer node 110A is a leader, and the computer node 110B and the computer node 110C are followers.

　また、各計算機ノード１１０のハードウェア構成は以下の如くとなる。計算機ノード１１０は、主記憶装置１２１と１つ以上の中央演算装置１２０、外部通信インタフェース１２２、およびノード間通信インタフェース１２３を備える。このうち外部通信インタフェース１２２は、１台以上の端末１０２と外部ネットワーク１１１を介し接続されている。また、ノード間通信インタフェース１２３は、他の計算機ノード１１０とノード間ネットワーク１１２を介し接続されている。これらのネットワーク１１１、１１２は、例えばイーサネット（登録商標）規格やＩｎｆｉｎｉｂａｎｄ規格に準拠して構成されたネットワークである。 The hardware configuration of each computer node 110 is as follows. The computer node 110 includes a main storage device 121, one or more central processing units 120, an external communication interface 122, and an inter-node communication interface 123. Among these, the external communication interface 122 is connected to one or more terminals 102 via the external network 111. The inter-node communication interface 123 is connected to another computer node 110 via an inter-node network 112. These networks 111 and 112 are networks configured in conformity with, for example, the Ethernet (registered trademark) standard or the Infiniband standard.

　また、計算機ノード１１０における主記憶装置１２１は、データの読み込み及び書き込みが可能な揮発性記憶装置であり、中央演算装置１２０からアクセスされる。この主記憶装置１２１は、例えばＤＲＡＭ（Ｄｙｎａｍｉｃ　Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）等で構成されるものである。 The main storage device 121 in the computer node 110 is a volatile storage device that can read and write data, and is accessed from the central processing unit 120. The main storage device 121 is composed of, for example, a DRAM (Dynamic Random Access Memory).

　また、計算機ノード１１０における中央演算装置１２０は、汎用プロセッサであり、補助記憶装置１２４等から、基本ソフトウェア１３０、フォールトトレラントミドルウェア（以下、ＦＴミドルウェア）１３１、及びアプリケーション１３２等のプログラムを読み出して主記憶装置１２１に展開し、該当プログラムを実行する。中央演算装置１２０の採用するアーキテクチャや、その処理速度は、複数ある計算機ノード１１０で同一のものでなくてもよい。また、各計算機ノード１１０は２つ以上の中央演算装置１２０を備えていてもよい。 The central processing unit 120 in the computer node 110 is a general-purpose processor, and reads main software 130, fault-tolerant middleware (hereinafter referred to as FT middleware) 131, an application 132, and the like from the auxiliary storage device 124 and the main memory. The program is expanded on the device 121 and the program is executed. The architecture adopted by the central processing unit 120 and the processing speed thereof may not be the same among a plurality of computer nodes 110. Each computer node 110 may include two or more central processing units 120.

　また、計算機ノード１１０における補助記憶装置１２４は、永続的な不揮発性記憶装置であり、基本ソフトウェア１３０、ＦＴミドルウェア１３１、アプリケーション１３２等のプログラムに加えて、これらプログラムから読み書きされるデータ等が格納される。補助記憶装置１２４は、例えばハードディスク、フラッシュメモリなどにより構成される。 The auxiliary storage device 124 in the computer node 110 is a permanent non-volatile storage device, and stores data read / written from these programs in addition to programs such as the basic software 130, the FT middleware 131, and the application 132. The The auxiliary storage device 124 is configured by, for example, a hard disk, a flash memory, or the like.

　また、計算機ノード１１０における外部通信インタフェース１２２は、計算機ノード１１０が端末１０２と通信を行うためのインタフェースである。一方、ノード間通信インタフェース１２３は、計算機ノード１１０同士で通信を行うためのインタフェースである。なお、外部通信インタフェース１２２及びノード間通信インタフェース１２３は、個別の装置である必要はなく、一つの通信インタフェースが外部通信インタフェース１２２及びノード間通信インタフェース１２３を兼ねていてもよい。外部通信インタフェース１２２及びノード間通信インタフェース１２３は、それぞれ外部ネットワーク１１１及びノード間ネットワーク１１２の種類に応じた通信インタフェース等で構成される。 Further, the external communication interface 122 in the computer node 110 is an interface for the computer node 110 to communicate with the terminal 102. On the other hand, the inter-node communication interface 123 is an interface for performing communication between the computer nodes 110. Note that the external communication interface 122 and the inter-node communication interface 123 do not have to be separate devices, and one communication interface may also serve as the external communication interface 122 and the inter-node communication interface 123. The external communication interface 122 and the inter-node communication interface 123 are configured by communication interfaces corresponding to the types of the external network 111 and the inter-node network 112, respectively.

　また、主記憶装置１２１に保持されるアプリケーション１３２は、ＦＴシステム１が提供する機能を実現するためのプログラムである。アプリケーション１３２は、一つ以上のタスク１３３から構成される。ＦＴシステム１においては、複数の計算機ノード１１０で同じアプリケーション１３２が実行される。なおこの時、アプリケーション１３２は、実行される計算機ノード１１０のアーキテクチャや基盤ソフトウェア等の差異に応じた差異をもっていてもよい。すなわち、アプリケーション１３２およびそのタスク１３３を構成する、中央演算装置１２０のための命令群は、計算機ノード１１０間において同一である必要はない。 Further, the application 132 held in the main storage device 121 is a program for realizing the function provided by the FT system 1. The application 132 is composed of one or more tasks 133. In the FT system 1, the same application 132 is executed by a plurality of computer nodes 110. At this time, the application 132 may have a difference according to a difference in the architecture, infrastructure software, or the like of the computer node 110 to be executed. That is, the instruction group for the central processing unit 120 constituting the application 132 and its task 133 need not be the same among the computer nodes 110.

　また、主記憶装置１２１に保持される基本ソフトウェア１３０は、マルチタスク処理が可能なオペレーティングシステムであり、複数のタスク１３３およびＦＴミドルウェア１３１を並列に実行することができる。この基本ソフトウェア１３０は、ＦＴミドルウェア１３１に対して、補助記憶装置１２４に対するアクセス、外部通信インタフェース１２２及びノード間通信インタフェース１２３に対するデータ送受信の機能、主記憶装置１２１の一部分の占有領域の確保および解放等の機能を提供する。 Further, the basic software 130 held in the main storage device 121 is an operating system capable of multitask processing, and can execute a plurality of tasks 133 and FT middleware 131 in parallel. The basic software 130 accesses the FT middleware 131 to the auxiliary storage device 124, functions to transmit and receive data to the external communication interface 122 and the inter-node communication interface 123, and secures and releases a partial occupied area of the main storage device 121. Provides the functionality of

　また、主記憶装置１２１に保持されるＦＴミドルウェア１３１は、ＦＴシステム１においてフォールトトレラント機能を実現するためのプログラムである。ＦＴミドルウェア１３１は、上述のアプリケーション１３２に対して、補助記憶装置１２４に対するアクセス、外部通信インタフェース１２２に対するデータ送受信、主記憶装置１２１の一部分に対する排他処理の機能等を提供する。これらの機能について、ＦＴミドルウェア１３１は、基本ソフトウェア１３０等の提供する機能を用いて実現する。なお、ＦＴミドルウェア１３１は、独立したプログラムであっても、アプリケーション１３２に組み込まれて一体となる形であっても、あるいは、基本ソフトウェア１３０に組み込まれて一体となる形であってもよい。また、アプリケーション１３２、ＦＴミドルウェア１３１、基本ソフトウェア１３０の３つは、単一のプログラムが兼ねていてもよい。 Also, the FT middleware 131 held in the main storage device 121 is a program for realizing a fault tolerant function in the FT system 1. The FT middleware 131 provides the above-described application 132 with functions such as access to the auxiliary storage device 124, data transmission / reception with respect to the external communication interface 122, and exclusive processing with respect to a part of the main storage device 121. About these functions, the FT middleware 131 is implement | achieved using the function which the basic software 130 grade | etc., Provides. The FT middleware 131 may be an independent program, may be integrated into the application 132, or may be integrated into the basic software 130. Further, the application 132, the FT middleware 131, and the basic software 130 may serve as a single program.

　こうしたＦＴミドルウェア１３１は、タスク通信部１４０、ノード間通信部１４１、およびメッセージ処理部１４２を備えている。これらタスク通信部１４０、ノード間通信部１４１、メッセージ処理部１４２は、各計算機ノード１１０の中央演算装置１２０がＦＴミドルウェア１３１を実行することで実装される機能と言える。 The FT middleware 131 includes a task communication unit 140, an inter-node communication unit 141, and a message processing unit 142. The task communication unit 140, the inter-node communication unit 141, and the message processing unit 142 can be said to be functions that are implemented when the central processing unit 120 of each computer node 110 executes the FT middleware 131.

　このうちタスク通信部１４０は、タスク１３３からの処理の要求を受け取り、その要求をメッセージ処理部１４２に対して伝達する。一方、メッセージ処理部１４２は、他の計算機ノード１１０との同期をとった上で要求された処理を実行し、その結果をタスク通信部１４０に対して伝達する。メッセージ処理部１４２から上述の結果を得たタスク通信部１４０は、該当結果をタスク１３３に対して伝える。 Among these, the task communication unit 140 receives a processing request from the task 133 and transmits the request to the message processing unit 142. On the other hand, the message processing unit 142 executes the requested processing after synchronizing with other computer nodes 110 and transmits the result to the task communication unit 140. The task communication unit 140 that has obtained the above result from the message processing unit 142 informs the task 133 of the corresponding result.

　メッセージ処理部１４２が他の計算機ノード１１０との通信を行う際には、ノード間通信部１４１に対し、ノード間通信インタフェース１２３を経由したデータ送信を要求する。一方、ノード間通信部１４１はノード間通信インタフェース１２３から受信したデータをメッセージ処理部１４２に伝達する。こうした、タスク１３３とタスク通信部１４０、タスク通信部１４０とメッセージ処理部１４２、メッセージ処理部１４２とノード間通信部１４１の各間の伝達の手段としては、任意のものを採用できる。例えば、上述の伝達手段として、関数呼び出しとその返り値や、ソケットやパイプといった基本ソフトウェア１３０の提供する通信手段等を用いることができる。 When the message processing unit 142 communicates with another computer node 110, the inter-node communication unit 141 is requested to transmit data via the inter-node communication interface 123. On the other hand, the inter-node communication unit 141 transmits the data received from the inter-node communication interface 123 to the message processing unit 142. Arbitrary means can be adopted as means for communication between the task 133 and the task communication unit 140, the task communication unit 140 and the message processing unit 142, and the message processing unit 142 and the inter-node communication unit 141. For example, as the above-described transmission means, function calls and their return values, communication means provided by the basic software 130 such as sockets and pipes, and the like can be used.

　また、ノード間通信部１４１は、ノード間通信インタフェース１２３からデータを受信すると、その内容をメッセージ処理部１４２に伝達する。また、メッセージ処理部１４２から、他の計算機ノード１１０に対してのデータ送信の要求を受けると、その内容を指定された計算機ノード１１０に対して、ノード間通信インタフェース１２３を用いた送信を行う。 Further, when the inter-node communication unit 141 receives data from the inter-node communication interface 123, the inter-node communication unit 141 transmits the contents to the message processing unit 142. In addition, when a request for data transmission to another computer node 110 is received from the message processing unit 142, transmission is performed using the inter-node communication interface 123 to the computer node 110 whose contents are designated.

　ＦＴミドルウェア１３１では、前述した処理部１４０、１４１、１４２の他に、承認メッセージリスト１５０、通知メッセージリスト１５１、およびメッセージ種類テーブル１５２を備える。これらの保持するデータ及びその役割については後述する。なお、メッセージ種類テーブル１５２は、処理において必ずしも必要なものではない。これについても後述する。 The FT middleware 131 includes an approval message list 150, a notification message list 151, and a message type table 152 in addition to the processing units 140, 141, 142 described above. These retained data and their roles will be described later. The message type table 152 is not necessarily required for processing. This will also be described later.

　続いて、ＦＴシステム１を構成する各計算機ノード１１０の間の関係について説明する。図２は、第１の実施形態に係るＦＴシステム１を構成する複数の計算機ノード間の関係例を示す図であり、ＦＴシステム１がフォールトトレラント機能を実現するために必要な各部分の関係例について示した図である。ＦＴシステム１の実現するフォールトトレラント機能は、半数未満の計算機ノード１１０のハードウェアあるいはソフトウェアの障害が発生しても、タスク１３３の機能を継続して提供できる機能である。 Subsequently, the relationship between the computer nodes 110 constituting the FT system 1 will be described. FIG. 2 is a diagram illustrating a relationship example between a plurality of computer nodes constituting the FT system 1 according to the first embodiment, and a relationship example of each part necessary for the FT system 1 to realize a fault tolerant function. It is the figure shown about. The fault tolerant function realized by the FT system 1 is a function that can continuously provide the function of the task 133 even if a hardware or software failure of less than half of the computer nodes 110 occurs.

　そのために、先に説明したように、複数の計算機ノード１１０において同じタスク１３３を動作させる。タスク１３３が外部からの入力あるいは外部への出力を行う時、あるいは、複数のスレッドの動作順序を決定する時など、動作を計算機ノード１１０間で一致化させなければならない処理（以下、「動作一致化点」と呼ぶ）において、計算機ノード１１０間でメッセージをやりとりして、全ての計算機ノード１１０でタスク１３３が同一の挙動を示すようにする。 Therefore, as described above, the same task 133 is operated in a plurality of computer nodes 110. When the task 133 performs input from the outside or output to the outside, or when determining the operation order of a plurality of threads, the processing in which operations must be matched between the computer nodes 110 (hereinafter referred to as “operation matching”). In this case, messages are exchanged between the computer nodes 110 so that the task 133 shows the same behavior in all the computer nodes 110.

　このために、各タスク１３３は、各動作一致化点において、ＦＴミドルウェア１３１を呼び出し、その処理の実行を要求する。一方、ＦＴミドルウェア１３１は、その内容を他の計算機ノード１１０のタスク１３３の要求と比較し、過半数の計算機ノード１１０に関して同一であれば、それを実際に実行する。これにより、計算機ノード１１０の半数未満が、ハードウェア障害やソフトウェア障害によって無応答、応答遅延、あるいは、異常応答を行った場合でも、他の計算機ノード１１０の結果により、タスク１３３の動作を継続することができる。 For this purpose, each task 133 calls the FT middleware 131 at each operation matching point and requests execution of the process. On the other hand, the FT middleware 131 compares the content with the request of the task 133 of the other computer node 110, and if it is the same for the majority of the computer nodes 110, actually executes it. As a result, even if less than half of the computer nodes 110 perform no response, response delay, or abnormal response due to hardware failure or software failure, the operation of the task 133 is continued according to the result of the other computer node 110. be able to.

　本実施形態では、上述したように、動作一致化点において計算機ノード１１０間で合意形成の効率化のために、計算機ノード１１０を、役割別にリーダーノード１１０Ａとフォロワーノード１１０Ｂ、１１０Ｃの二種に分けるものとする。動作一致化点において、リーダーノード１１０Ａが、その動作を決定し、フォロワーノード１１０Ｂ、１１０Ｃに対して、その処理要求に対する動作内容を通知メッセージ２０１により通知する。フォロワーノード１１０Ｂ、１１０Ｃは、自身のタスク１３３からの処理要求と照合し、この動作内容に追従可能、すなわちリーダーノード１１０Ａと同じ挙動を実現することが可能であれば承認メッセージ２０２、不可能であれば否認メッセージ２０３をリーダーノード１１０Ａに返答する。リーダーノード１１０Ａは、承認メッセージ２０２を多数決における賛成票、否認メッセージ２０３を反対票として扱い、過半数が承認メッセージ２０２を受け取った時には、これを実際に実行する。このような多数決冗長構成に伴うリーダーノード１１０Ａ、フォロワーノード１１０Ｂ、１１０Ｃの基本的な挙動に関しては従来と同様である。 In the present embodiment, as described above, the computer node 110 is divided into two types, the leader node 110A and the follower nodes 110B and 110C, according to roles, in order to increase the efficiency of consensus formation between the computer nodes 110 at the operation matching point. Shall. At the operation matching point, the leader node 110A determines the operation, and notifies the follower nodes 110B and 110C of the operation content corresponding to the processing request by the notification message 201. The follower nodes 110B and 110C collate with the processing request from their own task 133, and can follow this operation content, that is, if it is possible to realize the same behavior as the leader node 110A, the approval message 202 is impossible. If the message is rejected, a reply message 203 is returned to the leader node 110A. The leader node 110A treats the approval message 202 as an approval vote in the majority vote and the denial message 203 as a negative vote, and when the majority receives the approval message 202, it actually executes this. The basic behavior of the leader node 110A and the follower nodes 110B and 110C accompanying such a majority vote redundant configuration is the same as that of the prior art.

　続いて、ＦＴミドルウェア１３１内の構造であるタスク通信部１４０、メッセージ処理部１４２、ノード間通信部１４１の動作と、その際に使用されるデータ構造である承認メッセージリスト１５０、通知メッセージリスト１５１、およびメッセージ種類テーブル１５２について述べる。 Subsequently, operations of the task communication unit 140, the message processing unit 142, and the inter-node communication unit 141, which are structures in the FT middleware 131, and an approval message list 150, a notification message list 151, which are data structures used at that time, The message type table 152 will be described.

　図３、図４、図５は、承認メッセージリスト１５０、通知メッセージリスト１５１、およびメッセージ種類テーブル１５２の内部構造の一例をそれぞれ示した図である。このうち、承認メッセージリスト１５０及び通知メッセージリスト１５１は、メッセージ処理部１４２によって下記のように使用される。メッセージ処理部１４２は、タスク通信部１４０から処理要求を受け取ったとき、当該計算機ノード１１０がリーダーノードであれば、これを他の計算機ノードすなわちフォロワーノードに対して伝達する。メッセージ処理部１４２は、フォロワーノードからの承認メッセージや否認メッセージの受信の有無を管理するために、承認メッセージリスト１５０に承認メッセージ、否認メッセージの受信履歴を記録する。他方、フォロワーノードでは、リーダーノードからの通知メッセージの受信の有無を管理するために、通知メッセージ２０１のリストである通知メッセージリスト１５１を使用する。 3, 4, and 5 are diagrams illustrating examples of internal structures of the approval message list 150, the notification message list 151, and the message type table 152, respectively. Among these, the approval message list 150 and the notification message list 151 are used by the message processing unit 142 as follows. When the message processing unit 142 receives a processing request from the task communication unit 140, if the computer node 110 is a leader node, the message processing unit 142 transmits this request to another computer node, that is, a follower node. The message processing unit 142 records the reception history of the approval message and the denial message in the approval message list 150 in order to manage whether or not the approval message or the denial message is received from the follower node. On the other hand, the follower node uses a notification message list 151 that is a list of notification messages 201 in order to manage whether or not a notification message is received from the leader node.

　承認メッセージリスト１５０は、通知参照３００、承認ノード集合３０１、及び、否認ノード集合３０２などの各値を各行として持つテーブルである。通知参照３００は、その行の管理するところの通知メッセージ２０１への参照（参照は、識別子によるもの、メモリ番地によるもの、あるいは、参照先のコピーによるものなど任意の方法でよい。以下同様）用の情報で通知メッセージ２０１の識別情報となる。また、承認ノード集合３０１は、通知メッセージ２０１に対して承認メッセージ２０２を返答したフォロワーノードたる計算機ノード１１０の集合を表す。同様に、否認ノード集合３０２は、通知メッセージ２０１に対して否認メッセージ２０３を返答したフォロワーノードたる計算機ノード１１０の集合を表す。 The approval message list 150 is a table having values such as a notification reference 300, an approval node set 301, and a denial node set 302 as rows. The notification reference 300 is used for reference to the notification message 201 managed by the row (reference may be any method such as by identifier, by memory address, or by reference destination copy, and so on). This information becomes the identification information of the notification message 201. An approval node set 301 represents a set of computer nodes 110 that are follower nodes that have returned the approval message 202 in response to the notification message 201. Similarly, the denial node set 302 represents a set of computer nodes 110 that are follower nodes that have returned the denial message 203 to the notification message 201.

　また、通知メッセージリスト１５１は、リーダーノードから送付されていて、かつ、承認メッセージ２０２もしくは否認メッセージ２０３を返答していない０個以上の通知メッセージ２０１をリストとして持つ。 Further, the notification message list 151 has zero or more notification messages 201 sent from the leader node and not responding with the approval message 202 or the denial message 203 as a list.

　次に、タスク通信部１４０とメッセージ処理部１４２への処理の要求及びその結果の伝達に用いられるデータ構造について述べる。以下では、「処理要求」とはこのタスク通信部１４０からメッセージ処理部１４２に対して、タスクからの処理要求を伝達するデータ構造を指し、「処理結果」とはメッセージ処理部１４２からタスク通信部１４０に対して、処理要求に対する処理の結果（処理内容）を伝達するデータ構造を指す。処理要求の内部構造の一例としては、要求種別と、それに応じた要求パラメータがある。他方、処理結果の内部構造の一例としては、対応している処理要求への処理要求参照と、処理の結果を表す値である処理結果値がある。なお、タスク通信部１４０とメッセージ処理部１４２の間の伝達方式により、参照する処理が一意に特定できるのであれば、処理要求参照を省略することができる。 Next, the processing structure to the task communication unit 140 and the message processing unit 142 and the data structure used for transmitting the result will be described. Hereinafter, “processing request” refers to a data structure for transmitting a processing request from a task to the message processing unit 142 from the task communication unit 140, and “processing result” refers to a task communication unit from the message processing unit 142. Reference numeral 140 denotes a data structure for transmitting a processing result (processing content) in response to a processing request. As an example of the internal structure of the processing request, there are a request type and a request parameter corresponding to the request type. On the other hand, as an example of the internal structure of a processing result, there are a processing request reference to a corresponding processing request and a processing result value that is a value representing the processing result. If the process to be referred to can be uniquely specified by the transmission method between the task communication unit 140 and the message processing unit 142, the process request reference can be omitted.

　処理要求において、要求種別は、タスク１３３のＦＴミドルウェア１３１に対する要求の種別を示し、例えば、補助記憶装置１２４からの読み込みや補助記憶装置１２４への書き込み、外部通信インタフェース１２２からのデータ受信、あるいはデータ送信、主記憶装置１２１の一部領域に対する排他処理といったものである。一方、要求パラメータは、それぞれ要求種別により意味を異とする値であり、またその数も要求種別に応じて異なる。要求種別が「補助記憶装置１２４からの読み込み」であった場合を一例とすれば、補助記憶装置１２４上のファイル名、ファイル上の位置、読み込んだ値を格納する主記憶装置１２１の位置、読み込むバイト数等が、要求パラメータに含まれる。 In the processing request, the request type indicates the type of request to the FT middleware 131 of the task 133, for example, reading from the auxiliary storage device 124, writing to the auxiliary storage device 124, receiving data from the external communication interface 122, or data Transmission, exclusive processing for a partial area of the main storage device 121, and the like. On the other hand, the request parameters are values having different meanings depending on the request type, and the number thereof also differs depending on the request type. Taking the case where the request type is “read from auxiliary storage device 124” as an example, the file name on the auxiliary storage device 124, the position on the file, the position of the main storage device 121 that stores the read value, and the read The number of bytes is included in the request parameter.

　また処理結果は、タスク通信部１４０からメッセージ処理部１４２に対してそれまでに通知された処理要求の処理結果を、メッセージ処理部１４２からタスク通信部１４０に通知するために使用される。処理結果値は、処理要求参照により参照される処理要求の要求種別により意味を異とする値であり、またその数も要求種別に応じて異なる。なお、この処理結果値は、要求に対する結果を全て表現する必要はない。たとえば外部通信インタフェース１２２からデータを受信し指定した主記憶装置１２１へのコピーといった副作用を持つ処理要求であれば、メッセージ処理部１４２は、指定したデータ内容の主記憶装置１２１へのコピー処理を行った上で、処理結果をタスク通信部１４０に通知する。このような場合において、受信したデータのサイズのみを処理結果値として返し、受信データそのものの情報は、処理結果値には陽に含めないということができる。 Further, the processing result is used to notify the task communication unit 140 from the message processing unit 142 of the processing result of the processing request notified so far from the task communication unit 140 to the message processing unit 142. The processing result value is a value having a different meaning depending on the request type of the processing request referred to by the processing request reference, and the number thereof also differs depending on the request type. Note that this processing result value does not need to express all the results for the request. For example, if the processing request has a side effect such as receiving data from the external communication interface 122 and copying to the designated main storage device 121, the message processing unit 142 performs processing for copying the designated data content to the main storage device 121. After that, the processing result is notified to the task communication unit 140. In such a case, it can be said that only the size of the received data is returned as the processing result value, and the information of the received data itself is not explicitly included in the processing result value.

　図６にて、リーダーノードとフォロワーノードの間で、処理要求の伝達を行う際に用いられる通知メッセージ２０１及び、それに対する応答である承認メッセージ２０２と否認メッセージ２０３　のデータ構造の一例を示す。ここで通知メッセージ２０１は、伝達するリクエストへの参照値であり、処理要求の種別を規定した処理要求４００をもつ。処理要求への参照値に加えて、あるいはそれに代えて、後述するフォロワーノードにおける内容比較のために十分な内容、すなわちその処理要求について挙動が同一であると判定するのに十分な情報を含む任意の付随データ４０６を含んでいてもよい。 FIG. 6 shows an example of the data structure of the notification message 201 used when the processing request is transmitted between the leader node and the follower node, and the approval message 202 and the denial message 203 as responses thereto. Here, the notification message 201 is a reference value to the request to be transmitted, and has a processing request 400 that defines the type of processing request. In addition to or instead of the reference value for the processing request, any content that is sufficient for content comparison in the follower node described later, that is, information that is sufficient to determine that the processing request has the same behavior The accompanying data 406 may be included.

　また、承認メッセージ２０２及び否認メッセージ２０３は、対応するリーダーノードから送付された通知メッセージ２０１を参照するための値である通知参照３００を持つ。なお、通知メッセージ２０１を一意に識別できるのであれば、処理要求の参照で代用することもできる。 Also, the approval message 202 and the denial message 203 have a notification reference 300 that is a value for referring to the notification message 201 sent from the corresponding leader node. If the notification message 201 can be uniquely identified, it can be substituted by referring to the processing request.

　続いて図５に示すメッセージ種類テーブル１５２について述べる。メッセージ処理部１４２は、後述するように各処理要求に対して、メッセージ同期種類を判定する。メッセージ同期種類は、例えば「同期」、「準同期」のいずれかであり、その処理要求に対するメッセージの応答あるいは到着に対して、メッセージ処理部１４２がいつ次の処理に進むかの処理を規定するものである。例えば、リーダーノードにおけるメッセージ処理部１４２は、そのメッセージ同期種類が「同期」である処理要求に対しては、現在ＦＴシステム１で正常に動作している全ての計算機ノード１１０からの応答を待ってから次の処理に進み、「準同期」であるメッセージに対しては、現在ＦＴシステム１で正常に動作している計算機ノード１１０の半数の承認メッセージ２０２を待ってから次の処理に進む、といった動作を行う。この判断の方法の一例として、メッセージ種類テーブル１５２を用いた方法がある。　このメッセージ種類テーブル１５２は、上述した処理要求について、その要求種別３０３からメッセージ同期種類３０４を判断するものである。メッセージ種類テーブル１５２は、要求種別３０３とメッセージ同期種類３０４などの各値を各行として持つテーブルである。各行は、要求種別３０３をもつ処理要求、および、それを処理要求参照としてもつ通知メッセージ２０１のメッセージ待ちの挙動の種類が、メッセージ同期種類３０４であることを意味する。なお、メッセージ同期種類の判定は、この方法に限定されない。例えば、メッセージ種類テーブル１５２が存在しない場合であっても、計算機ノード１１０がメッセージにそのメッセージを一意に識別する連番を振り、ある倍数の番号のメッセージを「同期」とし、それ以外を「準同期」とするといった方法をとることもできる。 Next, the message type table 152 shown in FIG. 5 will be described. The message processing unit 142 determines the message synchronization type for each processing request as will be described later. The message synchronization type is, for example, “synchronous” or “semi-synchronous”, and specifies the process when the message processing unit 142 proceeds to the next process in response to the response or arrival of the message in response to the processing request. Is. For example, the message processing unit 142 in the leader node waits for responses from all the computer nodes 110 currently operating normally in the FT system 1 in response to a processing request whose message synchronization type is “synchronous”. The process proceeds to the next process, and for a message that is “semi-synchronous”, it waits for half of the approval messages 202 of the computer nodes 110 that are currently operating normally in the FT system 1 and then proceeds to the next process. Perform the action. As an example of this determination method, there is a method using the message type table 152. This message type table 152 is for determining the message synchronization type 304 from the request type 303 for the processing request described above. The message type table 152 is a table having values such as the request type 303 and the message synchronization type 304 as rows. Each line means that the type of message waiting behavior of the processing request having the request type 303 and the notification message 201 having the processing type as a processing request reference is the message synchronization type 304. Note that the determination of the message synchronization type is not limited to this method. For example, even when the message type table 152 does not exist, the computer node 110 assigns a serial number that uniquely identifies the message to the message, sets a message of a certain multiple number as “synchronized”, and sets the other as “semi-standard”. A method such as “synchronization” can also be used.

　続いて、ＦＴシステム１を構成する各計算機ノード１１０Ａ、１１０Ｂ、１１０Ｃの動作と連携について説明する。図７は第１の実施形態に係るＦＴシステム１における計算機ノード１１０の各処理部の動作フロー例を示す図である。この図においては、リーダーノードとして計算機ノード１１０Ａ、フォロワーノードとして計算機ノード１１０Ｂの動作を示している。この他のフォロワーノード１１０Ｃについては、計算機ノード１１０Ｂと同様の動作を行うので説明を省略している。 Subsequently, the operation and cooperation of the computer nodes 110A, 110B, and 110C constituting the FT system 1 will be described. FIG. 7 is a diagram illustrating an example of an operation flow of each processing unit of the computer node 110 in the FT system 1 according to the first embodiment. In this figure, the operation of the computer node 110A as the leader node and the computer node 110B as the follower node are shown. Since the other follower node 110C performs the same operation as the computer node 110B, description thereof is omitted.

　まず、リーダーノードたる計算機ノード１１０Ａの動作について述べる。計算機ノード１１０Ａのタスク通信部１４０は、アプリケーション１３２から要求を受け（５１０）、これを処理要求４００の形でメッセージ処理部１４２に伝達する（５１１）。メッセージ処理部１４２は、この処理要求４００に応じて必要であれば適宜処理を行い、その処理要求４００等を含む通知メッセージ２０１を、ノード間通信部１４１に依頼してフォロワーノードに対して送信する（５１２）。 First, the operation of the computer node 110A as a leader node will be described. The task communication unit 140 of the computer node 110A receives a request from the application 132 (510), and transmits this request to the message processing unit 142 in the form of a processing request 400 (511). The message processing unit 142 performs processing as needed according to the processing request 400, and sends a notification message 201 including the processing request 400 to the follower node by requesting the inter-node communication unit 141. (512).

　次に計算機ノード１１０Ａのメッセージ処理部１４２は、タスク通信部１４０から受けた上述の処理要求４００について、メッセージ同期種類を判断する（５１３）。メッセージ処理部１４２は、ステップ５１３で判断したメッセージ同期種類に応じた、フォロワーノードたる計算機ノード１１０Ｂ、１１０Ｃからの承認メッセージ２０２あるいは否認メッセージ２０３の応答待ちを行う（５１４）。この時、計算機ノード１１０Ａのメッセージ処理部１４２は、すべてのフォロワーノードからの応答について受信待ち（メッセージ同期種類＝「同期」）、半数のフォロワーノードからの承認メッセージ２０２について受信待ち（メッセージ同期種類＝「準同期」）といった、メッセージ同期種類に応じた待機を行う。こうした応答待ちの後、メッセージ処理部１４２は、各フォロワーノードより得た該当タスクの処理結果に関して多数決処理を行い、タスク通信部１４０に対して、最終的に一致した結果を処理結果として伝達する（５１５）。タスク通信部１４０は、この結果をアプリケーション１３２に通知する（５１６）。 Next, the message processing unit 142 of the computer node 110A determines the message synchronization type for the processing request 400 received from the task communication unit 140 (513). The message processing unit 142 waits for a response of the approval message 202 or the denial message 203 from the computer nodes 110B and 110C, which are follower nodes, according to the message synchronization type determined in step 513 (514). At this time, the message processing unit 142 of the computer node 110A is waiting to receive responses from all follower nodes (message synchronization type = “synchronization”), and is waiting to receive approval messages 202 from half of the follower nodes (message synchronization type = Wait according to the message synchronization type, such as “semi-synchronous”). After waiting for the response, the message processing unit 142 performs majority processing on the processing result of the corresponding task obtained from each follower node, and transmits the finally matched result to the task communication unit 140 as the processing result ( 515). The task communication unit 140 notifies the result to the application 132 (516).

　他方、フォロワーノードたる計算機ノード１１０Ｂの動作について説明する。なお、フォロワーノードである計算機ノード１１０Ｂのタスク通信部１４０の動作は、リーダーノードたる計算機ノード１１０Ａの場合と同様であるので説明を省略する。フォロワーノードたる計算機ノード１１０Ｂのメッセージ処理部１４２は、リーダーノードたる計算機ノード１１０Ａからの通知メッセージ２０１を待つ（５１７）。リーダーノードから通知メッセージ２０１が送信されてきた場合、計算機ノード１１０Ｂのメッセージ処理部１４２は、リーダーノードから受信した通知メッセージ２０１と、自身のタスク通信部１４０でタスク１３３から受けている処理要求の内容とを比較し、リーダーノードの挙動に追従可能（すなわち同じタスクについて処理を同期的に実行可能）かどうかを判断し、追従可能であれば承認メッセージ２０２を、追従不可であれば否認メッセージ２０３をノード間通信部１４１経由でリーダーノードに送信する（５１９）。 On the other hand, the operation of the computer node 110B as a follower node will be described. Note that the operation of the task communication unit 140 of the computer node 110B that is the follower node is the same as that of the computer node 110A that is the leader node, and thus description thereof is omitted. The message processing unit 142 of the computer node 110B as the follower node waits for the notification message 201 from the computer node 110A as the leader node (517). When the notification message 201 is transmitted from the leader node, the message processing unit 142 of the computer node 110B receives the notification message 201 received from the leader node and the content of the processing request received from the task 133 by its own task communication unit 140. To determine whether it is possible to follow the behavior of the leader node (that is, processing can be executed synchronously for the same task). The data is transmitted to the leader node via the inter-node communication unit 141 (519).

　以下では、図７の点線で囲んだフローについて、その詳細を述べる。リーダーノードのメッセージ処理部１４２のフロー（５０１）について図８及び図９で、フォロワーノードのメッセージ処理部１４２のフロー（５０２）について図１０でそれぞれ述べる。 Below, the details of the flow enclosed by the dotted line in FIG. 7 will be described. The flow (501) of the message processing unit 142 of the leader node will be described with reference to FIGS. 8 and 9, and the flow (502) of the message processing unit 142 of the follower node will be described with reference to FIG.

　図８、図９は第１の実施形態に係るフォールトトレラントシステムにおけるリーダーノードのメッセージ処理部での動作フロー例１、２を示す図であり、具体的には、リーダーノードのメッセージ処理部１４２が、タスク通信部１４０から処理要求を受け、この処理要求に対応する処理を行い、その結果をタスク通信部１４０に返す処理の一例を示したフローチャートである。 8 and 9 are diagrams showing operation flow examples 1 and 2 in the message processing unit of the leader node in the fault tolerant system according to the first embodiment. Specifically, the message processing unit 142 of the leader node has 5 is a flowchart showing an example of processing for receiving a processing request from the task communication unit 140, performing processing corresponding to the processing request, and returning the result to the task communication unit 140.

　この場合、リーダーノードのメッセージ処理部１４２は、まずタスク通信部１４０から処理要求を受け取る（６００）と、外部に影響がある場合、すなわち、外部通信インタフェース１２２からの送信などの場合を除いて、この処理要求に対応する処理を行う（６０１）。このときメッセージ処理部１４２は、必要であれば要求付随データ（図６の付随データ４０６に対応）を生成する。メッセージ処理部１４２は、これをもとに、受信した処理要求を参照する通知メッセージ２０１を生成し、要求付随データがあればそれを加え、全フォロワーノード（計算機ノード１１０Ｂ及び計算機ノード１１０Ｃ）に対してノード間通信部１４１を介して送信する（６０２）。 In this case, the message processing unit 142 of the leader node first receives the processing request from the task communication unit 140 (600), and when there is an external influence, that is, except for the case of transmission from the external communication interface 122, Processing corresponding to this processing request is performed (601). At this time, the message processing unit 142 generates request accompanying data (corresponding to the accompanying data 406 in FIG. 6) if necessary. Based on this, the message processing unit 142 generates a notification message 201 that refers to the received processing request, adds any data associated with the request, and adds it to all follower nodes (computer node 110B and computer node 110C). And transmitted via the inter-node communication unit 141 (602).

　次にメッセージ処理部１４２は、ＦＴミドルウェア１３１における承認メッセージリスト１５０に、該当通知メッセージ２０１に対応する、すなわち通知参照３００として当該通知メッセージ２０１を含む行を追加する（６０３）。次にメッセージ処理部１４２は、ステップ６００で受けた処理要求について、メッセージ同期種類を判定する（６０４）。 Next, the message processing unit 142 adds a line corresponding to the corresponding notification message 201 to the approval message list 150 in the FT middleware 131, that is, including the notification message 201 as the notification reference 300 (603). Next, the message processing unit 142 determines the message synchronization type for the processing request received in step 600 (604).

　ステップ６０４において、メッセージ処理部１４２は、例えばＦＴミドルウェア１３１におけるメッセージ種類テーブル１５２に対し、上述の処理要求の情報を照合することにより、メッセージ種類テーブル１５２の各レコードのうち、要求種別４００の値が、ステップ６００で受けた処理要求（例：sync,Network_Send,Network_Recv,Lock_Acquire）の値にマッチした行を特定し、該当処理要求のメッセージ同期種類３０４を判断する。但し、既に上述したように、メッセージ同期種類を判断する手法は、メッセージ種類テーブル１５２を用いたこの方法に限定されず、メッセージ同期種類を一つに決定できる方法であればいずれのものも採用できる。 In step 604, the message processing unit 142 collates the above-described processing request information with the message type table 152 in the FT middleware 131, for example, so that the value of the request type 400 in each record of the message type table 152 is set. The line that matches the value of the processing request received in step 600 (eg, sync, Network_Send, Network_Recv, Lock_Acquire) is identified, and the message synchronization type 304 of the corresponding processing request is determined. However, as described above, the method for determining the message synchronization type is not limited to this method using the message type table 152, and any method can be adopted as long as it can determine the message synchronization type as one. .

　上述のステップ６０４の結果、ステップ６００で受けた処理要求のメッセージ同期種類が「同期」であったとする（６０４：同期）。このとき、リーダーノードのメッセージ処理部１４２は、その時点で動作している全てのフォロワーノードからの応答を、該当処理要求に関する動作の継続に必要とする。すなわちリーダーノードのメッセージ処理部１４２は、全フォロワーノード（計算機ノード１１０Ｂ及び計算機ノード１１０Ｃ）からの承認メッセージ２０２あるいは否認メッセージ２０３を待つ（６０５）。 As a result of step 604 described above, it is assumed that the message synchronization type of the processing request received in step 600 is “synchronization” (604: synchronization). At this time, the message processing unit 142 of the leader node requires responses from all the follower nodes operating at that time in order to continue the operation related to the processing request. That is, the message processing unit 142 of the leader node waits for an approval message 202 or a denial message 203 from all follower nodes (computer node 110B and computer node 110C) (605).

　その後、メッセージ処理部１４２は、承認メッセージ２０２の受信をノード間通信部１４１から通知された（以下、簡単に「受信した」とする）場合（６０６：Ａｃｋ）、承認メッセージリスト１５０の当該行（該当処理要求に関する通知メッセージ２０１に関してステップ６０３で追加した行）において、承認ノード集合３０１に、送信元の計算機ノード１１０（すなわち該当通知メッセージ２０１に対して承認メッセージ２０２を返答したフォロワーノード）の識別情報を追加する（６０７）。 Thereafter, when the message processing unit 142 is notified of the reception of the approval message 202 from the inter-node communication unit 141 (hereinafter simply referred to as “received”) (606: Ack), the message processing unit 142 in the approval message list 150 ( (The row added in step 603 regarding the notification message 201 related to the corresponding processing request), the identification information of the computer node 110 of the transmission source (that is, the follower node that returned the approval message 202 to the notification message 201) in the approval node set 301. Is added (607).

　他方、メッセージ処理部１４２は、否認メッセージ２０３をフォロワーノードから受信した場合（６０６：Ｎａｃｋ）、同様に、承認メッセージリスト１５０の当該行において、否認ノード集合３０２に、送信元の計算機ノード１１０（すなわち該当通知メッセージ２０１に対して否認メッセージ２０２を返答したフォロワーノード）の識別情報を追加する（６０８）。 On the other hand, when the message processing unit 142 receives the denial message 203 from the follower node (606: Nack), similarly, in the corresponding row of the approval message list 150, the message processing unit 142 adds the denial node set 302 to the transmission source computer node 110 (that is, The identification information of the follower node that responded with the denial message 202 to the notification message 201 is added (608).

　リーダーノードのメッセージ処理部１４２は、全てのフォロワーノードが承認メッセージリスト１５０における承認ノード集合３０１あるいは否認ノード集合３０２のいずれかに入るまで、これを繰り返す（６０９）。これにより、リーダーノードのメッセージ処理部１４２は、該当メッセージ２０１に対応する処理要求を実行する時点まで、すべてのフォロワーノードのタスク１３３が到達したことを確認できる。 The message processing unit 142 of the leader node repeats this until all follower nodes enter either the approval node set 301 or the denial node set 302 in the approval message list 150 (609). As a result, the message processing unit 142 of the leader node can confirm that the tasks 133 of all the follower nodes have arrived until the time when the processing request corresponding to the message 201 is executed.

　ステップ６０９において、全てのフォロワーノードが承認メッセージリスト１５０における承認ノード集合３０１あるいは否認ノード集合３０２のいずれかに入ったと判定したならば（６０９：Ｙｅｓ）、リーダーノードのメッセージ処理部１４２は、同タスクに関する各計算機ノードでの実行結果について、多数決の原理に従って、処理を継続するかどうか判定する（６１０）。承認メッセージ２０２を送信した計算機ノードが否認メッセージより多数、ないしは否認メッセージと同数である場合（６１０：Ｙｅｓ）、リーダーノードのメッセージ処理部１４２は、否認メッセージ２０３を送信したノードを停止させる。これは、リーダーノード自身は当然この通知メッセージ２０１の内容に同意しているためで、承認メッセージ２０２を送信したフォロワーノードと否認メッセージ２０３を送信したフォロワーノードが同数であれば、過半数のノードが同意していることになる。承認メッセージ２０２を送信したノードが多数である場合に、まだ処理要求に対応する処理を実行していない場合には、この時点においては過半数の計算機ノードの同意が得られているため、リーダーノードはこれを実行する（６１１）。最後に、リーダーノードのメッセージ処理部１４２は、ステップ６１１の処理結果を生成して（６１２）、それをタスク通信部１４０に伝達し（６１３）、タスク通信部１４０からの次の処理要求を待つ処理（６００）に戻る。 If it is determined in step 609 that all follower nodes have entered either the approval node set 301 or the denial node set 302 in the approval message list 150 (609: Yes), the message processing unit 142 of the leader node performs the same task. It is determined whether or not the processing is to be continued according to the principle of majority vote for the execution result in each computer node (610). When the number of computer nodes that transmitted the approval message 202 is greater than or equal to the number of rejection messages (610: Yes), the message processing unit 142 of the leader node stops the node that transmitted the rejection message 203. This is because the leader node itself naturally agrees with the contents of the notification message 201. If the number of follower nodes that transmitted the approval message 202 and the number of follower nodes that transmitted the denial message 203 are the same, the majority of the nodes agree. Will be. If there are a large number of nodes that have sent the approval message 202 and the processing corresponding to the processing request has not yet been executed, the agreement of the majority of the computer nodes has been obtained at this point, so the leader node This is executed (611). Finally, the message processing unit 142 of the leader node generates the processing result of step 611 (612), transmits it to the task communication unit 140 (613), and waits for the next processing request from the task communication unit 140. Return to processing (600).

　他方、否認メッセージ２０３を送信したノードのほうが多数である場合には（６１０：Ｎｏ）、リーダーノード自身に障害が発生したとみなせるので、リーダーノードのメッセージ処理部１４２は当該リーダーノード自身の動作を停止するなどの障害時の処理を実行する（６１４）。 On the other hand, when the number of nodes that transmitted the denial message 203 is larger (610: No), it can be considered that a failure has occurred in the leader node itself, so the message processing unit 142 of the leader node performs the operation of the leader node itself. Processing at the time of failure such as stopping is executed (614).

　一方、上述のステップ６０４の結果、メッセージ同期種類が「準同期」であったとする（６０４：準同期）。このとき、リーダーノードは、その時点で動作しているフォロワーノードのうち過半数からの応答を、該当タスクに関する動作の継続に必要とする。動作は、承認メッセージ２０２や否認メッセージ２０３の受信に関しては、前段の場合と同様であるが（６１６，６１７，６１８，６２０）、条件が異なり、フォロワーノードの半数以上が承認ノード集合３０１に加わるか、全てのフォロワーノードが承認ノード集合３０１あるいは否認ノード集合３０２のいずれかに入るまでこれを繰り返す（６１９）。リーダノード自身は、この決断に当然同意しているので、フォロワーノードの半数以上が承認メッセージ２０２を送信したということは、過半数のノードが応答したということになる。その後の多数決結果の判定処理（６１０）以降の処理は、前段の場合と同様である。なお、リーダーノードが処理を進めた後も、残りのフォロワーノードは、承認メッセージ２０２あるいは否認メッセージ２０３を送信する可能性がある。このために、リーダーノードのメッセージ処理部１４２は、これらを受け取った場合に、該当する承認ノード集合３０１あるいは否認ノード集合３０２に加わる操作を随時行う。否認メッセージ２０３を受信した場合には、これが少数派であることがわかるので、ただちにそのノードを停止させる。 On the other hand, it is assumed that the message synchronization type is “quasi-synchronization” as a result of the above-described step 604 (604: quasi-synchronization). At this time, the leader node requires a response from a majority of the follower nodes operating at that time in order to continue the operation related to the task. The operation is the same as that in the previous stage with respect to the reception of the approval message 202 and the denial message 203 (616, 617, 618, 620), but the conditions are different and whether more than half of the follower nodes are added to the approval node set 301. This is repeated until all follower nodes enter either the approval node set 301 or the denial node set 302 (619). The leader nodes themselves naturally agree with this decision, so that more than half of the follower nodes have sent the approval message 202 means that a majority of the nodes have responded. Subsequent majority decision result determination processing (610) and subsequent processing is the same as in the previous stage. Even after the leader node advances the processing, the remaining follower nodes may transmit the approval message 202 or the denial message 203. For this reason, when receiving these, the message processing unit 142 of the leader node performs an operation of adding to the corresponding approval node set 301 or denial node set 302 as needed. When the denial message 203 is received, it is known that this is a minority, and the node is immediately stopped.

　上述した内容は、全てのフォロワーノードが正しく応答をした場合であるが、背景で述べたように、計算機ノード１１０や、ノード間ネットワーク１１２、ノード間通信インタフェース１２３での障害発生等により、フォロワーノードが無応答となる場合、あるいは、応答が著しく遅延する場合がある。この状況下では、上述のフローのうち、メッセージ同期種類が「同期」（６０４：同期）であった際の、リーダーノードのメッセージ処理部１４２が行う受信待ち（６０５）の処理に影響が及ぶ。 The above-mentioned content is a case where all the follower nodes respond correctly, but as described in the background, the follower node is caused by a failure in the computer node 110, the inter-node network 112, the inter-node communication interface 123, or the like. May become unresponsive or the response may be significantly delayed. Under this situation, the process of waiting for reception (605) performed by the message processing unit 142 of the leader node when the message synchronization type is “synchronization” (604: synchronization) in the above-described flow is affected.

　ステップ６０２での通知メッセージ２０１の送信時刻から所定時間（以下、タイムアウト時間とする）が経過した場合（６０６：タイムアウト）、リーダーノードのメッセージ処理部１４２は、その時刻までに承認メッセージ２０２もしくは否認メッセージ２０３を送らなかったフォロワーノードで障害が発生したとみなす。そして、リーダーノードのメッセージ処理部１４２は、多数決結果の判定判定（６１０）の処理にうつる。このとき、障害が発生したとみなされたフォロワーノードは、否認ノード集合３０２にあるとみなしてもよいし（６２１）、計算機ノードの全数から取り除いて判断してもよい。 When a predetermined time (hereinafter referred to as timeout time) has elapsed from the transmission time of the notification message 201 in step 602 (606: timeout), the message processing unit 142 of the leader node has received the approval message 202 or the denial message by that time. It is assumed that a failure has occurred in the follower node that did not send 203. Then, the message processing unit 142 of the leader node proceeds to the determination processing (610) of the majority result. At this time, the follower node that is considered to have failed may be considered to be in the denial node set 302 (621), or may be determined by removing it from the total number of computer nodes.

　上述のタイムアウト時間としては、最も単純な例では定数値が想定できる。或いは、例えば計算機ノード１１０が、任意のアルゴリズムによってタイムアウト時間を決定するとしてもよい。例えば、上述の通知メッセージ２０１に応じて所定処理を実行する時間を所定アルゴリズムで予測して、フォロワーノードからの承認／否認のいずれかのメッセージの到達期待時間を特定し、これをタイムアウト時間と算出するなどしてもよい。定数を用いた場合であっても、全メッセージのうちメッセージ同期種類が「同期」のものについてのみ、タイムアウト時間を用いた障害判定が行われるため、過半数の計算機ノードが大きな遅延なく動作している場合には、半数未満の、すなわち少数派の計算機ノードで一時的な遅れがあっても、それを吸収することができる。 ∙ As the above timeout period, a constant value can be assumed in the simplest example. Alternatively, for example, the computer node 110 may determine the timeout time by an arbitrary algorithm. For example, a predetermined algorithm is used to predict a time for executing a predetermined process in accordance with the notification message 201 described above, and an expected arrival time of either an approval / denial message from the follower node is specified, and this is calculated as a timeout time. You may do it. Even if a constant is used, failure determination using the timeout time is performed only for all messages whose message synchronization type is "synchronous", so the majority of computer nodes are operating without significant delay. In some cases, even if there is a temporary delay in less than half of the computer nodes, that is, a minority computer node, it can be absorbed.

　また、メッセージ同期種類３０４が「同期」となる処理要求の要求パラメータに、実行制約時間を加え、これを基準にタイムアウト時間を定めることもできる。これは、タイムアウト時間を明示的にアプリケーションから制御することで、例えば周期的に実行されるタスクの周期時間というようなアプリケーションの特性に応じた障害判定を行うことが可能になるメリットがある。 Also, an execution constraint time can be added to a request parameter of a processing request in which the message synchronization type 304 is “synchronous”, and a timeout time can be determined based on this. This is advantageous in that it is possible to perform failure determination according to application characteristics such as a periodic time of a task that is periodically executed by explicitly controlling the timeout time from the application.

　メッセージ同期種類３０４が「同期」となる処理要求は、例えば、既知の時間制約のもと一定の時間間隔をおいて同じプログラムを繰り返し実行するタスク（「周期タスク」）での、繰り返し実行する領域の開始点を示す要求であるとか、未知のタイミングで到達する外部からのメッセージ受信、つまり、外部通信インタフェース１２２からのデータ受信の要求などである。メッセージ同期種類３０４が「同期」である二つの連続したメッセージについて、それらに対応する処理要求を発行するタスク１３３の実行箇所への早い方の到達時刻から、遅い方の到達時刻の時間についての時間制約（以下「時間制約」という）が明確である場合に、その時間制約を、いずれかの処理要求の要求パラメータに加える。 The processing request in which the message synchronization type 304 is “synchronous” is, for example, an area that is repeatedly executed in a task (“periodic task”) that repeatedly executes the same program at a constant time interval under a known time constraint. For example, a request for receiving a message from the outside that arrives at an unknown timing, that is, a request for receiving data from the external communication interface 122. For two consecutive messages whose message synchronization type 304 is "synchronous", the time from the earlier arrival time to the execution location of the task 133 that issues a processing request corresponding to them, the time from the earlier arrival time When the constraint (hereinafter referred to as “time constraint”) is clear, the time constraint is added to the request parameter of any processing request.

　この時間制約を用いたステップ６０５におけるタイムアウト処理は、下記の通りになる。リーダーノードのメッセージ処理部１４２は、直前のメッセージ同期種類３０４が「同期」である処理要求の受信時刻から、その要求パラメータに含まれる時間制約、もしくは、現に処理している処理要求の要求パラメータに含まれる時間制約（どちらを用いるかは実装に依存する）だけ経った場合に、タイムアウトとみなす。これ以外のメッセージ同期種類３０４についても、あるいは、フォロワーノードについても、例えば、直前のメッセージ同期種類３０４が「同期」である処理要求の受信時刻から、その要求パラメータに含まれる時間制約を、あるいは、その時間制約を適当な割合で按分した時間をタイムアウト時間として用いることができる。 Timeout processing in step 605 using this time constraint is as follows. The message processing unit 142 of the leader node changes the time constraint included in the request parameter or the request parameter of the processing request currently being processed from the reception time of the processing request whose previous message synchronization type 304 is “synchronous”. A timeout is considered when only the included time constraints (which depends on the implementation) are passed. For other message synchronization types 304 or follower nodes, for example, from the reception time of the processing request whose previous message synchronization type 304 is “synchronous”, the time constraint included in the request parameter, or A time obtained by apportioning the time constraint at an appropriate ratio can be used as the timeout time.

　なお、後述するが、メッセージ種類が「非同期」の時の受信待ちにおいては、別の方法で決められたタイムアウト時間を用いて、障害判定を行ってもよいし、タイムアウトを用いないということもできる。後者では、過半数のノードが応答をしない場合にはリーダーノードはフォロワーノードからの承認／否認メッセージを無限に待つことになるが、そもそも過半数のノードが無応答である場合には、ＦＴシステム１として動作を継続できない状態であって、停止させられるべき状態であり、可用性の点ではこれを大きく損なうものではない。 As will be described later, when waiting for reception when the message type is “asynchronous”, a failure determination may be performed using a timeout time determined by another method, or a timeout may not be used. . In the latter case, when a majority of the nodes do not respond, the leader node waits indefinitely for an approval / denial message from the follower node. However, if a majority of the nodes are not responding, the FT system 1 is used. This is a state in which the operation cannot be continued and should be stopped, and this is not greatly impaired in terms of availability.

　続いて、フォロワーノードのメッセージ処理部１４２における処理について説明する。図１０は第１の実施形態に係るＦＴシステム１におけるフォロワーノードのメッセージ処理部１４２での動作フロー例を示す図であり、具体的には、フォロワーノードのメッセージ処理部１４２が、タスク通信部１４０から処理要求を受けて対応する処理を行い、その結果をタスク通信部１４０に返す処理の一例を示したフローチャートである。 Subsequently, processing in the message processing unit 142 of the follower node will be described. FIG. 10 is a diagram illustrating an example of an operation flow in the message processing unit 142 of the follower node in the FT system 1 according to the first embodiment. Specifically, the message processing unit 142 of the follower node includes the task communication unit 140. 7 is a flowchart illustrating an example of processing for receiving a processing request from the server and performing corresponding processing and returning the result to the task communication unit 140.

　この場合、フォロワーノードのメッセージ処理部１４２は、まず当該フォロワーノードにおけるタスク通信部１４０からの処理要求、またはノード間通信部１４１からのメッセージを待つ（７００）。ここで、フォロワーノードのメッセージ処理部１４２は、タスク通信部１４０から処理要求を受け取った場合（７０１：Ｙｅｓ）、通知メッセージリスト１５１を確認し、タスク通信部１４０から受けた処理要求に対応する通知メッセージ２０１がリーダーから到着しているかどうか確認する（７０２）。 In this case, the message processing unit 142 of the follower node first waits for a processing request from the task communication unit 140 or a message from the inter-node communication unit 141 in the follower node (700). Here, when the message processing unit 142 of the follower node receives a processing request from the task communication unit 140 (701: Yes), the notification message list 151 is confirmed and a notification corresponding to the processing request received from the task communication unit 140 is received. It is confirmed whether the message 201 has arrived from the reader (702).

　リーダーノードから対応する通知メッセージ２０１が到着している場合（７０２：Ｙｅｓ）、フォロワーノードのメッセージ処理部１４２は、通知メッセージ２０１の参照する処理要求を、タスク通信部１４０から受け取った処理要求と比較する（７０３）。ここでの比較処理は、タスク通信部１４０から得た処理要求と通知メッセージ２０１が含んでいた処理要求の互いの識別情報や付随データ等の一致判定など、二つの処理要求に応じた処理の結果が同一であると判定するのに十分な処理であればよい。 When the corresponding notification message 201 has arrived from the leader node (702: Yes), the message processing unit 142 of the follower node compares the processing request referred to by the notification message 201 with the processing request received from the task communication unit 140. (703). The comparison processing here is the result of processing in accordance with two processing requests, such as the determination of whether the processing request obtained from the task communication unit 140 and the processing request included in the notification message 201 match each other's identification information or accompanying data. As long as they are sufficient to determine that they are the same.

　このステップ７０３の結果、通知メッセージ２０１の参照する処理要求と、タスク通信部１４０から受け取った処理要求が同一であると判定したならば（７０３：ＹＥＳ）、フォロワーノードのメッセージ処理部１４２は、リーダーノードに対する承認メッセージ２０２の送信をノード間通信部１４１を通じて行う（７０４）。 If it is determined in step 703 that the processing request referred to by the notification message 201 is the same as the processing request received from the task communication unit 140 (703: YES), the message processing unit 142 of the follower node The approval message 202 is transmitted to the node through the inter-node communication unit 141 (704).

　また、フォロワーノードのメッセージ処理部１４２は、該当通知メッセージ２０１をもとに、必要であれば対応する所定処理を実行して（７０５）、処理結果を生成し（７０６）、これをタスク通信部１４０に対して伝達する（７０７）。 Further, the message processing unit 142 of the follower node executes a corresponding predetermined process if necessary based on the corresponding notification message 201 (705), generates a processing result (706), and outputs it as a task communication unit. It transmits to 140 (707).

　他方、上述のステップ７０３の結果、通知メッセージ２０１の参照する処理要求と、タスク通信部１４０から受け取った処理要求が同一でないと判定したならば（７０３：Ｎｏ）、フォロワーノードのメッセージ処理部１４２は、リーダーノードに対する否認メッセージ２０３の送信をノード間通信部１４１を通じて行う（７０８）。なお、このときフォロワーの立場からはリーダーノードの挙動が異常とみなせるので、フォロワーノードのメッセージ処理部１４２は、リーダーノードの再選出等の提案を他の計算機ノード各々に行うなど、障害発生時の処理を行う（７１１）。 On the other hand, if it is determined in step 703 that the processing request referred to by the notification message 201 is not the same as the processing request received from the task communication unit 140 (703: No), the message processing unit 142 of the follower node Then, the denial message 203 is transmitted to the leader node through the inter-node communication unit 141 (708). At this time, since the behavior of the leader node can be regarded as abnormal from the standpoint of the follower, the message processing unit 142 of the follower node makes a proposal such as re-election of the leader node to each of the other computer nodes. Processing is performed (711).

　上述のステップ７０１にてタスク通信部１４０からの処理要求を受け取った時点で、リーダーノードからの処理要求が到着していない場合（７０２：Ｎｏ）、フォロワーノードのメッセージ処理部１４２は、リーダーノードからの処理要求の到着を待つこととなる（７０９）。リーダーノードからの処理要求を受信した場合（７１０：Ｎｏ）、フォロワーノードのメッセージ処理部１４２は、リーダーノードからの処理要求が既に通知メッセージリスト１５１に存在していた場合（７０２：Ｙｅｓ）と同じ処理に進む。 When the processing request from the leader node has not arrived when the processing request from the task communication unit 140 is received in step 701 described above (702: No), the message processing unit 142 of the follower node receives the request from the leader node. (709). When the processing request from the leader node is received (710: No), the message processing unit 142 of the follower node is the same as when the processing request from the leader node already exists in the notification message list 151 (702: Yes). Proceed to processing.

　他方、リーダーノードの場合と同様に、フォロワーノードにおいて、リーダーノードもしくはノード間通信インタフェース１２３の障害等により、リーダーノードからの通知メッセージ２０１が著しく遅延するか、あるいは、これを受信できないことがある。このため、フォロワーノードのメッセージ処理部１４２は、ステップ７０９の待ちにおいて、処理要求の受信時刻からある時間（以下、タイムアウト時間という）が経過しても、リーダーノードから相当する通知メッセージ２０１を受信しなかった時は、リーダーノードあるいはそのノード間通信インタフェース１２３において障害が発生したとみなす（７１０：Ｙｅｓ）。そして、ステップ７０３で、通知メッセージ２０１の参照する処理要求と、タスク通信部１４０から受け取った処理要求とが同一でないと判定された場合と同様に、リーダーノードの再選出提案などの障害発生時の処理を行う（７１１）。 On the other hand, as in the case of the leader node, in the follower node, the notification message 201 from the leader node may be significantly delayed or cannot be received due to a failure of the leader node or the inter-node communication interface 123 or the like. Therefore, the message processing unit 142 of the follower node receives the corresponding notification message 201 from the leader node even when a certain time (hereinafter referred to as a timeout time) has elapsed from the reception time of the processing request while waiting for step 709. If not, it is considered that a failure has occurred in the leader node or the inter-node communication interface 123 (710: Yes). Then, in the case where it is determined in step 703 that the processing request referred to by the notification message 201 and the processing request received from the task communication unit 140 are not the same, at the time of failure such as a leader node re-election proposal Processing is performed (711).

　なお、ステップ７０１において、フォロワーノードのメッセージ処理部１４２が、ノード間通信部１４１から通知メッセージ２０１を受け取った場合（７０１：Ｎｏ）、当該通知メッセージ２０１に対応する処理要求は、タスク通信部１４０から未だ受信していないため、後の処理のために、該当通知メッセージ２０１の情報を通知メッセージリスト１５１に格納する（７１２）。その後、フォロワーノードのメッセージ処理部１４２は、処理要求もしくは通知メッセージ２０１の受信待ち（７００）の処理に戻る。 In step 701, when the message processing unit 142 of the follower node receives the notification message 201 from the inter-node communication unit 141 (701: No), the processing request corresponding to the notification message 201 is sent from the task communication unit 140. Since it has not been received yet, the information of the corresponding notification message 201 is stored in the notification message list 151 for later processing (712). Thereafter, the message processing unit 142 of the follower node returns to the process of waiting for reception of the processing request or notification message 201 (700).

　以上の第１の実施形態を採用することで、多数決冗長構成を成すリーダーノードとフォロワーノードの動作を同一に保ちつつ、リーダーノードでのタイムアウト判定（フォロワーノードからの承認／否認メッセージの受信待ち期限の判定）を同期メッセージに関するもののみに減らすことにより、少数のフォロワーノードの一時的な処理や伝送の遅延を許容することができる。 By adopting the above first embodiment, the operation of the leader node and the follower node in the majority redundant configuration is kept the same, and the timeout determination at the leader node (reception waiting time of the approval / denial message from the follower node) Is reduced to only those related to synchronization messages, it is possible to allow a temporary processing and transmission delay of a small number of follower nodes.

　また、メッセージ種類テーブル１５２を使用したメッセージ同期種類の判定を行うことによって、例えば、同期を行うための同期要求を発行するＡＰＩをタスク通信部１４０が提供することにより、アプリケーション１３２が同期するタイミングを制御できるようになり、アプリケーション１３２の特性に一致したタイムアウト判定の制御を実現することができる。 Further, by determining the message synchronization type using the message type table 152, for example, the task communication unit 140 provides an API for issuing a synchronization request for performing synchronization, whereby the timing at which the application 132 is synchronized is determined. Control of timeout determination that matches the characteristics of the application 132 can be realized.

　なお、以上の説明では、タスク１３３が単一の場合を述べたが、複数ある場合も同様である。ただし、タスク通信部１４０およびメッセージ処理部１４２の処理は、タスクごとにスレッド化を行うなどの並列化を行うか、あるいは、入出力待ちの多重化処理を加えることによって、一つのタスク１３３に対する処理要求の処理によって、他のタスク１３３が滞らないようにする必要がある。 In the above description, the case where there is a single task 133 has been described. However, the processing of the task communication unit 140 and the message processing unit 142 is performed for one task 133 by performing parallel processing such as threading for each task or by adding multiplexing processing waiting for input / output. It is necessary to prevent other tasks 133 from being delayed by processing the request.

　また、タスク通信部１４０からの処理要求の受け取りとノード間通信部１４１からの通知メッセージ２０１の受け取りを単一の流れとして記述したが、効率化のために複数のスレッド等を用いることによって、それぞれを並列的に実行しても構わない。
－－－第２の実施形態－－－
　計算機ノードのメッセージ処理部１４２が要求リストを用いて処理を行う形態も想定できる。この要求リストとは、メッセージ処理部１４２が、他の計算機ノードから通知メッセージ２０１を受け取った際に、それに対して承認メッセージ２０２を返すか否認メッセージ２０３を返すかを選択するためのテーブルである。そのため要求リストは、返信するメッセージを承認／否認メッセージのいずれにするか判断するための十分なデータ、例えば、処理要求や要求付随データの各値を持つ。 In addition, the reception of the processing request from the task communication unit 140 and the reception of the notification message 201 from the inter-node communication unit 141 are described as a single flow, but by using a plurality of threads or the like for efficiency, May be executed in parallel.
--- Second Embodiment ---
It can be assumed that the message processing unit 142 of the computer node performs processing using the request list. This request list is a table for selecting, when the message processing unit 142 receives the notification message 201 from another computer node, whether to return the approval message 202 or the denial message 203 in response thereto. Therefore, the request list has sufficient data for determining whether the message to be returned is an approval / denial message, for example, each value of a processing request or request-accompanying data.

　こうした形態を第２の実施例として説明する。なお、第２の実施形態におけるＦＴシステム１は図１と同様の構成である。第２の実施形態におけるフォロワーノードの挙動は、図１１に示すフローに沿ったものとなる。これ以外の動作と構成は、第１の実施形態と同様であるため、重複箇所の説明については省略する。 Such a form will be described as a second embodiment. The FT system 1 in the second embodiment has the same configuration as that in FIG. The behavior of the follower node in the second embodiment follows the flow shown in FIG. Since other operations and configurations are the same as those of the first embodiment, description of overlapping portions is omitted.

　図１１、図１２は第２の実施形態におけるフォールトトレラントシステムのフォロワーノードのメッセージ処理部での動作フロー例１、例２を示す図である。ここでの第１の実施形態との大きな違いは、タスク通信部１４０で得た処理要求に対する処理を、リーダーノードからの通知メッセージ２０１の到着を待たずに進行させる点となる。 FIG. 11 and FIG. 12 are diagrams showing an operation flow example 1 and an example 2 in the message processing unit of the follower node of the fault tolerant system in the second embodiment. The major difference from the first embodiment is that the processing for the processing request obtained by the task communication unit 140 proceeds without waiting for the notification message 201 from the leader node to arrive.

　第２の実施形態におけるフォロワーノードのメッセージ処理部１４２は、タスク通信部１４０からの処理要求、またはノード間通信部１４１からの通知メッセージの伝達を待つ（８００）。例えば、タスク通信部１４０から処理要求を受け取った場合（８０１：Ｙｅｓ）、フォロワーノードのメッセージ処理部１４２は、通知メッセージリスト１５１を確認し、対応する通知メッセージ２０１がリーダーノードから到着しているかどうか確認する（８０２）。 The message processing unit 142 of the follower node in the second embodiment waits for a processing request from the task communication unit 140 or a notification message from the inter-node communication unit 141 (800). For example, when a processing request is received from the task communication unit 140 (801: Yes), the message processing unit 142 of the follower node checks the notification message list 151 and determines whether the corresponding notification message 201 has arrived from the leader node. Confirm (802).

　ステップ８０２の結果、タスク通信部１４０から得た処理要求に対応する通知メッセージ２０１が既に到着していると判定した場合（８０２：Ｙｅｓ）、フォロワーノードのメッセージ処理部１４２は、第１の実施形態と同様の処理を実行する（８０３～８１０）。該当処理は第１の実施形態と同様であるので説明は省略する。 As a result of step 802, when it is determined that the notification message 201 corresponding to the processing request obtained from the task communication unit 140 has already arrived (802: Yes), the message processing unit 142 of the follower node is the first embodiment. The same processing is executed (803 to 810). Since the corresponding process is the same as that of the first embodiment, the description thereof is omitted.

　他方、ステップ８０２の結果、タスク通信部１４０から得た処理要求に対応する通知メッセージ２０１が到着していないと判定した場合（８０２：Ｎｏ）、フォロワーノードのメッセージ処理部１４２は、該当処理要求に対応する処理を行う（８０６）。ただし、外部通信インタフェース１２２からの送信などの、外部に対して影響を与える処理の場合には、実際にはその処理を行わずに、その想定される結果のみ計算する。また、フォロワーノードのメッセージ処理部１４２は、上述した要求リストに対して、処理要求に関する情報と、その処理の結果などを含む要求付随データを付け加える（８０７）。こうしてレコードが生成された要求リストは、図６に示した通知メッセージ２０１の処理要求４００および付随データ４０６と同様形式のデータを含むレコードの集合体となる。 On the other hand, when it is determined in step 802 that the notification message 201 corresponding to the processing request obtained from the task communication unit 140 has not arrived (802: No), the message processing unit 142 of the follower node responds to the processing request. Corresponding processing is performed (806). However, in the case of processing that affects the outside, such as transmission from the external communication interface 122, only the expected result is calculated without actually performing the processing. Further, the message processing unit 142 of the follower node adds request-accompanying data including information related to the processing request and the result of the processing to the above-described request list (807). The request list in which the records are generated in this way is a collection of records including data in the same format as the processing request 400 and the accompanying data 406 of the notification message 201 shown in FIG.

　フォロワーノードのメッセージ処理部１４２は、上述のステップ８０６の結果から処理結果を生成し（８０９）、当該処理結果をタスク通信部１４０に伝達する（８１０）。その後、フォロワーノードのメッセージ処理部１４２は、タスク通信部１４０とノード間通信部１４１からの伝達待ちに戻る（８００）。 The message processing unit 142 of the follower node generates a processing result from the result of step 806 described above (809), and transmits the processing result to the task communication unit 140 (810). Thereafter, the message processing unit 142 of the follower node returns to waiting for transmission from the task communication unit 140 and the inter-node communication unit 141 (800).

　一方、ステップ８００において、ノード間通信部１４１から通知メッセージ２０１を受け取った場合（８０１：Ｎｏ）、フォロワーノードのメッセージ処理部１４２は、要求リストを確認し、通知メッセージ２０１のもつ処理要求参照に対応する処理要求があるかどうか確認する（８１１）。この確認処理の結果、ノード間通信部１４１から受け取った通知メッセージ２０１に対応する処理要求が、要求リスト中に存在すると判定した場合（８１１：Ｙｅｓ）、フォロワーノードのメッセージ処理部１４２は、タスク通信部１４０から得た通知メッセージ２０１と、要求リストに含まれていた該当処理要求とを比較する　（８１２）。この比較の処理は、図１０のステップ７０３に関して上述したものと同様である。 On the other hand, when the notification message 201 is received from the inter-node communication unit 141 in Step 800 (801: No), the message processing unit 142 of the follower node confirms the request list and corresponds to the processing request reference included in the notification message 201. It is confirmed whether there is a processing request to be performed (811). As a result of this confirmation processing, when it is determined that a processing request corresponding to the notification message 201 received from the inter-node communication unit 141 exists in the request list (811: Yes), the message processing unit 142 of the follower node performs task communication. The notification message 201 obtained from the unit 140 is compared with the corresponding processing request included in the request list (812). This comparison process is similar to that described above with respect to step 703 of FIG.

　こうした比較処理の結果、通知メッセージ２０１の参照する処理要求と、タスク通信部１４０から受け取った処理要求とが同一であるとみなされた場合（８１２：Ｙｅｓ）、フォロワーノードのメッセージ処理部１４２は、承認メッセージ２０２をリーダーノードに送信する（８１３）。他方、通知メッセージ２０１の参照する処理要求と、タスク通信部１４０から受け取った処理要求とが同一でなかったとみなされた場合（８１２：ＮＯ）、フォロワーノードのメッセージ処理部１４２は、否認メッセージ２０３をリーダーノードに送信し（８１４）、リーダーノードに関する障害時処理を行う（８１５）。なお、上述のステップ８１１において、対応する処理要求が要求リスト中に無かった場合、フォロワーノードのメッセージ処理部１４２は、該当通知メッセージ２０１の情報を通知メッセージリスト１５１に追加する（８１６）。 As a result of such comparison processing, when the processing request referred to by the notification message 201 and the processing request received from the task communication unit 140 are considered to be the same (812: Yes), the message processing unit 142 of the follower node An approval message 202 is transmitted to the leader node (813). On the other hand, when the processing request referred to by the notification message 201 and the processing request received from the task communication unit 140 are not the same (812: NO), the message processing unit 142 of the follower node displays the denial message 203. The data is transmitted to the leader node (814), and a failure process related to the leader node is performed (815). If there is no corresponding processing request in the request list in step 811 described above, the message processing unit 142 of the follower node adds the information of the notification message 201 to the notification message list 151 (816).

　以上の処理により、フォロワーノードは、リーダーノードからの通知メッセージを待つことなく、タスクに関する処理を各々進めることができるため、処理時間の遅延を短縮することができる。ただし、この場合のフォロワーノードは、リーダーノードでの結果を使わずに処理を進めるため、処理内容の相違が発生する確率が高くなる。そのためフォロワーノードは、一部の処理については、リーダーノードからの通知メッセージを待ち、その他の処理については、待たずに進めるといった、本実施例の部分的な適用も可能である。こうした部分的な適用をする際のフォロワーノードにおける判断には、メッセージ同期種類を用いることもできる。例えば、メッセージ同期種類が「同期」あるいは「準同期」のものについてはリーダーノードからの通知メッセージを待ち、次の実施例で述べるメッセージ同期種類が「非同期」の場合については、リーダーノードからの通知メッセージを待たないとするアルゴリズムである。
－－－第３の実施形態－－－
　上述の第１および第２の実施形態では、メッセージ同期種類が「同期」、「準同期」のいずれかの場合について選択的に処理を行う例を示した。ここでは、これら同機種類に加えて、メッセージ同期種類が「非同期」の場合についても想定した例を示すものとする。図１３は第３の実施形態におけるＦＴシステム１のリーダーノードのメッセージ処理部１４２での動作フロー例を示す図であり、図８のフローに対する差分のみを示したフローチャートである。従って、第１の実施例と同様の構成と処理については説明を省略する。 With the above processing, the follower node can proceed with each process related to a task without waiting for a notification message from the leader node, and therefore the processing time delay can be reduced. However, since the follower node in this case proceeds with the process without using the result at the leader node, there is a high probability that a difference in processing contents will occur. Therefore, the follower node can also partially apply the present embodiment in which a part of the processing waits for a notification message from the leader node and the other processing proceeds without waiting. The message synchronization type can also be used for the determination in the follower node when performing such partial application. For example, if the message synchronization type is “synchronous” or “semi-synchronous”, it waits for a notification message from the leader node, and if the message synchronization type described in the next embodiment is “asynchronous”, notification from the leader node It is an algorithm that does not wait for a message.
--- Third embodiment ---
In the first and second embodiments described above, an example in which the process is selectively performed when the message synchronization type is “synchronous” or “semi-synchronous” has been described. Here, in addition to the same machine type, an example in which the message synchronization type is “asynchronous” is assumed. FIG. 13 is a diagram showing an example of an operation flow in the message processing unit 142 of the leader node of the FT system 1 in the third embodiment, and is a flowchart showing only a difference from the flow of FIG. Therefore, the description of the same configuration and processing as in the first embodiment is omitted.

　ここで、リーダーノードのメッセージ処理部１４２は、タスク通信部１４０からの処理要求の伝達、もしくはノード間通信部１４１からの通知メッセージ２０１の受信を待つ（１０００）。待機中、タスク通信部１４０から処理要求を受け取った場合（１００１：Ｙｅｓ）、リーダーノードのメッセージ処理部１４２は、図８のフローにおけるステップ６０１～６０３を同様に実行した後、メッセージ同期種類を判定する（１００２）。メッセージ同期種類が「同期」であった場合及び「準同期」であった場合の処理は、第１の実施形態で示した場合と同様である。 Here, the message processing unit 142 of the leader node waits for transmission of a processing request from the task communication unit 140 or reception of the notification message 201 from the inter-node communication unit 141 (1000). If a processing request is received from the task communication unit 140 during standby (1001: Yes), the message processing unit 142 of the leader node determines the message synchronization type after executing steps 601 to 603 in the flow of FIG. (1002). The processing when the message synchronization type is “synchronous” and “semi-synchronous” is the same as the case shown in the first embodiment.

　他方、メッセージ同期種類が「非同期」であった場合（１００２：非同期）、リーダーノードのメッセージ処理部１４２は、フォロワーノードからの承認／否認メッセージを待つことなく、図８のフローにおけるステップ６１１に進む。なお、十分な数の承認／否認メッセージをフォロワーノードから既に受信していた場合、リーダーノードのメッセージ処理部１４２は、この時点で「準同期」の場合と同様に多数決処理（図８のステップ６１０）を行なってもよい。他方、多数決処理をしない場合は、タスク通信部１４０に対して結果を返した後に多数決処理を行う必要があるが、その場合、ノード間通信部１４１から通知メッセージ２０１を受信した場合（１００１：Ｎｏ）に実行する。 On the other hand, when the message synchronization type is “asynchronous” (1002: asynchronous), the message processing unit 142 of the leader node proceeds to step 611 in the flow of FIG. 8 without waiting for an approval / denial message from the follower node. . When a sufficient number of approval / denial messages have already been received from the follower node, the message processing unit 142 of the leader node at this point determines the majority process (step 610 in FIG. 8) as in the case of “semi-synchronization”. ) May be performed. On the other hand, when majority processing is not performed, it is necessary to perform majority processing after returning a result to the task communication unit 140. In this case, when the notification message 201 is received from the inter-node communication unit 141 (1001: No) ) To run.

　リーダーノードのメッセージ処理部１４２は、ノード間通信部１４１から通知メッセージ２０１を受信したとき（１００１：Ｎｏ）、それが承認メッセージ２０２であれば（１００３：Ａｃｋ）、それに対応する承認メッセージリスト１５０の行の承認ノード集合３０１に、その送信元のフォロワーノードの情報を加える（１００４）。他方、否認メッセージ２０３であれば（１００３：Ｎａｃｋ）、リーダーノードのメッセージ処理部１４２は、否認ノード集合３０２に送信元のフォロワーノードの情報を加える（１００５）。この行に関して、承認ノード集合３０１あるいは否認ノード集合３０２にまだ入っていないフォロワーノードが存在する場合（１００６：Ｎｏ）、リーダーノードのメッセージ処理部１４２は受信待ちに戻る（１０００）。すべてのフォロワーノードから承認／否認メッセージを受信した場合（１００６：Ｙｅｓ）、リーダーノードのメッセージ処理部１４２は、多数決処理にうつる。この時、否認メッセージ２０３が多数の場合（１００７：Ｎｏ）、リーダーノードのメッセージ処理部１４２は、当該リーダーノードに関する障害時処理を行う。 When the message processing unit 142 of the leader node receives the notification message 201 from the inter-node communication unit 141 (1001: No), if it is the approval message 202 (1003: Ack), the message processing unit 142 of the approval node in the corresponding approval message list 150 Information of the follower node of the transmission source is added to the approval node set 301 of the row (1004). On the other hand, if the message is the denial message 203 (1003: Nack), the message processing unit 142 of the leader node adds the information of the source follower node to the denial node set 302 (1005). When there is a follower node not yet included in the approval node set 301 or the denial node set 302 regarding this row (1006: No), the message processing unit 142 of the leader node returns to reception waiting (1000). When the approval / denial message is received from all the follower nodes (1006: Yes), the message processing unit 142 of the leader node goes to the majority process. At this time, when there are a large number of rejection messages 203 (1007: No), the message processing unit 142 of the leader node performs a failure process related to the leader node.

　他方、承認メッセージ２０２が多数の場合（１００７：Ｙｅｓ）、リーダーノードのメッセージ処理部１４２は、承認メッセージリスト１５０の当該行の削除などを行い（１００９）、受信待ち（１０００）に戻る。この時、リーダーノードのメッセージ処理部１４２は、必要に応じて否認メッセージ２０３を送ったフォロワーノードを停止させる処理等を行なってもよい。 On the other hand, when there are many approval messages 202 (1007: Yes), the message processing unit 142 of the leader node deletes the corresponding line in the approval message list 150 (1009), and returns to waiting for reception (1000). At this time, the message processing unit 142 of the leader node may perform processing for stopping the follower node that has sent the denial message 203 as necessary.

　また、十分な数の承認／否認メッセージをフォロワーノードから既に受信していた場合、リーダーノードのメッセージ処理部１４２は、ステップ１００６から多数決処理（１００７）へ処理を移すとしてもよい。 If a sufficient number of approval / denial messages have already been received from the follower node, the message processing unit 142 of the leader node may move the processing from step 1006 to majority processing (1007).

　以上の実施形態により、第１の実施形態の場合と比べて、リーダーノードがフォロワーノードからの承認／否認メッセージを待つ回数を低減することができるため、個々の計算機ノードの遅れをより効率的かつ適切に許容することが可能になる。
－－－第４の実施形態－－－
　ここでは、メッセージ同期種類が「同期」、「非同期」の２種である場合について想定した例を示すものとする。図１４は第４の実施形態におけるＦＴシステム１のリーダーノードのメッセージ処理部１４２での動作フロー例を示す図であり、図８のフローに対する差分のみを示したフローチャートである。従って、第１の実施例と同様の構成と処理については説明を省略する。また、第３の実施形態と同様の構成と処理についても説明を省略する。 According to the above embodiment, the number of times that the leader node waits for the approval / denial message from the follower node can be reduced as compared with the case of the first embodiment. It becomes possible to tolerate appropriately.
--- Fourth embodiment ---
Here, an example is assumed in which there are two types of message synchronization types, “synchronous” and “asynchronous”. FIG. 14 is a diagram showing an example of an operation flow in the message processing unit 142 of the leader node of the FT system 1 in the fourth embodiment, and is a flowchart showing only a difference from the flow of FIG. Therefore, the description of the same configuration and processing as in the first embodiment is omitted. The description of the same configuration and processing as those in the third embodiment is also omitted.

　ここで、リーダーノードのメッセージ処理部１４２は、タスク通信部１４０からの処理要求の伝達、もしくはノード間通信部１４１からの通知メッセージ２０１の受信を待つ（１１００）。待機中、タスク通信部１４０から処理要求を受け取った場合（１１０１：Ｙｅｓ）、リーダーノードのメッセージ処理部１４２は、図８のフローにおけるステップ６０１～６０３を同様に実行した後、メッセージ同期種類を判定する（１１０２）。メッセージ同期種類が「同期」であった場合の処理は、第１の実施形態で示した場合と同様である（図８のフローにおけるステップ６０５へ）。 Here, the message processing unit 142 of the leader node waits for transmission of a processing request from the task communication unit 140 or reception of the notification message 201 from the inter-node communication unit 141 (1100). When a processing request is received from the task communication unit 140 during standby (1101: Yes), the message processing unit 142 of the leader node determines the message synchronization type after executing steps 601 to 603 in the flow of FIG. (1102). The processing when the message synchronization type is “synchronization” is the same as that shown in the first embodiment (to step 605 in the flow of FIG. 8).

　他方、メッセージ同期種類が「非同期」であった場合（１１０２：非同期）、リーダーノードのメッセージ処理部１４２は、フォロワーノードからの承認／否認メッセージを待つことなく、図８のフローにおけるステップ６１１に進む。なお、十分な数の承認／否認メッセージをフォロワーノードから既に受信していた場合、リーダーノードのメッセージ処理部１４２は、この時点で「準同期」の場合と同様に多数決処理（図８のステップ６１０）を行なってもよい。他方、多数決処理をしない場合は、タスク通信部１４０に対して結果を返した後に多数決処理を行う必要があるが、その場合、ノード間通信部１４１から通知メッセージ２０１を受信した場合（１１０１：Ｎｏ）に実行する。また、ステップ１１０１において、ノード間通信部１４１から通知メッセージ２０１を受信したとき（１００１：Ｎｏ）、以降の処理（１１０３～１１０９）については、第３の実施形態と同様である。 On the other hand, when the message synchronization type is “asynchronous” (1102: asynchronous), the message processing unit 142 of the leader node proceeds to step 611 in the flow of FIG. 8 without waiting for an approval / denial message from the follower node. . When a sufficient number of approval / denial messages have already been received from the follower node, the message processing unit 142 of the leader node at this point determines the majority process (step 610 in FIG. 8) as in the case of “semi-synchronization”. ) May be performed. On the other hand, when majority processing is not performed, it is necessary to perform majority processing after returning the result to the task communication unit 140. In this case, when the notification message 201 is received from the inter-node communication unit 141 (1101: No) ) To run. In step 1101, when the notification message 201 is received from the inter-node communication unit 141 (1001: No), the subsequent processing (1103 to 1109) is the same as in the third embodiment.

　以上の実施形態により、第１の実施形態の場合と比べて、リーダーノードがフォロワーノードからの承認／否認メッセージを待つ回数を低減することができるため、個々の計算機ノードの遅れをより効率的かつ適切に許容することが可能になる。 According to the above embodiment, the number of times that the leader node waits for the approval / denial message from the follower node can be reduced as compared with the case of the first embodiment. It becomes possible to tolerate appropriately.

　以上、本発明を実施するための最良の形態などについて具体的に説明したが、本発明はこれに限定されるものではなく、その要旨を逸脱しない範囲で種々変更可能である。 The best mode for carrying out the present invention has been specifically described above. However, the present invention is not limited to this, and various modifications can be made without departing from the scope of the present invention.

　こうした本実施形態によれば、フォールトトレラントシステムにおいて、計算機ノード間の同期のためのメッセージを分類し、その種類に応じて、一部のメッセージでのみ多数決投票の結果を待ち、そうでない場合には非同期的に実行するといった多数決投票の挙動変更を実行して、計算機ノード間での処理の待ち合わせ回数を低減し、以って計算機ノード間での性能ばらつきの影響を相対的に抑制可能となる。そうした効果は、システム全体における処理の遅延や効率低下を軽減できることにもつながる。また、障害判定を行うにあたり、アプリケーションの制約から明確である時間を障害判定の基準として用い、かつ、その基準点とメッセージ種類を一致させることにより、性能差を許容しつつ、過度な障害判定による可用性の低減を防止することも出来る。 According to the present embodiment, in the fault-tolerant system, messages for synchronization between computer nodes are classified, and depending on the type, only a part of the messages waits for the result of majority voting. The behavior change of majority voting such as executing asynchronously is executed to reduce the number of processing waits between computer nodes, thereby relatively suppressing the influence of performance variation between computer nodes. Such an effect also leads to reduction of processing delay and efficiency reduction in the entire system. Also, when performing failure determination, use the time that is clear from the application constraints as the failure determination criterion, and match the reference point and message type to allow for performance differences while allowing excessive failure determination. It is also possible to prevent a reduction in availability.

　したがって、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性が向上可能となる。 Therefore, in a fault-tolerant system composed of multiple computer nodes with different architectures and basic software, etc., the influence of overall performance due to performance differences and variations between computer nodes is reduced, and excessive fault determination due to such influences is reduced. By preventing this, availability in the system can be improved.

　本明細書の記載により、少なくとも次のことが明らかにされる。すなわち、フォールトトレラントシステムにおいて、前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または準同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行するものである、としてもよい。記載 At least the following will be made clear by the description in this specification. That is, in the fault tolerant system, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is either synchronous or semi-synchronous. If the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing content corresponds to quasi-synchronization, it is possible to wait for reception of the follow-up propriety message for at least half of the computers other than the other computers.

　こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、準同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。 In this way, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization or semi-synchronization is necessary, the computer that is the leader in the fault-tolerant system should wait for receipt of a message indicating whether the task can be tracked Control the number of follower calculators. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved.

　また、フォールトトレラントシステムにおいて、前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期、準同期、および非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しないものである、としてもよい。 In the fault tolerant system, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is synchronous, semi-synchronous, and Judgment is made whether it is asynchronous or not, and if the processing contents correspond to synchronization, all the computers other than the other computer are subjected to waiting for reception of the followability message. If the processing content corresponds to quasi-synchronization, wait for reception of the followability message for at least half of the computers other than the other computer, and the processing content is If it is asynchronous, it does not wait to receive the followability message from a computer other than the other computer. It may be.

　こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、準同期、非同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。特に、該当タスクが各計算機間で非同期で処理されるものである場合、リーダーたる計算機において、フォロワー計算機からの上述のメッセージの受信待ちが不要となり、障害判定に要する各計算機の使用リソースを効率化し、ひいてはシステムにおける更なる可用性向上が可能となる。 In this way, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization, quasi-synchronization, or asynchronous is necessary, the computer that is the leader in the fault-tolerant system receives a message indicating whether the task can be followed or not. Control the number of follower computers to wait for. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved. In particular, when the corresponding task is processed asynchronously between the computers, the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.

　また、フォールトトレラントシステムにおいて、前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しないものである、としてもよい。 In the fault tolerant system, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is either synchronous or asynchronous. If the processing content corresponds to synchronization, the processing waits for reception of the follow-up propriety message for all computers other than the other computer, and the processing When the content corresponds asynchronously, it may be configured not to wait for reception of the follow-up enable / disable message from a computer other than the other computer.

　こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、非同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。特に、該当タスクが各計算機間で非同期で処理されるものである場合、リーダーたる計算機において、フォロワー計算機からの上述のメッセージの受信待ちが不要となり、障害判定に要する各計算機の使用リソースを効率化し、ひいてはシステムにおける更なる可用性向上が可能となる。 In this way, the followers to wait for the follow-up message about the corresponding task in the fault-tolerant system, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization or asynchronous is necessary Control the number of calculators. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved. In particular, when the corresponding task is processed asynchronously between the computers, the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.

　また、フォールトトレラント制御方法において、前記他計算機となった計算機のメッセージ処理部が、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または準同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行する、としてもよい。 In the fault tolerant control method, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is synchronous or semi-synchronous. It is determined whether it corresponds to any one, and when the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing content corresponds to quasi-synchronization, waiting for reception of the followability message may be executed for at least half of the computers other than the other computers.

　また、フォールトトレラント制御方法において、前記他計算機となった計算機のメッセージ処理部が、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期、準同期、および非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しない、としてもよい。 In the fault tolerant control method, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is synchronous, semi-synchronous, And if the processing content corresponds to synchronization, and if all of the computers other than the other computer are the target, wait for reception of the follow-up enable / disable message. If the processing content corresponds to quasi-synchronization, wait for reception of the followability message for at least half of the computers other than the other computer, and execute the processing content Is not asynchronous, it does not wait to receive the follow-up message from other computers than the other computer. Good.

　また、フォールトトレラント制御方法において、前記他計算機となった計算機のメッセージ処理部が、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しない、としてもよい。 In the fault tolerant control method, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is either synchronous or asynchronous. If the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing contents are asynchronous, it may be determined not to wait for reception of the followability message from a computer other than the other computer.

１　フォールトトレラントシステム
１０２　端末
１１０　計算機ノード（計算機）
１１０Ａ　リーダーノード
１１０Ｂ、１１０Ｃ　フォロワーノード
１１１　外部ネットワーク
１１２　ノード間ネットワーク
１２０　中央演算装置
１２１　主記憶装置
１２２　外部通信インタフェース
１２３　ノード間通信インタフェース
１２４　補助記憶装置
１３０　基本ソフトウェア
１３１　ＦＴミドルウェア
１３２　アプリケーション
１３３　タスク
１４０　タスク通信部
１４１　ノード間通信部
１４２　メッセージ処理部
１５０　承認メッセージリスト
１５１　通知メッセージリスト
１５２　メッセージ種類テーブル 1 fault tolerant system 102 terminal 110 computer node (computer)
110A leader node 110B, 110C follower node 111 external network 112 inter-node network 120 central processing unit 121 main storage unit 122 external communication interface 123 inter-node communication interface 124 auxiliary storage unit 130 basic software 131 FT middleware 132 application 133 task 140 task communication unit 141 Inter-node communication unit 142 Message processing unit 150 Approval message list 151 Notification message list 152 Message type table

Claims

In each of a plurality of computers that form a majority redundant configuration,
A task communication unit for acquiring the processing contents from a program operating inside the computer;
The predetermined message indicating the processing content of the program running on the other computer is received from any other computer forming the majority voting redundant configuration, and the processing content indicated by the predetermined message and the processing obtained by the task communication unit In accordance with the result of determination of coincidence with the contents, a message processing unit that returns a followability message to the other computer,
The message processing unit of the computer that has become the other computer has a number of computers other than the other computer that are subject to reception of the follow-up / possibility message according to the processing content obtained by the task communication unit of the other computer. To further execute the process of changing
A fault-tolerant system.

The message processing unit of the computer that has become the other computer is:
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard to determine whether the corresponding processing content corresponds to either synchronization or quasi-synchronization, and the processing content corresponds to synchronization If it is, for all computers other than the other computer, wait for reception of the followability message, and if the processing content corresponds to quasi-synchronization, the computer other than the other computer For at least half of the computers, waiting for receiving the follow-up message is executed.
The fault tolerant system according to claim 1.

The message processing unit of the computer that has become the other computer is:
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard to determine whether the corresponding processing content corresponds to one of synchronous, semi-synchronous, and asynchronous, and the processing content is synchronized. If it is compatible, execute waiting for reception of the follow-up enable / disable message for all computers other than the other computer, and if the processing content corresponds to semi-synchronization, If at least half of the other computers are targeted for receiving the follow-up success / failure message, and the processing content is asynchronous, the follow-up from a computer other than the other computer It does not execute the waiting for receiving the message of availability.
The fault tolerant system according to claim 1.

The message processing unit of the computer that has become the other computer is:
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard, it is determined whether the corresponding processing content corresponds to either synchronous or asynchronous, and the processing content corresponds to synchronization. If there is, execute the waiting for reception of the followability message for all the computers other than the other computer, and if the processing content is asynchronous, if from the computer other than the other computer It does not execute reception waiting for the follow-up message.
The fault tolerant system according to claim 1.

In each of a plurality of computers that form a majority redundant configuration,
Processing to acquire the processing contents from a program operating inside the computer;
A predetermined message indicating the processing content of a program operating on the other computer is received from any other computer forming the majority voting redundant configuration, and the processing content indicated by the predetermined message and the processing content obtained by the processing In accordance with the result of the match determination, a process of returning a followability message to the other computer is executed.
Furthermore, in the other computer, a process of changing the number of computers other than the other computer, which is subject to reception of the followability message, according to the processing content obtained from the program operating on the other computer. Perform further,
A fault tolerant system control method characterized by the above.

The message processing unit of the computer that has become the other computer,
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard to determine whether the corresponding processing content corresponds to either synchronization or quasi-synchronization, and the processing content corresponds to synchronization If it is, for all computers other than the other computer, wait for reception of the followability message, and if the processing content corresponds to quasi-synchronization, the computer other than the other computer Waiting for reception of the followability message for at least half of the computers
The fault tolerant control method according to claim 5.

The message processing unit of the computer that has become the other computer,
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard to determine whether the corresponding processing content corresponds to one of synchronous, semi-synchronous, and asynchronous, and the processing content is synchronized. If it is compatible, execute waiting for reception of the follow-up enable / disable message for all computers other than the other computer, and if the processing content corresponds to semi-synchronization, If at least half of the other computers are targeted for receiving the follow-up success / failure message, and the processing content is asynchronous, the follow-up from a computer other than the other computer Do not wait for receipt of acceptable messages,
The fault tolerant control method according to claim 5.

The message processing unit of the computer that has become the other computer,
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard, it is determined whether the corresponding processing content corresponds to either synchronous or asynchronous, and the processing content corresponds to synchronization. If there is, execute the waiting for reception of the followability message for all the computers other than the other computer, and if the processing content is asynchronous, if from the computer other than the other computer Do not wait to receive the follow-up message.
The fault tolerant control method according to claim 5.