JPH06119303A

JPH06119303A - Loosely coupled multiprocessor system

Info

Publication number: JPH06119303A
Application number: JP4267211A
Authority: JP
Inventors: Hiroshi Tanaka; 浩田中
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-10-06
Filing date: 1992-10-06
Publication date: 1994-04-28

Abstract

(57) [Abstract]
[Purpose] To allow a master processor to restart a slave processor that has fallen into a failure from which it cannot recover autonomously, thereby improving the system operating rate.
[Constitution] In addition to a data passing bus 10 for data communication, reset signal lines 12-2 to 12-n are provided. A master processor 11-1 monitors the operating states of a plurality of slave processors 11-2 to 11-n by exchanging messages such as survival confirmation messages. When, for example, a slave processor stops responding, the master processor transmits a reset signal to that slave processor via the corresponding one of the reset signal lines 12-2 to 12-n. The failed slave processor is thereby reset, and a restart can be attempted.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a loosely coupled multiprocessor system, and more particularly to a loosely coupled multiprocessor system composed of a master processor and a plurality of slave processors coupled via a data communication bus.

[0002]

[Prior Art] In general, a loosely coupled multiprocessor system is composed of a plurality of processors interconnected via a data passing bus for data communication. In such a system, each processor runs under its own operating system. Therefore, even if one processor fails because of a hardware or software fault and can no longer operate normally, the failed processor can be replaced by another processor, so that the operational reliability of the loosely coupled multiprocessor system as a whole is maintained.

[0003] A failed processor is found by a monitoring process that detects whether each processor is operating normally. This monitoring is performed using inter-processor communication based on message passing. Specifically, the following two methods are known.

[0004] In the first method, each normally operating processor broadcasts a message indicating normal operation to all the other processors. Because the operating state of each processor is reported to all the others, each processor can recognize whether every other processor is operating normally or has failed. However, this method generates a large number of such messages, so when the system is composed of many processors, the message traffic interferes with the primary data processing function of the loosely coupled multiprocessor system.

[0005] The other method is a master-slave method, in which the processors constituting the loosely coupled multiprocessor system are divided into a master processor and slave processors, and the master processor monitors the operating states of the slave processors. Unlike the first method, this method does not generate monitoring messages in large numbers within the multiprocessor system, so the primary data processing function of the system is preserved.
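As a rough illustration of the difference in monitoring traffic between the two methods, the following Python sketch counts message deliveries per monitoring period for n processors. The counting model is an assumption and is not stated in the source: a broadcast is charged once per receiving processor, and the master-slave method is charged one inquiry and one reply per slave. The function names are hypothetical.

```python
def broadcast_deliveries(n: int) -> int:
    """Deliveries per period when each of n processors announces its
    liveness to the other n - 1 processors (assumed counting model)."""
    return n * (n - 1)


def master_slave_deliveries(n: int) -> int:
    """Deliveries per period when one master polls n - 1 slaves:
    one inquiry and one reply per slave (assumed counting model)."""
    return 2 * (n - 1)


for n in (4, 16, 64):
    print(n, broadcast_deliveries(n), master_slave_deliveries(n))
```

Under this model the broadcast method grows quadratically with the number of processors, while the master-slave method grows linearly.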

[0006] For this reason, in systems that require relatively high-speed operation, for example a loosely coupled multiprocessor system constituting a communication control system such as a digital exchange, it is effective to use the latter master-slave method as the monitoring method.

[0007] Each processor is provided with a function for detecting its own failure state and recovering from it. In the case of a relatively minor failure, the processor can recover by autonomously resetting itself. Depending on the circumstances of the failure, however, the processor may be unable to detect it and therefore unable to recover from it autonomously. In particular, processors used in communication control systems such as digital exchanges are prone to such non-self-detectable failures because of, for example, noise entering from the communication lines.

[0008] For example, when a plurality of slave processors are each used as a switching module of a digital exchange and the master processor is used as the control module that controls those switching modules, a communication line is connected to each switching module, so the slave processors constituting the switching modules are prone to failure. When such a non-self-detectable failure occurs in a slave processor, the master processor has no means of restoring that slave processor, and recovery requires an operation such as a restart of the slave processor by the system administrator. Because failure recovery cannot be performed automatically, the system operating rate is lowered.

[0009]

[Problems to Be Solved by the Invention] In the conventional loosely coupled multiprocessor system, when a slave processor cannot recover from a failure autonomously, a manual operation such as a restart of that slave processor by the system administrator is required, and as a result the system operating rate is lowered.

[0010] The present invention has been made in view of this point. Its object is to provide a loosely coupled multiprocessor system in which, when a slave processor suffers a failure from which it cannot recover autonomously, the master processor can automatically restart that slave processor, thereby improving the system operating rate.

[0011]

[Means for Solving the Problems and Operation] The present invention is a loosely coupled multiprocessor system composed of a master processor and a plurality of slave processors coupled via a data communication bus, characterized by comprising reset signal distribution wiring that is arranged between the master processor and the plurality of slave processors and distributes a reset signal to each of the plurality of slave processors, wherein the master processor comprises monitoring means for monitoring the operating state of each slave processor by exchanging messages with each slave processor via the data passing bus, and reset signal transmitting means for transmitting a reset signal via the reset signal distribution wiring to a slave processor in which the monitoring means has detected a failure.

[0012] In this loosely coupled multiprocessor system, reset signal distribution wiring is provided in addition to the ordinary data communication bus. The master processor monitors the operating states of the slave processors by exchanging messages such as survival confirmation messages. When, for example, a slave processor stops responding, the master processor transmits a reset signal to that slave processor via the reset signal distribution wiring. The failed slave processor is thereby reset, and a restart can be attempted. Therefore, if the failure affecting the slave processor is intermittent, the reset restores the slave processor and it is reincorporated into the loosely coupled multiprocessor system, so that the system operating rate is improved.

[0013]

[Embodiment] An embodiment of the present invention will be described below with reference to the drawings.

[0014] FIG. 1 shows the system configuration of a loosely coupled multiprocessor system according to an embodiment of the present invention. This loosely coupled multiprocessor system is applied to, for example, a packet switch, and is composed of a plurality of processors 11-1 to 11-n interconnected via a data passing bus 10 for transmitting and receiving message data. The processors (P1 to Pn) 11-1 to 11-n each run under a separate operating system. Messages are exchanged among the processors (P1 to Pn) 11-1 to 11-n using, for example, the CSMA/CD method.

[0015] Here, the processor (P1) 11-1 operates as the master processor that controls the entire system, and controls the switching functions of the processors (P2 to Pn) 11-2 to 11-n by sending and receiving messages. The processors (P2 to Pn) 11-2 to 11-n operate as slave processors. Each is coupled to its corresponding communication lines and performs switching between those communication lines and switching with the other slave processors.

[0016] Operation monitoring of the processors in this loosely coupled multiprocessor system is realized by the master-slave method described above. That is, the master processor (P1) 11-1 transmits a survival confirmation inquiry message to each of the slave processors (P2 to Pn) 11-2 to 11-n via the data passing bus 10, and detects a failure of a slave processor according to whether the response message "I am live", which indicates survival, is returned from that slave processor within a fixed period. The master processor (P1) 11-1 also has a function of supplying a reset signal to a slave processor that has fallen into a failure from which it cannot recover autonomously, in order to attempt a restart of that slave processor.

[0017] Specifically, the master processor (P1) 11-1 transmits the survival confirmation inquiry message to the slave processors (P2 to Pn) 11-2 to 11-n in turn at fixed time intervals, and monitors the time from transmission of the inquiry message until the response message indicating survival is returned. If the response message is not returned within a fixed period, the master processor (P1) 11-1 regards the slave processor that failed to respond as having a failure. It then continues time monitoring from the moment the failure was detected and checks whether a message indicating recovery is sent from that slave processor within a fixed period. If no recovery message arrives, the master processor regards the slave processor as having fallen into a failure from which it cannot recover autonomously, and supplies a reset signal to it. The reset signal is supplied using the reset signal distributor 13 and the reset signal lines 12-2, 12-3, ... 12-n described below.

[0018] That is, this loosely coupled multiprocessor system is provided with reset signal lines 12-2 to 12-n and a reset signal distributor 13. The reset signal lines 12-2 to 12-n are dedicated reset signal distribution wiring arranged to distribute a reset signal to each of the slave processors 11-2 to 11-n. The reset signal line 12-2 is connected to the slave processor (P2) 11-2 and is used to supply a reset signal to the slave processor (P2) 11-2. Similarly, the reset signal line 12-3 is connected to the slave processor (P3) 11-3 and is used to supply a reset signal to the slave processor (P3) 11-3, and the reset signal line 12-n is connected to the slave processor (Pn) 11-n and is used to supply a reset signal to the slave processor (Pn) 11-n.

[0019] The reset signal distributor 13 selectively sends a reset signal to the reset signal lines 12-2 to 12-n in response to a reset request (a reset address and a reset trigger signal) from the master processor (P1) 11-1. Specifically, the reset signal distributor 13 first decodes the reset address from the master processor (P1) 11-1 and selects one of the reset signal lines 12-2 to 12-n according to the decoding result. The reset signal distributor 13 then generates a reset pulse in response to the reset trigger from the master processor (P1) 11-1 and supplies that pulse to the selected reset signal line as the reset signal.

[0020] FIG. 2 shows an example of a specific circuit configuration of the reset signal distributor 13. As shown, the reset signal distributor 13 comprises a decoder 131, a reset pulse generation circuit 132, and AND gates G2 to Gn. The output signal lines of the decoder 131 are connected to the first inputs of the AND gates G2 to Gn, respectively, and the reset pulse from the reset pulse generation circuit 132 is supplied in common to the second inputs of the AND gates G2 to Gn. The outputs of the AND gates G2 to Gn are connected to the reset signal lines 12-2 to 12-n, respectively.

[0021] When the presence of a slave processor that has fallen into a failure from which it cannot recover autonomously is detected, the master processor (P1) 11-1 supplies the address indicating that slave processor and a reset trigger signal to the reset signal distributor 13. In the reset signal distributor 13, the decoder 131 decodes the reset address from the master processor (P1) 11-1 and, according to the decoding result, supplies an enable signal for selecting one of the reset signal lines 12-2 to 12-n to one of the AND gates G2 to Gn. The reset pulse generation circuit 132 generates a reset pulse in response to the reset trigger signal from the master processor (P1) 11-1. This reset pulse is supplied in common to the second inputs of all the AND gates G2 to Gn.
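The decoder and AND-gate arrangement of FIG. 2 can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: the reset address is taken to be the slave index k (2 to n), which the source does not specify, and the function names `decode` and `distribute` are hypothetical.

```python
from typing import Dict


def decode(reset_address: int, n: int) -> Dict[int, bool]:
    """Model of decoder 131: one enable output per slave index 2..n,
    asserted only for the addressed slave (address encoding is assumed)."""
    return {k: (k == reset_address) for k in range(2, n + 1)}


def distribute(reset_address: int, reset_pulse: bool, n: int) -> Dict[int, bool]:
    """Model of AND gates G2..Gn: reset line 12-k carries the pulse only
    when its enable input and the common reset pulse are both high."""
    enables = decode(reset_address, n)
    return {k: enables[k] and reset_pulse for k in range(2, n + 1)}


lines = distribute(reset_address=3, reset_pulse=True, n=5)
print(lines)  # {2: False, 3: True, 4: False, 5: False}
```

Because the pulse is gated by the enable signal, a reset trigger reaches only the addressed slave, and no line is driven while the pulse is low.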

[0022] The one of the AND gates G2 to Gn that receives the enable signal supplies the reset pulse to the corresponding one of the reset signal lines 12-2 to 12-n as the reset signal. This reset signal is sent to the slave processor that has fallen into a failure from which it cannot recover autonomously. Next, the monitoring operation of the loosely coupled multiprocessor system of FIG. 1 will be described with reference to FIG. 3.

[0023] The master processor (P1) 11-1 periodically transmits an inquiry message for survival confirmation to each of the slave processors (P2 to Pn) 11-2 to 11-n. The normally operating slave processors (P2 to Pn) 11-2 to 11-n each return the response message "I am live", which indicates survival. The master processor (P1) 11-1 monitors, for each of the slave processors (P2 to Pn) 11-2 to 11-n, the time from transmission of the inquiry message until the response message indicating survival is returned.

[0024] Whether a failure has occurred in each slave processor is detected according to whether the response message indicating survival is returned within a time T1. When a failure of a slave processor is detected, the master processor (P1) 11-1 performs processing such as removing the failed slave processor from the system and notifying the other slave processors of the name of the failed slave processor, so that the operation of the system as a whole is not disrupted. In addition, the master processor (P1) 11-1 continues time monitoring of the failed slave processor in order to detect whether a recovery message is received within a time T2 from the occurrence of the failure.

[0025] In the failed slave processor, a function such as a WDT (watchdog timer) operates to detect the failure autonomously and restart the processor. If the failure is one from which the processor can recover autonomously, the WDT function restarts the processor a fixed period after the failure occurs. The slave processor that has thus returned to normal operation sends a recovery message to the master processor (P1) 11-1. If, on the other hand, the processor has fallen into a failure from which it cannot recover autonomously, the self-restart by the WDT is not executed.
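The self-recovery path can be pictured with a small watchdog model in Python. The source only names the WDT. The `kick`/`tick` interface, the tick-based timing, and the class name below are hypothetical choices made for illustration.

```python
class WatchdogTimer:
    """Counts ticks since the last kick and expires at the timeout."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.count = 0

    def kick(self) -> None:
        """Called by healthy software to show it is still making progress."""
        self.count = 0

    def tick(self) -> bool:
        """Advance one time unit; return True when a self-reset should fire."""
        self.count += 1
        if self.count >= self.timeout_ticks:
            self.count = 0
            return True
        return False


wdt = WatchdogTimer(timeout_ticks=3)

# Healthy phase: the software kicks before every tick, so nothing expires.
healthy = []
for _ in range(5):
    wdt.kick()
    healthy.append(wdt.tick())

# Last kick, then the software hangs: the third tick fires the self-reset.
wdt.kick()
hung = [wdt.tick() for _ in range(3)]
print(healthy, hung)
```

A watchdog of this kind only helps when the failure stops the kicks. One possible case it would not catch, offered here as an illustration and not taken from the source, is a fault that leaves the kicking code running while useful work has stopped. That is the kind of situation the master-driven reset is meant to cover.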

[0026] When a recovery message is received from the failed slave processor within the time T2, the master processor (P1) 11-1 performs processing such as notifying the other slave processors of the name of the recovered slave processor, and reincorporates that slave processor into the system. If no recovery message is received within the time T2, the master processor (P1) 11-1 judges that the slave processor has fallen into a failure from which it cannot recover autonomously, and supplies the address corresponding to that slave processor and a reset trigger signal to the reset signal distributor 13. A reset signal is thereby supplied, via the reset signal distributor 13 and the corresponding reset signal line, to the slave processor that cannot recover autonomously, and a forced restart is attempted. When a recovery message is then supplied from the failed slave processor, the master processor (P1) 11-1 reincorporates that slave processor into the system.
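The monitoring flow described above for one slave processor can be summarized as a small state machine. The following Python sketch is an interpretation under stated assumptions, not the patented implementation: the state names, the event-handler names, the numeric time values, and the `send_reset` callback (standing in for the reset request to the distributor 13) are all hypothetical, and T2 is measured from the moment the master detects the failure.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()      # part of the system and answering inquiries
    FAULT = auto()       # no reply within T1; waiting for self-recovery
    RESET_SENT = auto()  # no recovery within T2; reset has been requested


class SlaveMonitor:
    """Master-side bookkeeping for a single slave processor."""

    def __init__(self, slave_id, t1, t2, send_reset):
        self.slave_id = slave_id
        self.t1, self.t2 = t1, t2
        self.send_reset = send_reset  # hypothetical hook to the distributor
        self.state = State.NORMAL
        self.deadline = None

    def on_inquiry_sent(self, now):
        if self.state is State.NORMAL:
            self.deadline = now + self.t1

    def on_alive_reply(self, now):
        if self.state is State.NORMAL:
            self.deadline = None

    def on_recovery_message(self, now):
        # Covers both self-recovery and recovery after a forced reset.
        if self.state in (State.FAULT, State.RESET_SENT):
            self.state = State.NORMAL
            self.deadline = None

    def on_clock(self, now):
        if self.deadline is None or now < self.deadline:
            return
        if self.state is State.NORMAL:
            # T1 expired: treat the slave as failed and start timing T2.
            self.state = State.FAULT
            self.deadline = now + self.t2
        elif self.state is State.FAULT:
            # T2 expired: the failure is not self-recoverable.
            self.send_reset(self.slave_id)
            self.state = State.RESET_SENT
            self.deadline = None


resets = []
monitor = SlaveMonitor(slave_id=3, t1=2, t2=5, send_reset=resets.append)
monitor.on_inquiry_sent(now=0)
monitor.on_clock(now=2)             # no reply within T1
monitor.on_clock(now=7)             # no recovery message within T2
monitor.on_recovery_message(now=9)  # slave reports back after the reset
print(monitor.state, resets)
```

Removing the slave from the system and notifying the other slaves would hang off the NORMAL to FAULT transition, and reincorporation off the return to NORMAL. Those side effects are omitted to keep the sketch short.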

[0027] As described above, in this embodiment, the reset signal lines 12-2 to 12-n are provided in addition to the data passing bus 10 for data communication. The master processor 11-1 monitors the operating states of the slave processors 11-2 to 11-n by exchanging messages such as survival confirmation messages. When, for example, a slave processor stops responding, the master processor transmits a reset signal to that slave processor via the corresponding one of the reset signal lines 12-2 to 12-n. The failed slave processor is thereby reset, and a restart can be attempted. Therefore, if the failure affecting the slave processor is an intermittent one caused by noise or the like, the slave processor can be restored by the reset without human intervention, and the system operating rate is improved as a result.

[0028]

[Effects of the Invention] As described in detail above, according to the present invention, when a slave processor suffers a failure from which it cannot recover autonomously, the master processor can automatically restart that slave processor, and the system operating rate can be improved.

[Brief description of drawings]

[FIG. 1] A block diagram showing the configuration of a loosely coupled multiprocessor system according to an embodiment of the present invention.

[FIG. 2] A circuit diagram showing a specific configuration of the reset signal distributor provided in the loosely coupled multiprocessor system of the embodiment.

[FIG. 3] A diagram explaining the flow of the operation monitoring process in the loosely coupled multiprocessor system of the embodiment.

[Explanation of symbols]

10: data passing bus; 11-1: master processor; 11-2 to 11-n: slave processors; 12-2 to 12-n: reset signal lines; 13: reset signal distributor.

Claims

[Claims]

1. A loosely coupled multiprocessor system composed of a master processor and a plurality of slave processors coupled via a data communication bus, characterized by comprising reset signal distribution wiring that is arranged between the master processor and the plurality of slave processors and distributes a reset signal to each of the plurality of slave processors, wherein the master processor comprises: monitoring means for monitoring the operating state of each slave processor by exchanging messages with each slave processor via the data passing bus; and reset signal transmitting means for transmitting a reset signal via the reset signal distribution wiring to a slave processor in which the monitoring means has detected a failure.