JP2001022599A

JP2001022599A - Fault tolerant system, fault tolerant processing method, and fault tolerant control program recording medium

Info

Publication number: JP2001022599A
Application number: JP11191135A
Authority: JP
Inventors: Kozo Matsushita; 耕三松下; Shinji Minazu; 真治水津; Jiro Okamoto; 二朗岡本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-07-06
Filing date: 1999-07-06
Publication date: 2001-01-26

Abstract

(57) [Summary]
[Problem] In a multiprocessor system, when a CPU fails and another CPU takes over its processing in units of tasks, to make it easy to specify which CPU takes over.
[Solution] When the takeover determination unit 11 detects a failure of another CPU, it uses the transition information in the reconfiguration table 2, which is the sequence of CPU numbers that take over a task, together with the current transition position in the current transition position table 4, which indicates the CPU currently processing the task, to find the tasks that were running on the failed CPU and to determine the next CPU that takes over each of them. If the CPU that takes over is the own CPU, the standby task activation unit 12 activates the standby task 3 for that task and takes over the processing. The current transition position updating unit 13 updates the current transition position in the current transition position table 4. If there is no CPU to take over a task, the task or the entire system is stopped based on the process selection information.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to fault-tolerant technology in a system equipped with a plurality of processors (hereinafter referred to as CPUs). In particular, it relates to a fault-tolerant system, a fault-tolerant processing method, and a recording medium for a fault-tolerant control program in a multi-CPU system in which, when a CPU fails and other normal CPUs take over the processing it was performing in units of tasks, the CPU that takes over can be easily specified until only one normal CPU remains.

[0002]

[Description of the Related Art] In a multi-CPU system, fault-tolerant techniques such as the following exist so that, when a CPU fails, its processing can be continued by another normal CPU without stopping the system.

[0003] (1) The "Standby system selection method for a mutual hot standby system" (Japanese Unexamined Patent Publication No. Hei 9-81409) is intended to build a hot standby relationship by automatically selecting and associating, without operator intervention, a standby processing function that can be associated when the standby processing function associated with an active processing function ceases to exist because of a computer failure. It includes a processing function management table that registers two kinds of information. The first consists of the identifier of an active processing function running on the own computer, the identifier of the standby processing function associated with it so as to form a hot standby relationship, and the number of the computer on which that standby processing function runs. The second consists of the identifier of a standby processing function running on the own computer, the identifier of the active processing function associated with it so as to form a hot standby relationship, and the number of the computer on which that active processing function runs. It also includes processing function management means that, when a computer failure occurs, responds to a request from an active processing function that has lost its associated standby processing function by referring to the processing function management table, selecting a standby processing function on another computer that has no associated active processing function, associating it with the requesting active processing function, and automatically building the hot standby relationship.

[0004] (2) The "Business takeover system" (Japanese Patent Application No. Hei 9-528794) aims to provide a business takeover system that, in a system in which business processing is executed by a plurality of processing devices in a hot standby or load sharing configuration, enables efficient takeover when a failure occurs without requiring extensive programming. Each processing device holds a table indicating whether it is the active system or the standby system for each business task. When a failure occurs in the processing device that is the active system for a business task, the processing for that business task on the failed device is taken over, with reference to the table, by the processing device that is its standby system.

[0005]

[Problems to Be Solved by the Invention] However, there has conventionally been no method for easily specifying which CPU takes over the processing of a failed CPU when many CPUs continue to fail one after another.

[0006] In technique (1) above, active processing identifiers and standby identifiers must be managed with the processing function management table. In addition, because the computer that takes over processing is determined dynamically, the CPUs that take over processing cannot be statically determined in advance down to the last remaining one. Furthermore, all CPUs must have equivalent capabilities such as hardware.

[0007] In technique (2) above, tables must be created for all clusters, and if there is business that should keep being taken over until all clusters have failed, the standby business must be written in every table. In addition, it is not known which of the clusters that have a standby system prepared will take over the processing, and the clusters cannot be configured to take over processing according to a priority.

[0008] The present invention aims to solve the above problems. Its object is to provide means that, in a multi-CPU system, when a CPU fails and its processing is taken over in units of tasks, make it easy to specify which CPU takes over until only one normal CPU remains.

[0009]

[Means for Solving the Problems] In the present invention, in a system equipped with a plurality of CPUs, each CPU has a reconfiguration table holding information that describes, for each task as a sequence of CPU numbers, which CPUs take over a task that was running on a failed CPU. When a failed CPU is discovered, the CPU that discovered it notifies all other CPUs of the CPU number of the failed CPU (an identification number for identifying each CPU). Each CPU learns from the reconfiguration table which tasks were running on the failed CPU and determines whether it is itself the CPU that takes over each task. If a task is designated to be taken over by the own CPU, the CPU activates the task that was standing by to take over that task.

[0010] For this takeover, the reconfiguration table only needs to express, as a list of CPU numbers, which CPUs take over the task in turn, so the CPUs that take over can be specified easily. In addition, by remembering on which CPU a task is currently running, it is easy to find out which CPU takes over next.

[0011] The reconfiguration table also holds information specifying whether only the task is to be stopped or the entire system is to be stopped when no CPU remains to take over. This makes it possible to choose between stopping only the task and stopping the entire system.

[0012] The term CPU (central processing unit) as used here also includes an MPU (microprocessing unit). The term task as used here means a unit of processing executed by a CPU, and is a broad concept that also includes processing units such as those called processes.

[0013] A program for implementing each of the above processing means on a computer can be stored in an appropriate computer-readable recording medium such as a portable medium memory, a semiconductor memory, or a hard disk.

[0014]

[Embodiments of the Invention] FIG. 1 is a block diagram showing an outline of the present invention. In this embodiment, in a multi-CPU system, each CPU is provided with fault-tolerant processing means 1 and a reconfiguration table 2. A current transition position table 4 is provided in a storage area that every CPU can access.

[0015] The reconfiguration table 2 holds the following for each task: a task ID (identifier) that identifies the task; transition information that specifies, as a sequence of CPU numbers (sequential numbers for identifying each CPU), which CPUs take over a task that was being executed on a failed CPU; and process selection information that specifies whether to stop only that task or to stop the entire system when no CPU exists to take over.

[0016] The current transition position table 4 holds current transition position information, which indicates on which CPU each task stored in the reconfiguration table 2 is currently running, expressed as a position in the sequence of transition information in the reconfiguration table 2.
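
As an illustration only, the reconfiguration table 2 and the current transition position table 4 described above might be modeled as follows. The names `ReconfigEntry`, `OnExhausted`, `current_cpu`, and `NOT_RUNNING` are assumptions introduced for this sketch and do not appear in the patent. Positions are 1-based to match the text, and -1 is used as the value meaning "none".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class OnExhausted(Enum):
    """Process selection information: what to stop when no CPU can take over."""
    STOP_TASK = "task"
    STOP_SYSTEM = "system"


@dataclass(frozen=True)
class ReconfigEntry:
    """One row of the reconfiguration table (hypothetical layout)."""
    task_id: int
    transition: Tuple[int, ...]   # CPU numbers in takeover order
    on_exhausted: OnExhausted


NOT_RUNNING = -1  # current transition position meaning "none"


def current_cpu(entry: ReconfigEntry, position: int) -> Optional[int]:
    """Return the CPU number at a 1-based transition position, or None if stopped."""
    if position == NOT_RUNNING:
        return None
    return entry.transition[position - 1]


# Each CPU keeps its own copy of the reconfiguration table ...
reconfig_table = {
    101: ReconfigEntry(101, (1, 2, 3, 4), OnExhausted.STOP_SYSTEM),
}
# ... while the current transition positions live in storage shared by all CPUs.
current_positions = {101: 1}

print(current_cpu(reconfig_table[101], current_positions[101]))  # 1
```

Keeping only an index into the transition sequence, rather than a CPU number, means the "next" CPU is always the following element of the same sequence.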

[0017] The fault-tolerant processing means 1 includes three units. The takeover determination unit 11 determines, when a failure occurs in another CPU, from the transition information in the reconfiguration table 2 whether a task being executed on the failed CPU is one that it (the own CPU) should take over. The standby task activation unit 12 activates the standby task 3 for the task and performs the takeover when the own CPU takes over the processing of that task. The current transition position updating unit 13 updates the current transition position of the reconfiguration table 2 to the position indicating the own CPU. The standby task 3 may be created in advance, or may be newly created when needed and started by passing control to it.

[0018] FIG. 2 shows an example of the flow of processing in the present invention. In this example, in a multiprocessor system consisting of four CPUs numbered CPU1 to CPU4, four tasks are assumed to be running: the task with task ID 101 (hereinafter, task 101), the task with task ID 201 (hereinafter, task 201), the task with task ID 202 (hereinafter, task 202), and the task with task ID 301 (hereinafter, task 301).

[0019] In the reconfiguration table 2, a task ID and the order of the CPU numbers that take over the task when a failure occurs are set in advance for each task as transition information. Process selection information is also set, indicating whether only the task is stopped or the entire system is stopped when there is no CPU to take over. Taking task 101 as an example, the reconfiguration table 2 specifies that, when the CPU executing it fails, processing is taken over in the order CPU1 → CPU2 → CPU3 → CPU4, and that processing of the entire system is stopped when there is no CPU to take over.

[0020] The current transition position table 4 stores, as the current transition position of each task registered in the reconfiguration table 2, the position of the CPU currently executing the task, counted from the head of the transition information. In the initial state, in which no CPU has failed, all current transition positions are "1". That is, before any failure occurs, task 101 is shown to be running on CPU1, tasks 201 and 202 on CPU2, and task 301 on CPU3.

[0021] Suppose CPU2 fails. CPU1, CPU3, and CPU4 receive a failure notification for CPU2 from the CPU that detected the failure (step S1). Each CPU refers to the reconfiguration table 2 and searches for the tasks that were running on CPU2, that is, the tasks whose transition information has "2 (CPU2)" at the current transition position, which is the first position (step S2). This shows that the tasks that were running on CPU2 are tasks 201 and 202.

[0022] From the transition information of task 201 (2 → 1), CPU1 finds that it is the CPU that takes over this task, and from the transition information of task 202 (2 → 4), CPU4 finds that it is the CPU that takes over this task. CPU1 therefore activates the standby task for task 201, and CPU4 activates the standby task for task 202 (step S3).

[0023] CPU1 and CPU4 update the current transition positions of tasks 201 and 202, respectively, in the current transition position table 4 from "1" to "2" (step S4).

[0024] Next, suppose that CPU1 also fails while CPU1, CPU3, and CPU4 are running. CPU3 and CPU4 receive a failure notification for CPU1 from the CPU that detected the failure (step S5). Based on the current transition positions 1, 2, 2, 1, CPU3 and CPU4 examine the 1st, 2nd, 2nd, and 1st entries of the transition information for tasks 101, 201, 202, and 301, respectively, in the reconfiguration table 2, and learn that the tasks that were running on CPU1 are tasks 101 and 201. They then obtain the CPU numbers to take over from the transition information of these two tasks. For task 101, the CPU that should take over, CPU2, has already failed, so they examine the next entry of the transition information and find that the next CPU, CPU3, is the one that should take over. For task 201, on the other hand, they find that there is no CPU to take over (step S6).

[0025] CPU3 therefore activates the standby task for task 101 (step S7). For task 201, which has no CPU to take over, the process selection information in the reconfiguration table 2 is "task", so only task 201 is stopped rather than the entire system. The current transition position of task 101 in the current transition position table 4 is then updated to "3", and the current transition position of task 201, which has no CPU to take over, is updated to "-1", which indicates "none" (step S8).

[0026] Thus, in the present invention, the takeover order of the CPUs that take over a task that was running on a failed CPU is written as a list of CPU numbers in the reconfiguration table 2, so that which CPU should take over can be easily specified per task until only one normal CPU remains. Furthermore, by indicating on which CPU a task is currently running, the CPU that takes over next can be found easily. The processing to perform when no CPU remains to take over can also be set easily: it is easy to specify whether to stop only the task and let processing continue with the other tasks, or to stop the entire system.

[0027] Consequently, the hardware needed to execute a given process, such as communication lines and disk storage devices, need not be provided equally on every CPU, and a task can be reliably handed over only to CPUs equipped with the specific hardware that its processing function requires.

[0028] FIG. 3 shows examples of systems equipped with a plurality of CPUs in which the present invention is used. As is generally known, multiple CPUs can be connected in a loosely coupled form or a tightly coupled form. The present invention can be applied to either form.

[0029] FIG. 3(A) is an example of the loosely coupled form. Each of the CPUs 20-1 to 20-n has its own dedicated processor bus 21-1 to 21-n and local memory 22-1 to 22-n, and the only resource that can be shared among the CPUs 20-1 to 20-n is the global memory 24 accessed through the system bus 23.

[0030] FIG. 3(B) is an example of the tightly coupled form. The CPUs 30-1 to 30-m share both the processor bus 31 and the local memory 32, and the local memory 32 is logically divided into memory spaces used exclusively by each of the CPUs 30-1 to 30-m and a memory space shared among the CPUs.

[0031] FIG. 4 shows an example block configuration in a loosely coupled multi-CPU system. Two CPUs are shown here to keep the figure simple, but many CPUs may be connected. The functions required for the present invention are placed in the local memories 22-1 and 22-2 of the CPUs 20-1 and 20-2, and the area for managing resources common to the CPUs is placed in the global memory 24.

[0032] FIG. 5 shows an example block configuration in a tightly coupled multi-CPU system. The means required for the present invention are placed, for each of the CPUs 30-1 and 30-2, in that CPU's dedicated memory space in the local memory 32, and the area for managing resources common to the CPUs is placed in the shared memory space provided in the local memory 32.

[0033] In FIGS. 4 and 5, the task management unit 51 controls the tasks running on the own CPU. When the fault-tolerant unit 53 requests a task takeover, it activates the task that was standing by to take over, and when a stop is requested, it stops the task. The failure detection unit 52 detects a failure of the own CPU or another CPU and notifies the other CPUs of the failure via the inter-CPU communication unit 55. When the fault-tolerant unit 53 detects a failure of another CPU itself, or receives a failure notification for another CPU via the inter-CPU communication unit 55, it refers to the reconfiguration table 54 and determines whether the own CPU takes over a task. The inter-CPU communication unit 55 is a module for exchanging messages between the CPUs. A multi-CPU control area 56 for managing resources common to the CPUs is provided in the global memory 24 shown in FIG. 4 or in the shared memory space in the local memory 32 shown in FIG. 5. In this example, the current transition position table 57 is also provided in the shared memory space.

[0034] FIG. 6 shows a processing flowchart of a program for implementing the present invention on a computer. This flowchart shows the flow of processing centered on the fault-tolerant unit 53 in particular.

[0035] First, when a failure of one CPU in the system is detected, the normal CPUs synchronize with one another in order to perform reconfiguration (step S11). A conventionally known method can be used to detect failures, for example periodically exchanging liveness-check messages between the CPUs. Synchronization between the CPUs is performed using the multi-CPU control area 56 in the shared memory space or the inter-CPU communication unit 55; because synchronization methods are well known, a detailed description is omitted here.
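
The text leaves the failure detection method open and only mentions periodic liveness messages as one example. A minimal timeout-based sketch of that idea is shown below. The class `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical, and timestamps are passed in explicitly so the example is deterministic; a real system would read a monotonic clock instead.

```python
class HeartbeatMonitor:
    """Track the last time each peer CPU was heard from (illustrative only)."""

    def __init__(self, peers, timeout):
        self.timeout = timeout
        self.last_seen = {cpu: 0.0 for cpu in peers}

    def on_heartbeat(self, cpu, now):
        """Record a liveness message received from `cpu` at time `now`."""
        self.last_seen[cpu] = now

    def failed_peers(self, now):
        """Return the peers whose last message is older than the timeout."""
        return sorted(cpu for cpu, seen in self.last_seen.items()
                      if now - seen > self.timeout)


monitor = HeartbeatMonitor(peers=[2, 3, 4], timeout=3.0)
monitor.on_heartbeat(2, now=1.0)
monitor.on_heartbeat(3, now=4.0)
monitor.on_heartbeat(4, now=4.5)
print(monitor.failed_peers(now=5.0))   # [2]
```

A CPU reported by such a monitor would then be announced to the other CPUs, which corresponds to the failure notification that precedes step S11.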

[0036] Then, in order from the first task in the reconfiguration table 54, the CPU on which the task of interest is currently running is determined from the current transition position obtained from the current transition position table 57 and the transition information in the reconfiguration table 54 (step S12). If the task is one that is no longer running, that is, if the current transition position obtained from the current transition position table 57 is "-1", processing proceeds to step S21 (step S13). If the CPU indicated by the current transition position is not a stopped (failed) CPU, processing also proceeds to step S21 (step S14).

[0037] If the task is still running and the CPU to which it is assigned is a stopped CPU, the CPU that will run the task next is looked up in the transition information of the reconfiguration table 54 (step S15).

[0038] If the transition information contains no next CPU, processing proceeds to step S23 (step S16). If the transition information contains a next CPU but that CPU is also stopped (step S17), processing returns to step S15 and the CPU that will run the task next is again looked up in the transition information.

[0039] If the next CPU is not a stopped CPU, it is checked whether the CPU that takes over the task is the own CPU (step S18). If it is the own CPU, processing proceeds to step S19; if not, processing proceeds to step S21.

[0040] If the CPU that takes over the task is the own CPU, the current transition position in the current transition position table 57 is updated (step S19) and the task that was standing by is activated (step S20).

[0041] In step S21, it is checked whether the above processing has been performed for all tasks in the reconfiguration table 54, and steps S12 to S20 are repeated until the above processing has finished for all tasks. Once the above processing has been performed for all tasks in the reconfiguration table 54, the CPU waits for the other CPUs to finish processing and synchronizes with them (step S22). When task takeover processing has completed on all normal CPUs and they have synchronized, business processing continues with the new task configuration.

[0042] When, in step S15, the CPU that will run the task next is looked up in the transition information and no next CPU exists (step S16), the process selection information for the task is examined (step S23). If the process selection information is "task", the current transition position in the current transition position table 57 is set to "-1" to record that the task has disappeared from the system (step S24), and processing then proceeds to step S21. If the process selection information is not "task" but "entire system", the entire system is stopped and processing ends (step S25).
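
Steps S12 to S25 above can be sketched as a function that each surviving CPU would run after the synchronization of step S11. This is one possible reading of the flowchart, not the patent's program: the function name `reconfigure`, the `SystemHalt` exception, the dictionary layout, and the `start_standby` callback are assumptions, and the synchronization steps S11 and S22 are left to the caller.

```python
NOT_RUNNING = -1


class SystemHalt(Exception):
    """Raised when the process selection says to stop the entire system (S25)."""


def reconfigure(own_cpu, failed, reconfig, positions, start_standby):
    """Run steps S12-S25 on one CPU.

    reconfig:  {task_id: (cpu_sequence, "task" | "system")}, local copy
    positions: {task_id: 1-based position or -1}, shared between CPUs
    start_standby: callback that activates the standby task for task_id
    """
    for task_id, (seq, selection) in reconfig.items():       # S21 loop
        pos = positions[task_id]                              # S12
        if pos == NOT_RUNNING:                                # S13
            continue
        if seq[pos - 1] not in failed:                        # S14
            continue
        nxt = pos + 1                                         # S15
        while nxt <= len(seq) and seq[nxt - 1] in failed:     # S17, back to S15
            nxt += 1
        if nxt > len(seq):                                    # S16: no next CPU
            if selection == "task":                           # S23
                positions[task_id] = NOT_RUNNING              # S24
                continue
            raise SystemHalt(task_id)                         # S25
        if seq[nxt - 1] == own_cpu:                           # S18
            positions[task_id] = nxt                          # S19
            start_standby(task_id)                            # S20


# CPU1 reacting to the failure of CPU2 (selection for 202 is assumed).
reconfig = {201: ((2, 1), "task"), 202: ((2, 4), "task")}
positions = {201: 1, 202: 1}
started = []
reconfigure(1, {2}, reconfig, positions, started.append)
print(started, positions)   # [201] {201: 2, 202: 1}
```

In this run CPU1 starts only its own standby task and leaves task 202 untouched, because the next live CPU in that sequence is CPU4, which would update the shared position when it runs the same procedure.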

[0043] As described in the above embodiment, placing the reconfiguration table 54 in the local memory 32 of each CPU and placing the current transition position table 57 in the shared memory space makes access to the reconfiguration table 54 faster and management of the current transition position easier, so that a suitable fault-tolerant system can be built. However, the reconfiguration table 54 and the current transition position table 57 need not necessarily be placed separately in the local memory 32 and the shared memory space. The present invention can also be implemented by, for example, providing both tables together in either the local memory 32 or the shared memory space.

[0044]

[Example] Next, the processing operation of an example of the present invention is described, taking as an example a system with the following tasks that processes input data and outputs it. This system consists of four CPUs, CPU1 to CPU4. CPU1 and CPU3 are each provided with a controller that can control an input device, and CPU2 and CPU4 are each provided with a controller that controls an output device.

[0045] The tasks running in this system are the input processing task, the output processing task, the data processing master task, and the data processing subtasks.

[0046] FIG. 7 shows an example task configuration in the initial state. The input processing task runs on CPU1 and the output processing task runs on CPU2. The data processing work is divided and distributed across the CPUs: the data processing master task is placed on CPU1, and the data processing subtasks, which actually process the data, are placed on every CPU and run in parallel.

[0047] In addition, the data processing master task is kept on standby on CPU2, CPU3, and CPU4, the input processing task is kept on standby on CPU3, and the output processing task is kept on standby on CPU4.

[0048] FIG. 8 shows an example of the settings of the reconfiguration table and the current transition position table. The task ID, transition information, and process selection information of the reconfiguration table 54 are set as shown in FIG. 8 and placed on each CPU. The reconfiguration table 54 may also be built into the program in advance. The input processing task is set so that CPU3 takes over if CPU1 fails, and so that processing of the entire system stops if no CPU remains to take over.

[0049] For the output processing task, CPU4 takes over if CPU2 fails, and here too processing of the entire system is set to stop if no CPU remains to take over. For the data processing master task, when the CPU it is running on fails, processing is handed over to the next available CPU in the order CPU1, CPU2, CPU3, CPU4, and processing of the entire system is set to stop once no available CPU remains. Data processing subtask 1 is set so that, if CPU1 fails, no other CPU takes over and the task itself is stopped. Likewise, data processing subtasks 2, 3, and 4 are not taken over when CPU2, CPU3, or CPU4, on which they respectively run, fails.
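The settings described in paragraphs [0047] to [0049] can be sketched as a small data structure. This is only an illustrative sketch in Python: the names `ReconfigEntry`, `OnExhausted`, `RECONFIG_TABLE`, and the short task IDs are assumptions introduced here, not identifiers from the patent, and FIG. 8 itself is not reproduced.

```python
from dataclasses import dataclass
from enum import Enum


class OnExhausted(Enum):
    """Process selection: what to do when no CPU is left to take over."""
    STOP_TASK = "stop task"
    STOP_SYSTEM = "stop system"


@dataclass(frozen=True)
class ReconfigEntry:
    task_id: str               # hypothetical short task name
    transition: tuple          # CPU numbers in takeover order; first = initial CPU
    on_exhausted: OnExhausted  # process selection information


# Values follow the prose description of the example, not a copy of FIG. 8.
RECONFIG_TABLE = [
    ReconfigEntry("input", (1, 3), OnExhausted.STOP_SYSTEM),
    ReconfigEntry("output", (2, 4), OnExhausted.STOP_SYSTEM),
    ReconfigEntry("master", (1, 2, 3, 4), OnExhausted.STOP_SYSTEM),
    ReconfigEntry("sub1", (1,), OnExhausted.STOP_TASK),
    ReconfigEntry("sub2", (2,), OnExhausted.STOP_TASK),
    ReconfigEntry("sub3", (3,), OnExhausted.STOP_TASK),
    ReconfigEntry("sub4", (4,), OnExhausted.STOP_TASK),
]

# A standby copy is needed on every CPU after the first in each sequence.
standby = {entry.task_id: entry.transition[1:] for entry in RECONFIG_TABLE}
```

Deriving `standby` from the transition sequences gives the same standby placement as paragraph [0047]: the master task waits on CPU2, CPU3, and CPU4, the input task on CPU3, and the output task on CPU4.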

[0050] The current transition position table 57 is allocated in an area of the shared memory space. Each current transition position (the position within the transition information sequence that indicates on which CPU the task is currently running) is set to "1" when the system is initialized.

[0051] Suppose now that CPU2 fails and the system is reconfigured without CPU2. CPU1, CPU3, and CPU4 refer to the transition information in the reconfiguration table 54 and the current transition positions in the current transition position table 57 to find the tasks that were running on CPU2. They thereby learn that the tasks running on CPU2 were the "output processing task" and "data processing subtask 2", that CPU4 is to take over the output processing task, and that "data processing subtask 2" does not need to be taken over.
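The lookup described in paragraph [0051] combines the two tables: the current transition position selects one entry of a task's transition sequence, and that entry is the CPU the task is running on. A minimal sketch, assuming 1-based positions as in the text and hypothetical names (`TRANSITIONS`, `current_cpu`, `tasks_on`, `next_in_sequence`):

```python
# Transition information per task (CPU numbers in takeover order),
# following the prose description of the example.
TRANSITIONS = {
    "input": [1, 3],
    "output": [2, 4],
    "master": [1, 2, 3, 4],
    "sub1": [1],
    "sub2": [2],
    "sub3": [3],
    "sub4": [4],
}

# Current transition positions: 1-based, all set to 1 at initialization.
positions = {task: 1 for task in TRANSITIONS}


def current_cpu(task, positions):
    """Return the CPU a task runs on, or None if it has been stopped (-1)."""
    pos = positions[task]
    if pos < 1:
        return None
    return TRANSITIONS[task][pos - 1]


def tasks_on(cpu, positions):
    """List the tasks currently running on the given CPU."""
    return [t for t in TRANSITIONS if current_cpu(t, positions) == cpu]


def next_in_sequence(task, positions):
    """Return the next CPU in the task's sequence, or None if there is none."""
    seq = TRANSITIONS[task]
    pos = positions[task]
    return seq[pos] if pos < len(seq) else None


# CPU2 fails: which tasks are affected, and which CPU is next for each?
affected = tasks_on(2, positions)
plan = {t: next_in_sequence(t, positions) for t in affected}
```

With all positions at 1, `affected` contains the output task and subtask 2, and `plan` maps the output task to CPU4 and subtask 2 to `None` (no takeover), as in the paragraph above.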

[0052] In accordance with the transition information, the current transition position of "data processing subtask 2" is set to "-1". CPU4 starts the task it had kept on standby in order to take over the "output processing task" and, to indicate that the "output processing task" has been taken over by CPU4, updates the corresponding current transition position in the current transition position table 57 to "2". The task configuration in this state is shown in FIG. 9. The reconfiguration table 54 and the current transition position table 57 after the update are shown in FIG. 10.
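The step in paragraph [0052] runs on every surviving CPU, but only the CPU named as successor starts its standby task and advances the shared position. The sketch below illustrates that "is it my turn?" check. The function name `takeover_step` is hypothetical, and letting every surviving CPU write "-1" for a task with no successor is an assumption of this sketch; the text does not say which CPU performs that write. Skipping successors that have already failed is left out here.

```python
TRANSITIONS = {
    "input": [1, 3],
    "output": [2, 4],
    "master": [1, 2, 3, 4],
    "sub1": [1],
    "sub2": [2],
    "sub3": [3],
    "sub4": [4],
}


def takeover_step(own_cpu, failed_cpu, positions):
    """Run on one surviving CPU; return the standby tasks it must start."""
    to_start = []
    for task, seq in TRANSITIONS.items():
        pos = positions[task]
        # Ignore stopped tasks and tasks that were not on the failed CPU.
        if pos < 1 or seq[pos - 1] != failed_cpu:
            continue
        if pos < len(seq):
            if seq[pos] == own_cpu:
                to_start.append(task)
                positions[task] = pos + 1  # only the successor advances it
        else:
            positions[task] = -1  # no successor: mark the task as stopped
    return to_start


# Shared current transition position table, all 1 at initialization.
positions = {task: 1 for task in TRANSITIONS}

# CPU2 fails; CPU1, CPU3, and CPU4 each run the same step.
started = {cpu: takeover_step(cpu, 2, positions) for cpu in (1, 3, 4)}
```

After the three calls, only CPU4 has something to start (the output task), the output task's position is 2, and subtask 2 is marked -1.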

[0053] If CPU1 subsequently fails as well and the system is reconfigured without CPU1, CPU3 and CPU4 refer to the current transition positions in the current transition position table 57 and the transition information in the reconfiguration table 54 to find the tasks that were running on CPU1. They learn that those tasks were the "input processing task", the "data processing master task", and "data processing subtask 1", that CPU3 is to take over the "input processing task", and that "data processing subtask 1" does not need to be taken over. The transition information also indicates that CPU2 is next in line to take over the "data processing master task". However, because CPU2 has already failed, the next entry in the transition information is examined, and CPU3 is identified as the CPU that takes over. Whether CPU2 has already failed is recorded in the multi-CPU control area 56 of the shared memory space and is determined from there.

[0054] When CPU3 recognizes from the transition information that it is to take over the "input processing task" and the "data processing master task", it starts the tasks it had kept on standby for each of them, and updates the current transition position of the "input processing task" to "2" and that of the "data processing master task" to "3". The current transition position of "data processing subtask 1" is set to "-1". The task configuration in this state is shown in FIG. 11. The reconfiguration table 54 and the current transition position table 57 after the update are shown in FIG. 12.
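The two failures walked through in paragraphs [0051] to [0054] can be replayed end to end. The following sketch is a single-process simulation under stated assumptions: `failed_cpus` stands in for the failure record kept in the multi-CPU control area 56, `handle_failure` and `STOPPED` are hypothetical names, and the process selection information (stopping the whole system) is deliberately left out, so an exhausted sequence is always marked -1.

```python
TRANSITIONS = {
    "input": [1, 3],
    "output": [2, 4],
    "master": [1, 2, 3, 4],
    "sub1": [1],
    "sub2": [2],
    "sub3": [3],
    "sub4": [4],
}
STOPPED = -1


def handle_failure(failed_cpu, positions, failed_cpus):
    """Apply one CPU failure; return {task: new CPU, or None if stopped}."""
    failed_cpus.add(failed_cpu)
    result = {}
    for task, seq in TRANSITIONS.items():
        pos = positions[task]
        if pos == STOPPED or seq[pos - 1] != failed_cpu:
            continue
        # Walk forward, skipping CPUs already recorded as failed.
        new_pos = pos + 1
        while new_pos <= len(seq) and seq[new_pos - 1] in failed_cpus:
            new_pos += 1
        if new_pos <= len(seq):
            positions[task] = new_pos
            result[task] = seq[new_pos - 1]
        else:
            positions[task] = STOPPED
            result[task] = None
    return result


positions = {task: 1 for task in TRANSITIONS}
failed_cpus = set()

first = handle_failure(2, positions, failed_cpus)   # CPU2 fails
second = handle_failure(1, positions, failed_cpus)  # then CPU1 fails
```

After both calls, the positions are 2 for the input task, 2 for the output task, 3 for the master task, -1 for subtasks 1 and 2, and 1 for subtasks 3 and 4, which agrees with the values stated in paragraphs [0052] and [0054]. The master task lands on position 3 rather than 2 because CPU2 is skipped as already failed.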

[0055]

[Effects of the Invention] As described above, according to the present invention, in a multi-CPU system the order of takeover can be specified easily by listing, in the reconfiguration table, the numbers of the CPUs that are to take over a task from a failed CPU. Furthermore, by recording on which CPU each task is currently running, the next CPU to take over can be found easily.

[0056] This makes it easy to specify, for example, that takeover continues until only one normal CPU remains, by listing all CPUs in the transition information, or that only certain specific CPUs perform takeover.

[0057] Furthermore, by holding process selection information for each task in the reconfiguration table, it is also easy to specify whether, when no CPU remains to take over, only that task is stopped or the entire system is stopped.
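The role of the per-task process selection information can be illustrated with a small decision function. This is a sketch under assumptions: `decide` and the string labels `"takeover"`, `"stop_task"`, and `"stop_system"` are hypothetical, and positions are 1-based as in the example.

```python
def decide(transition, position, failed_cpus, on_exhausted):
    """Decide what happens to a task whose current CPU has just failed.

    transition   -- CPU numbers in takeover order
    position     -- 1-based current transition position of the task
    failed_cpus  -- set of CPUs known to have failed
    on_exhausted -- "stop_task" or "stop_system" (process selection)
    """
    # Entries after the current position are the remaining candidates.
    for cpu in transition[position:]:
        if cpu not in failed_cpus:
            return ("takeover", cpu)
    # No candidate left: fall back to the process selection information.
    return (on_exhausted, None)


# Subtask 1 has no successor, so only the task is stopped.
sub1 = decide([1], 1, {1}, "stop_task")

# The input task loses both CPU1 and CPU3, so the whole system stops.
inp = decide([1, 3], 1, {1, 3}, "stop_system")

# The master task skips failed CPU2 and moves to CPU3.
master = decide([1, 2, 3, 4], 1, {1, 2}, "stop_system")
```

Keeping the fallback as a per-task field means the same lookup code serves both critical tasks (stop the system) and expendable ones (stop only the task).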

[0058] In addition, because the CPUs that take over are set in advance, it is also possible to ensure that a task is taken over only by a CPU equipped with the hardware the task requires.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an outline of the present invention.

FIG. 2 is a diagram showing an example of the flow of the processing operation of the present invention.

FIG. 3 is a diagram showing examples of connection forms of a multi-CPU system.

FIG. 4 is a diagram showing an example of a block configuration in a loosely coupled system.

FIG. 5 is a diagram showing an example of a block configuration in a tightly coupled system.

FIG. 6 is a processing flowchart of a program for implementing the present invention on a computer.

FIG. 7 is a diagram showing an example of the task configuration in the initial state in the example of the present invention.

FIG. 8 is a diagram showing an example of the settings of the reconfiguration table and the current transition position table in the example of the present invention.

FIG. 9 is a diagram for explaining the state of the tasks on each CPU of the system.

FIG. 10 is a diagram showing an example of the reconfiguration table and the current transition position table after the update.

FIG. 11 is a diagram for explaining the state of the tasks on each CPU of the system.

FIG. 12 is a diagram showing an example of the reconfiguration table and the current transition position table after the update.

[Explanation of symbols]

1 Fault-tolerant processing means
11 Takeover determination unit
12 Standby task activation unit
13 Current transition position update unit
2 Reconfiguration table
3 Standby task
4 Current transition position table

Continuation of the front page. (72) Inventor: Jiro Okamoto, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa. F-terms (reference): 5B034 BB11 CC01; 5B045 BB02 BB12 GG06 JJ09 JJ13 JJ44; 5B098 AA10 GA04 JJ00

Claims

[Claims]

1. A fault-tolerant system in a multiprocessor system in which a plurality of tasks, each being a unit of processing executed by a processor, are distributed among a plurality of processors and processed by the respective processors, the fault-tolerant system comprising: storage means for storing, for each task, transition information defining the sequence of processors that take over the task when the processor on which it was running fails, and a current transition position indicating on which processor the task is currently running; means for determining, when a processor fails, on the basis of the transition information and the current transition position, whether a task that was running on the failed processor is a task to be taken over by the own processor; and means for taking over the task and running it on the own processor when the determination indicates that the own processor is to take over.

2. The fault-tolerant system according to claim 1, wherein the storage means stores process selection information indicating whether the task is stopped or the entire system is stopped when no processor remains to take over the task, and wherein, when no processor remains to take over a task that was running on a failed processor, the processor stops the task or the entire system on the basis of the process selection information.

3. A fault-tolerant processing method in a multiprocessor system in which a plurality of tasks, each being a unit of processing executed by a processor, are distributed among a plurality of processors and processed by the respective processors, the method using storage means for storing, for each task, transition information defining the sequence of processors that take over the task when the processor on which it was running fails, and a current transition position indicating on which processor the task is currently running, wherein, when a processor fails, each processor refers to the storage means and determines, on the basis of the transition information and the current transition position, whether a task that was running on the failed processor is a task to be taken over by the own processor, and, when the determination indicates that the own processor is to take over, takes over the task and runs it on the own processor.

4. The fault-tolerant processing method according to claim 3, wherein process selection information indicating whether the task is stopped or the entire system is stopped when no processor remains to take over the task is stored in the storage means, and wherein, when no processor remains to take over a task that was running on a failed processor, the task or the entire system is stopped on the basis of the process selection information.

5. A fault-tolerant control program recording medium on which is recorded a program for implementing, on a computer, a fault-tolerant processing method in a multiprocessor system, the program using storage means for storing, for each task, transition information defining the sequence of processors that take over the task when the processor on which it was running fails, and a current transition position indicating on which processor the task is currently running, and causing the computer to execute processing of: referring to the storage means when a processor fails and determining, on the basis of the transition information and the current transition position, whether a task that was running on the failed processor is a task to be taken over by the own processor; and, when the determination indicates that the own processor is to take over, taking over the task and running it on the own processor.

6. The fault-tolerant control program recording medium according to claim 5, wherein the program includes a program for causing the computer to execute processing of stopping the task or the entire system, when no processor remains to take over a task that was running on a failed processor, on the basis of process selection information that is stored in the storage means together with the transition information and that indicates either stopping the task or stopping the system.