JPH10214198A

JPH10214198A - Information processing system

Info

Publication number: JPH10214198A
Application number: JP9015034A
Authority: JP
Inventors: Nobuyasu Kanekawa; 信康金川; Naohiro Kasuya; 直大糟谷; Shinichiro Yamaguchi; 伸一朗山口; Naoto Miyazaki; 直人宮崎
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-01-29
Filing date: 1997-01-29
Publication date: 1998-08-11

Abstract

(57) [Problem] To provide an information processing system that can meet its deadline even when processing is re-executed after a transient fault. [Solution] In an information processing system having a memory that stores data and a processing device that performs processing based on given data, the processing is divided into critical processing, which must finish within a predetermined period and output its result, and non-critical processing, which need not finish within that period. The critical processing is executed in the first half of the period and the non-critical processing in the second half.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a highly reliable information processing system, and in particular to improving system availability and real-time performance when a failure occurs.

[0002]

[Prior Art] As computer systems have come to play an important role in social infrastructure, special care has become necessary to ensure that computer system failures do not seriously affect society or human life. Against this background, fault-tolerant computer technology, which increases reliability by making the computer system redundant, has been widely adopted.

[0003] Faults that cause computer systems to malfunction are classified into permanent (fixed) faults, caused by component failures and the like, and transient faults, which occur temporarily because of electrical noise, cosmic rays, alpha particles emitted by radioactive isotopes in the materials of semiconductor devices, and so on. With improvements in electronic component manufacturing, permanent faults have recently tended to decrease, so the proportion of transient faults has tended to increase.

[0004] Because transient faults are, literally, transient, that is, they occur only temporarily, in many cases the processing can be completed successfully, even after a fault has occurred, by executing the same processing again at a different time. This method is called time redundancy (Yoshihiro Toma, ed., Fault Tolerant System Theory, IEICE, p. 149).
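Time redundancy can be illustrated in a few lines of Python. This is a minimal sketch, not the method of the cited text: the helper `run_with_time_redundancy` and the two replica functions are hypothetical, a fault is detected by comparing the outputs of two replicas (in the spirit of the duplicated modules of FIG. 4), and the transient fault is simulated by flipping one bit of replica A's first attempt only.

```python
from typing import Callable, Optional, Tuple


def run_with_time_redundancy(
    replica_a: Callable[[int], int],
    replica_b: Callable[[int], int],
    max_attempts: int = 2,
) -> Tuple[Optional[int], int]:
    """Run two replicas; on disagreement, execute again at a later time.

    Returns (agreed result or None, number of attempts used).
    """
    for attempt in range(1, max_attempts + 1):
        a = replica_a(attempt)
        b = replica_b(attempt)
        if a == b:  # outputs agree: no fault detected
            return a, attempt
    return None, max_attempts  # still disagreeing after all attempts; give up


def replica_a(attempt: int) -> int:
    value = 6 * 7
    # Hypothetical transient fault: one bit flips on the first attempt only.
    return value ^ 0x1 if attempt == 1 else value


def replica_b(attempt: int) -> int:
    return 6 * 7


print(run_with_time_redundancy(replica_a, replica_b))  # (42, 2)
```

A permanent fault would make the replicas disagree on every attempt, which is why the number of attempts is bounded.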

[0005]

[Problems to Be Solved by the Invention] The prior art described above is an excellent method for detecting or masking transient faults without adding new hardware. However, when it is applied to a hard real-time system, in which processing results must be output within a required time (the deadline), care must be taken that the deadline is still met even when re-execution takes extra time.
Accordingly, a first object of the present invention is to provide a system that can meet its deadline even when processing is re-executed after a transient fault.

[0006] Furthermore, for safe operation of the system, the prior art requires further consideration so that faults recoverable by re-execution can be distinguished from unrecoverable faults. Permanent faults are unrecoverable. A transient fault is unrecoverable if it destroys the information needed for re-execution, and recoverable if that information is not lost. Without a means of distinguishing recoverable from unrecoverable faults, the system would keep re-executing processing after an unrecoverable fault just as it would after a recoverable one. A second fault might then occur while processing is being repeatedly re-executed because of the first fault. Because most fault-tolerant computers are designed to cope with only a single fault, their behavior when two faults occur in this way lies outside the designer's intent.
Not only is correct operation not guaranteed, but the system may even behave dangerously. Accordingly, a second object of the present invention is to provide a system that can distinguish recoverable faults from unrecoverable faults.

[0007]

[Means for Solving the Problems] According to the present invention, the above objects are achieved by an information processing system having a memory that stores data and a processing device that performs processing based on given data, wherein the processing consists of first processing, which must finish within a predetermined period and output its result, and second processing, which need not finish within that period; both the first and the second processing can be executed within the period; and the processing device executes the first processing before executing the second processing within the period.

[0008]

[Embodiments of the Invention] FIG. 1 shows the processing times and execution order of critical processing 1 and non-critical processing 2 of the present invention within a control frame. Here, critical processing 1 is processing whose result must be produced within a predetermined time. Specifically, in a control system it is the processing that acquires data from the controlled object, generates a control command for an actuator based on that data, and outputs it; if the control command is not delivered to the actuator within the predetermined time, control cannot be performed. Non-critical processing 2, by contrast, is processing that need not necessarily be completed within the predetermined time; a typical example is simply sending data to a host computer.

[0009] First, the processes executed by the computer system are classified into critical processing 1, whose result must be output within the control frame, and non-critical processing 2, which comprises all other processes. Next, the system is designed so that critical processing 1 takes less than half of the control frame. If the processor's performance is insufficient for this, the operating frequency of the processor may be raised, a higher-performance processor may be used, or several processors may be provided and the processing distributed among them. Then, critical processing 1 is executed in the first half of the control frame and non-critical processing 2 in the second half.
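The design rule above can be checked mechanically. The following Python sketch is illustrative only: the `Task` type, the `plan_frame` function, millisecond units, and the example task names are assumptions, not part of the source. It places critical tasks at the start of the frame, appends non-critical tasks after them, and rejects any plan whose critical work is not strictly less than half the frame.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    name: str
    duration_ms: float  # worst-case execution time, assumed known in advance
    critical: bool


def plan_frame(tasks: List[Task], frame_ms: float) -> List[Tuple[str, float, float]]:
    """Return (name, start_ms, end_ms) entries with critical tasks placed first.

    Raises ValueError unless the critical work takes strictly less than half
    the frame, because otherwise one full re-execution might miss the deadline.
    Non-critical tasks follow and are not required to finish in this frame.
    """
    critical = [t for t in tasks if t.critical]
    non_critical = [t for t in tasks if not t.critical]
    critical_ms = sum(t.duration_ms for t in critical)
    if critical_ms >= frame_ms / 2:
        raise ValueError(f"critical work {critical_ms} ms is not below half of {frame_ms} ms")
    schedule: List[Tuple[str, float, float]] = []
    start = 0.0
    for task in critical + non_critical:
        schedule.append((task.name, start, start + task.duration_ms))
        start += task.duration_ms
    return schedule


tasks = [Task("send_to_host", 3.0, False), Task("control_law", 4.0, True)]
print(plan_frame(tasks, frame_ms=10.0))
# [('control_law', 0.0, 4.0), ('send_to_host', 4.0, 7.0)]
```

When the check fails, the remedies listed in the paragraph (a faster clock, a faster processor, or distributing work over several processors) are what would bring the critical work back under half the frame.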

[0010] FIG. 2 shows the operation when a failure occurs during execution of critical processing 1 in FIG. 1. In that case, the critical processing is executed again within the same control frame and its result is output.

[0011] FIG. 3 shows the operation when a failure occurs during execution of non-critical processing 2 in FIG. 1. In that case, the non-critical processing is executed again in the next control frame and its result is output.
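The three cases of FIGS. 1 to 3 can be replayed with a toy timeline in Python. The `simulate_frame` function, the millisecond durations, and the worst-case assumption that a fault is noticed only at the end of the run it hits are all hypothetical. Skipping non-critical processing after a critical re-execution follows step 74 of the FIG. 7 flow.

```python
from typing import List, Optional, Tuple


def simulate_frame(
    critical_ms: float,
    noncritical_ms: float,
    fault_in: Optional[str] = None,  # None, "critical" or "non_critical"
) -> Tuple[List[str], float, bool]:
    """Toy timeline of one control frame (FIGS. 1-3).

    Returns (log, time at which the critical result is output,
    whether non-critical processing must be run again in the next frame).
    """
    log: List[str] = []
    t = critical_ms  # critical processing 1 starts at the beginning of the frame
    if fault_in == "critical":
        log.append(f"fault in critical detected at {t} ms")
        t += critical_ms  # FIG. 2: execute critical processing again, same frame
        log.append(f"critical output at {t} ms")
        log.append("non-critical skipped in this frame")
        return log, t, False
    critical_out = t
    log.append(f"critical output at {t} ms")
    t += noncritical_ms
    if fault_in == "non_critical":
        log.append(f"fault in non-critical detected at {t} ms")
        return log, critical_out, True  # FIG. 3: run it again in the next frame
    log.append(f"non-critical output at {t} ms")
    return log, critical_out, False


FRAME_MS = 10.0
log, out_ms, rerun_next = simulate_frame(4.0, 3.0, fault_in="critical")
print(log, out_ms <= FRAME_MS)
# ['fault in critical detected at 4.0 ms', 'critical output at 8.0 ms', 'non-critical skipped in this frame'] True
```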

[0012] With the method described above, even if a failure occurs during execution of critical processing 1, the critical processing can be executed again using the remaining time, which is at least half of the control frame, so the processing result can be output by the deadline.
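The timing argument can be written as one inequality. Assume a control frame of length T_f, a worst-case execution time C for critical processing 1 with C < T_f/2 (the design rule of [0009]), a fault detected at time t_d <= C during the first run, an immediate restart on detection with negligible interrupt overhead, and no second fault during the re-execution. The re-executed result is then output at

```latex
t_{\mathrm{out}} = t_d + C \;\le\; C + C = 2C \;<\; 2 \cdot \frac{T_f}{2} = T_f
```

so it is output before the end of the control frame.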

[0013] FIG. 4 shows the hardware configuration of a system to which the present invention is applied. Module 11 comprises MPU (microprocessor) 110 and output interface 21, and module 12 comprises MPU 120 and output interface 22. A comparator 15 compares the addresses on internal address buses 111 and 121 of modules 11 and 12 with each other, and the data on data buses 112 and 122 with each other.
Through this comparison, comparator 15 detects a mismatch between the operations of modules 11 and 12, that is, the occurrence of a failure, and notifies MPUs 110 and 120 by means of an interrupt signal 16.
A failure can be detected not only by comparing the addresses on address buses 111, 121 and the data on data buses 112, 122, but also by comparing the output signals 41, 42 of modules 11, 12, as comparator 15' does. A failure can also be detected by having modules 11 and 12 feed back each other's outputs via paths 41' and 42' and compare them with the outputs they themselves intended to produce; in this case, the comparison may be performed either in software or by comparators provided in interfaces 21 and 22. Furthermore, as described in Japanese Patent Application Laid-Open No. Hei 7-234801, an invention of the present inventors, comparators 15 and 15' can also detect failures of the comparator itself, which further improves reliability.
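The comparison performed by comparator 15 can be modeled in software over recorded bus traces. This Python sketch is an illustration under assumptions: the real comparator works cycle by cycle in hardware, and the `BusCycle` type, the `compare_lockstep` function, and the trace values are hypothetical. It returns the first cycle at which the two modules disagree, which corresponds to the point where interrupt signal 16 would be raised.

```python
from typing import List, NamedTuple, Optional


class BusCycle(NamedTuple):
    address: int  # value on address bus 111 or 121
    data: int     # value on data bus 112 or 122


def compare_lockstep(trace_11: List[BusCycle],
                     trace_12: List[BusCycle]) -> Optional[int]:
    """Return the first cycle where modules 11 and 12 disagree, or None."""
    for cycle, (a, b) in enumerate(zip(trace_11, trace_12)):
        if a.address != b.address or a.data != b.data:
            return cycle  # comparator 15 would raise interrupt signal 16 here
    if len(trace_11) != len(trace_12):
        return min(len(trace_11), len(trace_12))  # one module stopped early
    return None


good = [BusCycle(0x100, 7), BusCycle(0x104, 9), BusCycle(0x108, 1)]
hit = [BusCycle(0x100, 7), BusCycle(0x104, 8), BusCycle(0x108, 1)]  # bit 0 flipped
print(compare_lockstep(good, hit))  # 1
```

Comparing output signals 41 and 42, as comparator 15' does, would apply the same logic to the module outputs instead of the internal buses.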

[0014] Although not shown, each of modules 11 and 12 contains its own memory.

[0015] An example of the hardware configuration of a system to which the present invention is applied has been described above; needless to say, the invention is not limited to this example and can be applied to systems with various hardware configurations.

[0016] In the system with the hardware configuration of FIG. 4, when a failure occurs, control passes temporarily from normal processing 70 to interrupt processing 80, as shown in FIGS. 5 and 6, and the next processing is then started according to the situation. FIG. 7 shows an example of the processing flow of normal processing 70 and interrupt processing 80. Normal processing 70 begins at start 71 after system start-up, executes critical processing 1 in the first half of the control frame, and then executes non-critical processing 2. After non-critical processing 2 is completed, it waits until the end of the control frame (step 76).

[0017] So that it can be determined, when a failure occurs, whether critical processing 1 was in progress, a critical-processing flag is turned ON before critical processing 1 (step 72) and turned OFF after critical processing 1 ends (step 73). By representing the ON/OFF state of the critical-processing flag with redundant information such as a redundant code, a failure that occurs while the flag itself is being turned ON or OFF can also be detected.

[0018] In interrupt processing 80, on the other hand, the critical-processing flag is checked after failure interrupt 81. If the flag is ON, critical processing 1 is re-executed; if it is OFF, the system waits until the end of the control frame (step 76). Before re-execution, a re-execution flag is turned ON (step 83) to indicate that critical processing 1 is being re-executed. In particular, suppose OFF is represented by one specific piece of redundant information (a codeword) and every other value is treated as ON. Then, if a failure occurs not only during critical processing 1 but also while the flag is being turned ON or OFF, the flag holds information other than that specific codeword (a non-codeword), so the flag is recognized as ON and critical processing 1 is executed again.
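The fail-safe flag encoding described above can be sketched as follows. The codeword values 0x5AA5 and 0xA55A, the 16-bit width, and the byte-wise "torn write" fault model are illustrative assumptions; the source only requires that OFF be one specific codeword and that every other value be read as ON.

```python
FLAG_OFF = 0x5AA5  # hypothetical codeword: critical processing 1 not running
FLAG_ON = 0xA55A   # hypothetical value written at step 72; any non-OFF value counts as ON


def is_critical_flag_on(raw: int) -> bool:
    """Every value other than the OFF codeword reads as ON."""
    return raw != FLAG_OFF


def torn_write(old: int, new: int) -> int:
    """Model a fault that interrupts a byte-wise store after the high byte."""
    return (new & 0xFF00) | (old & 0x00FF)


# A fault while turning the flag OFF (step 73) leaves a non-codeword,
# so the interrupt handler still treats critical processing as in progress.
print(hex(torn_write(FLAG_ON, FLAG_OFF)), is_critical_flag_on(torn_write(FLAG_ON, FLAG_OFF)))
# 0x5a5a True
```

The price of this encoding is an occasional unnecessary re-execution of critical processing 1, which errs on the safe side.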

[0019] In normal processing 70, it is determined whether the re-execution flag is ON (step 74). Only when it is OFF is non-critical processing 2 executed after critical processing 1.
When it is ON, non-critical processing 2 is skipped,
and the re-execution flag is simply turned OFF (step 75).
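The interplay of the two flags in FIG. 7 can be sketched in Python. This is a simulation, not interrupt-level code: the `Controller` class and the `FaultInterrupt` exception are hypothetical, a raised exception stands in for failure interrupt 81, and re-execution is modeled by running the normal flow again from step 72. A second fault during re-execution is not handled here, since the FIG. 7 flow does not address it.

```python
from typing import Callable, List


class FaultInterrupt(Exception):
    """Stands in for failure interrupt 81 raised by comparator 15."""


class Controller:
    def __init__(self, critical: Callable[[], None],
                 non_critical: Callable[[], None]) -> None:
        self.critical = critical          # critical processing 1
        self.non_critical = non_critical  # non-critical processing 2
        self.critical_flag = False        # "critical processing in progress"
        self.reexec_flag = False          # "critical processing re-executed"
        self.log: List[str] = []

    def run_frame(self) -> None:
        try:
            self._normal_processing()
        except FaultInterrupt:
            self._interrupt_processing()
        self.log.append("wait for end of frame")  # step 76

    def _normal_processing(self) -> None:
        self.critical_flag = True     # step 72
        self.critical()
        self.critical_flag = False    # step 73
        self.log.append("critical done")
        if not self.reexec_flag:      # step 74
            self.non_critical()
            self.log.append("non-critical done")
        else:
            self.reexec_flag = False  # step 75: skip non-critical this frame

    def _interrupt_processing(self) -> None:
        if self.critical_flag:        # the fault hit critical processing 1
            self.reexec_flag = True   # step 83
            self.log.append("re-executing critical")
            self._normal_processing()
        # Flag OFF: the fault hit non-critical processing; just wait (step 76).


calls = [0]


def flaky_critical() -> None:
    calls[0] += 1
    if calls[0] == 1:  # hypothetical transient fault on the first run only
        raise FaultInterrupt()


c = Controller(flaky_critical, lambda: None)
c.run_frame()
print(c.log)  # ['re-executing critical', 'critical done', 'wait for end of frame']
```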

[0020] With the method described above, even if a failure occurs during critical processing, the processing can be executed again using the remaining time, which is at least half of the frame, so the processing result can be output before the deadline.

[0021] FIG. 8 shows an embodiment in which a decision (step 84) on whether to re-execute critical processing 1 is added to the handling of a failure interrupt. If decision result 85 is "continue processing",
critical processing 1 is re-executed as in the embodiment of FIG. 7. If decision result 85 is "stop processing", processing is stopped (step 86). When processing is stopped, the system's output is set to the safe-side output to guarantee safe operation of the system. The safe-side output depends on the field of application; in train control, for example, a command to stop the train is the safe-side output.
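The FIG. 8 handling can be sketched as a small decision function. The `Action` enum, the `handle_fault` function, and the `"STOP_TRAIN"` command are hypothetical. Taking decision 84 before the critical-processing flag is examined is also an assumption made here so that faults during non-critical processing can lead to a stop as well.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    REEXECUTE_CRITICAL = "re-execute critical processing 1"
    WAIT_FRAME_END = "wait for end of frame (step 76)"
    SAFE_STOP = "stop processing (step 86) with safe-side output"


def handle_fault(critical_flag_on: bool,
                 may_continue: Callable[[], bool],
                 emit_safe_output: Callable[[], None]) -> Action:
    """Failure-interrupt handling with the added decision step 84."""
    if not may_continue():  # decision result 85 is "stop"
        emit_safe_output()  # e.g. a stop command in train control
        return Action.SAFE_STOP
    if critical_flag_on:
        return Action.REEXECUTE_CRITICAL
    return Action.WAIT_FRAME_END


sent = []
action = handle_fault(True, lambda: False, lambda: sent.append("STOP_TRAIN"))
print(action, sent)  # Action.SAFE_STOP ['STOP_TRAIN']
```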

[0022] FIG. 9 shows the operation of the system when a failure occurs under the flow of FIG. 8. After the failure occurs, interrupt processing 80 is executed, followed by processing stop 86.

[0023] FIGS. 10 to 21 show methods of generating decision result 85.

[0024] In FIG. 10, the permitted number/frequency of re-executions is obtained from a permitted re-execution count/frequency table 91, using information on environmental conditions observed by an environmental condition observation unit 900. Modules 11 and 12, for their part, have a function 92 that counts re-executions,
from which the actual number/frequency of re-executions is obtained. The actual number/frequency of re-executions is compared with the permitted number/frequency: if it is smaller, decision result 85 is set to "continue processing"; if it is larger, to "stop processing".
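The comparison can be sketched as follows. The `ReexecutionCounter` class stands in for counting function 92 and `decide` produces decision result 85; both names are hypothetical. The one-hour sliding window is an assumption matching the times/hour unit of FIG. 12, and treating equality as "stop" is also an assumption, since the text covers only the smaller and larger cases.

```python
from collections import deque


class ReexecutionCounter:
    """Hypothetical model of counting function 92: remembers when recent
    re-executions happened so that a per-window frequency can be reported."""

    def __init__(self, window_s: float = 3600.0) -> None:
        self.window_s = window_s
        self._times = deque()

    def frequency(self, now_s: float) -> int:
        # Forget re-executions that have left the sliding window.
        while self._times and self._times[0] <= now_s - self.window_s:
            self._times.popleft()
        return len(self._times)

    def record(self, now_s: float) -> None:
        self._times.append(now_s)


def decide(counter: ReexecutionCounter, now_s: float, permitted: int) -> str:
    """Decision result 85 for a failure at time now_s."""
    if counter.frequency(now_s) < permitted:
        counter.record(now_s)  # this failure will be handled by re-execution
        return "continue"
    return "stop"  # equality also stops (an assumption)


counter = ReexecutionCounter()
print([decide(counter, t, permitted=3) for t in (0, 600, 1200, 1800)])
# ['continue', 'continue', 'continue', 'stop']
```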

[0025] This approach is based on the fact that failures of electronic equipment are closely related to environmental conditions; conditions closely related to such failures include lightning, electrical noise, and cosmic rays. The environmental condition observation unit 900
can therefore be any of various sensors that observe these conditions.

[0026] Specific examples of the environmental condition observation unit 900 are described below.

[0027] FIG. 11 shows a case in which an RTC (Real Time Clock) 90 with calendar and clock functions is used as the environmental condition observation unit 900.
The permitted number/frequency of re-executions is obtained from table 91 using the date and time information obtained from RTC 90.
Modules 11 and 12 have function 92 that counts re-executions and obtain the actual number/frequency of re-executions. This is compared with the permitted number/frequency: if it is smaller, decision result 85 is set to "continue processing"; if it is larger, to "stop processing". Using RTC 90 in this way makes it possible to cope with, for example, failures caused by lightning, whose occurrence depends strongly on the date and time of day; this is useful for systems installed outdoors.

[0028] FIG. 12 shows the permitted re-execution count/frequency table 91 of FIG. 11. Here the permitted re-execution frequency is given by month and by six-hour time slot. For afternoons in July and August, when lightning is frequent, the permitted frequency is set to 3 (times/hour), higher than in other periods.
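A lookup into a table shaped like FIG. 12 can be sketched as follows. Only the July-August afternoon value of 3 re-executions per hour comes from the text; treating "afternoon" as the 12:00-17:59 slot, the default of 1 per hour for every other cell, and the names `TABLE_91` and `permitted_per_hour` are illustrative assumptions.

```python
from datetime import datetime

# Hypothetical contents of table 91 (FIG. 12), keyed by (month, six-hour slot).
DEFAULT_PER_HOUR = 1  # assumed value for all cells the text does not give
AFTERNOON_SLOT = 2    # slots: 0 = 00:00-05:59, 1 = 06:00-11:59, 2 = 12:00-17:59, 3 = 18:00-23:59
TABLE_91 = {(month, AFTERNOON_SLOT): 3 for month in (7, 8)}


def permitted_per_hour(now: datetime) -> int:
    """Look up the permitted re-execution frequency for the RTC reading `now`."""
    slot = now.hour // 6
    return TABLE_91.get((now.month, slot), DEFAULT_PER_HOUR)


print(permitted_per_hour(datetime(1997, 7, 15, 14, 0)))  # 3
print(permitted_per_hour(datetime(1997, 1, 15, 14, 0)))  # 1
```

The value returned would serve as the permitted frequency against which the counted re-executions are compared.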

[0029] FIG. 13 shows a case in which an instantaneous power interruption/surge detection circuit 95 provided in power supply circuit 94 is used as the environmental condition observation unit 900. The permitted number/frequency of re-executions is obtained from table 91 using the output of detection circuit 95. Then, as described for FIG. 10, the actual number/frequency of re-executions is compared with the permitted number/frequency: if it is smaller, decision result 85 is set to "continue processing"; if it is larger, to "stop processing".

[0030] FIG. 14 shows the permitted re-execution count/frequency table 91 of FIG. 13. Here, the permitted re-execution frequency is set to the instantaneous interruption/surge detection frequency plus 1 (times/hour).
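The FIG. 14 rule reduces to counting recent detections and adding one. In this sketch the function name `permitted_from_surges`, the timestamps in seconds, and the one-hour sliding window are assumptions; the source only states frequencies in times/hour.

```python
from typing import Iterable


def permitted_from_surges(detection_times_s: Iterable[float], now_s: float) -> int:
    """Permitted re-executions per hour = detections by circuit 95 in the
    last hour, plus 1 (the rule of FIG. 14)."""
    recent = sum(1 for t in detection_times_s if now_s - 3600.0 < t <= now_s)
    return recent + 1


detections = [100.0, 2000.0, 3500.0]  # hypothetical detection timestamps
print(permitted_from_surges(detections, 3600.0))  # 4
print(permitted_from_surges(detections, 4000.0))  # 3
```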

[0031] FIG. 15 shows a configuration in which the environmental condition observation unit 900 comprises an antenna 96, an atmospherics (radio static) receiver 97, and a measuring device that measures the number/frequency of atmospherics received. In this case too, as in FIG. 10, the actual number/frequency of re-executions is compared with the permitted number/frequency obtained from table 91: if it is smaller, decision result 85 is set to "continue processing"; if it is larger, to "stop processing".

[0032] FIG. 16 shows an example in which the choice between continuing and stopping processing is made according to the address being accessed when the failure occurred. Here, if that address lies in the backup area, decision result 85 is set to "stop processing"; otherwise, to "continue processing".

[0033] As described above, the environmental condition observation unit 900
can take various forms depending on the environment; moreover, rather than a single one of the units described above, several may be used in combination.

[0034] The information needed to re-execute processing must be backed up so that it is not lost when a failure occurs. This information includes, for example, past processing results and the operating mode of the system. Methods of backing it up include, for example, the method shown in FIGS. 17 and 18 and the method shown in FIGS. 19 and 20.

[0035] In the method of FIGS. 17 and 18, during normal operation the information needed for re-execution is copied (backed up) from normal area 94 of the memory to backup area 95 after critical processing 1 ends, as shown in FIG. 17. When a failure occurs, the information in backup area 95 is copied back to normal area 94 to restore it, and processing is re-executed, as shown in FIG. 18.
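Backup method 1 can be sketched as two copies of a small state dictionary. The `BackupMethod1` class and the state keys `mode` and `last_output` are hypothetical; the sketch only shows the direction of each copy in FIGS. 17 and 18.

```python
import copy
from typing import Any, Dict


class BackupMethod1:
    """Normal area 94 and backup area 95 (FIGS. 17 and 18)."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.normal = copy.deepcopy(initial)
        self.backup = copy.deepcopy(initial)

    def after_critical_ok(self) -> None:
        # FIG. 17: after critical processing 1 ends, copy normal -> backup.
        self.backup = copy.deepcopy(self.normal)

    def restore_for_reexecution(self) -> None:
        # FIG. 18: on a failure, copy backup -> normal, then re-execute.
        self.normal = copy.deepcopy(self.backup)


mem = BackupMethod1({"mode": "run", "last_output": 0})
mem.normal["last_output"] = 5    # result of a fault-free frame
mem.after_critical_ok()
mem.normal["last_output"] = 999  # garbage written by a faulty run
mem.restore_for_reexecution()
print(mem.normal)  # {'mode': 'run', 'last_output': 5}
```

Both copies touch the backup area, so a fault during either one is exactly the situation that [0037] treats as unrecoverable.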

[0036] In the method of FIGS. 19 and 20, during normal operation two areas are used alternately as normal area 94
and backup area 95, as shown in FIG. 19: the area used as normal area 94 in control frame i
is used as backup area 95 in control frame i+1. Because the area written in control frame i
is not written in control frame i+1, the information written in frame i is not destroyed even if a failure causes erroneous information to be written to memory in frame i+1. When a failure occurs, processing can therefore be re-executed using the information in area 95, as shown in FIG. 20.
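Backup method 2 can be sketched as two areas selected by frame parity. The `BackupMethod2` class and the key `x` are hypothetical, and so is the convention that each frame reads its inputs from the previous frame's area and writes only to its own; the source states only which area is written in which frame.

```python
from typing import Any, Dict, List


class BackupMethod2:
    """Two areas used alternately as normal area 94 and backup area 95 (FIGS. 19, 20)."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.areas: List[Dict[str, Any]] = [dict(initial), dict(initial)]
        self.frame = 0

    def start_frame(self, i: int) -> None:
        self.frame = i

    @property
    def normal(self) -> Dict[str, Any]:
        # Written only in the current frame i.
        return self.areas[self.frame % 2]

    @property
    def backup(self) -> Dict[str, Any]:
        # Written in frame i-1 and never written in frame i.
        return self.areas[(self.frame + 1) % 2]


m = BackupMethod2({"x": 0})
m.start_frame(1)
m.normal["x"] = m.backup["x"] + 1  # frame 1 computes x = 1
m.start_frame(2)
m.normal["x"] = 12345              # a fault writes garbage during frame 2
m.normal["x"] = m.backup["x"] + 1  # re-execution from frame 1's intact value
print(m.normal["x"])  # 2
```

Unlike method 1, no copy is needed after each frame; the cost is that every frame must read its previous state from the other area.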

[0037] As described above, the backup area holds the information needed for re-execution, so a failure detected while this area is being accessed means that this information has quite possibly been destroyed.
In such a case, therefore, processing is stopped without re-execution.

[0038] FIG. 21 shows the configuration, required to implement FIG. 16, for obtaining the address being accessed when the failure occurred.
In response to interrupt signal 16 from comparator 15, address storage devices
151 and 152 store the contents of address buses 111 and 121 in modules 11 and 12. Interrupt processing 80 then examines the contents of address storage devices 151 and 152 to determine whether the failure occurred while the backup area was being accessed.
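The check of FIGS. 16 and 21 can be sketched as follows. The backup-area address range is hypothetical (the source gives no memory map), as is the rule that a backup-area address latched by either device 151 or 152 triggers a stop: the two modules may disagree at the faulting cycle, so this sketch errs on the safe side.

```python
# Hypothetical memory map: the backup area occupies [0x8000, 0x9000).
BACKUP_START, BACKUP_END = 0x8000, 0x9000


def in_backup_area(addr: int) -> bool:
    return BACKUP_START <= addr < BACKUP_END


def decide_from_latched(addr_151: int, addr_152: int) -> str:
    """Decision result 85 from the addresses latched on interrupt signal 16."""
    if in_backup_area(addr_151) or in_backup_area(addr_152):
        return "stop"  # re-execution data may be corrupt ([0037])
    return "continue"


print(decide_from_latched(0x1000, 0x1000))  # continue
print(decide_from_latched(0x1000, 0x8004))  # stop
```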

[0039] As described above, according to the embodiments of FIGS. 8 to 21, the permitted number/frequency of re-executions is determined in fine detail from various kinds of information, so faults recoverable by re-execution can be distinguished more precisely from unrecoverable faults. When an unrecoverable fault occurs, the system can therefore promptly be stopped on the safe side, preventing dangerous system behavior caused by the fault.

[0040]

[Effects of the Invention] According to the present invention, it is possible to provide a system that outputs processing results by the deadline even when a failure occurs. Furthermore, dangerous system behavior caused by faults can be prevented.

[Brief description of the drawings]

FIG. 1 shows the basic operation of the present invention.

FIG. 2 shows the operation when a failure occurs during critical processing.

FIG. 3 shows the operation when a failure occurs during non-critical processing.

FIG. 4 shows the hardware configuration of a system to which the present invention is applied.

FIG. 5 shows the operation when a failure occurs during critical processing in the configuration of FIG. 4.

FIG. 6 shows the operation when a failure occurs during non-critical processing in the configuration of FIG. 4.

FIG. 7 shows the processing flow of the present invention.

FIG. 8 shows an embodiment of the processing flow to which a decision to stop processing is added.

FIG. 9 shows the operation of the flow of FIG. 8.

FIG. 10 is a diagram explaining the decision to stop processing based on observed environmental conditions.

FIG. 11 is a specific example of the decision to stop processing based on the date and time.

FIG. 12 is a specific example of the permitted re-execution frequency table for the embodiment of FIG. 11.

FIG. 13 is a specific example of the decision to stop processing based on the number/frequency of instantaneous power interruption/surge detections.

FIG. 14 is a specific example of the permitted re-execution frequency table for the embodiment of FIG. 13.

FIG. 15 is a specific example of the decision to stop processing based on the number/frequency of atmospherics received.

FIG. 16 is a specific example of the decision to stop processing based on the address at which the failure occurred.

FIG. 17 shows backup method 1.

FIG. 18 shows backup method 1 (when a failure occurs).

FIG. 19 shows backup method 2.

FIG. 20 shows backup method 2 (when a failure occurs).

FIG. 21 shows the configuration of the failure-address detection function for the embodiment of FIG. 16.

[Explanation of symbols]

1: critical processing; 2: non-critical processing; 11, 12: modules; 15, 15': comparators; 16: interrupt signal; 70: normal processing; 80: interrupt processing.

Continued from the front page: (72) Inventor: Naoto Miyazaki, c/o Hitachi Research Laboratory, Hitachi, Ltd., 7-1-1 Omika-cho, Hitachi City, Ibaraki Prefecture

Claims

[Claims]

1. An information processing system comprising a memory that stores data and a processing device that performs processing based on given data, wherein the processing comprises first processing, which must be completed within a predetermined period and whose result must be output,
and second processing, which need not be completed within the period; the first processing and the second processing can both be executed within the period; and the processing device executes the first processing before executing the second processing within the period.

2. The information processing system according to claim 1, wherein, when a failure occurs during execution of the first processing, the processing device executes the first processing again.

3. The high-reliability system according to claim 1, wherein, when a failure occurs during execution of the second processing, the processing device executes the second processing again within at least the next period and outputs its result.

4. The information processing system according to claim 1, wherein the processing device compares the number/frequency of re-executions of the processing with a predetermined permitted number/frequency of re-executions and, based on the result of the comparison, either executes the first processing or the second processing again or stops the processing.

5. The information processing system according to claim 1, wherein the processing device stops processing when a failure occurs while an address in the backup area of the memory is being accessed.