JP2009020545A

JP2009020545A - Anomaly monitoring device for computer

Info

Publication number: JP2009020545A
Application number: JP2007180414A
Authority: JP
Inventors: Yasushi Yaginuma; 康司柳沼
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 2007-07-10
Filing date: 2007-07-10
Publication date: 2009-01-29

Abstract

PROBLEM TO BE SOLVED: To detect abnormal operation of an OS or an application at an early stage, and to identify the application that operated abnormally while reducing the CPU load of the system.

SOLUTION: Each of applications 1A to 1C issues, as a request to enable a watchdog function, application information containing information that identifies the application itself and a monitoring timeout period specific to that application. The information passes through an operating system 2 and is stored in order in a FIFO memory 4. A local CPU 5 reads it in order and performs watchdog monitoring with each application's own timeout period. Alternatively, the presence or absence of an abnormality is monitored through an operation check for each application, carried out by the local CPU through a monitoring agent and the operating system.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an abnormality monitoring apparatus for a computer equipped with a plurality of applications, and more particularly to an apparatus that monitors abnormal operation of the applications and the OS.

A watchdog function is generally used to monitor a computer for abnormal operation. A monitoring method using the watchdog function will be described with reference to FIG. 3. A plurality of applications 1 are installed on the computer, and their functions are executed via an operating system (OS) 2. As a monitoring function of the computer that executes the applications 1, a notification that enables (refreshes) the watchdog function is generated at a fixed period and passed to a watchdog function unit 3 through a device driver 2A. If no notification arrives from the computer system within a predetermined time (timeout), the watchdog function unit 3 detects abnormal operation of the computer and resets and restarts the system.
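The conventional single-timer scheme of FIG. 3 can be sketched as follows. This is a minimal simulation under stated assumptions, not the device's implementation: time is an integer millisecond counter passed in by the caller, and the names `Watchdog` and `first_reset_ms` are hypothetical.

```python
class Watchdog:
    """Single system-wide watchdog timer, as in the conventional method of FIG. 3.

    Hypothetical sketch: time is passed in explicitly (integer milliseconds)
    so the behaviour can be simulated without real hardware timers.
    """

    def __init__(self, timeout_ms: int, start_ms: int = 0) -> None:
        self.timeout_ms = timeout_ms
        self.last_kick_ms = start_ms

    def kick(self, now_ms: int) -> None:
        # Periodic enable/refresh notification arriving via device driver 2A.
        self.last_kick_ms = now_ms

    def timed_out(self, now_ms: int) -> bool:
        # No notification within the timeout period -> abnormal operation.
        return now_ms - self.last_kick_ms > self.timeout_ms


def first_reset_ms(kick_times_ms, timeout_ms, end_ms):
    """Return the first millisecond at which the system would be reset, or None."""
    wd = Watchdog(timeout_ms)
    pending = sorted(kick_times_ms)
    for now in range(end_ms + 1):
        # Deliver every notification whose scheduled time has come.
        while pending and pending[0] <= now:
            wd.kick(pending.pop(0))
        if wd.timed_out(now):
            return now  # watchdog function unit 3 would reset and restart here
    return None
```

For example, with notifications at 500, 1000 and 1500 ms and a 2000 ms timeout, `first_reset_ms([500, 1000, 1500], 2000, 5000)` returns 3501: the failure is noticed only about two seconds after the last notification, and the result says nothing about which application stopped sending.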

Because this watchdog method uses a relatively long timer period (about 1 to 2 seconds) to avoid increasing the system load, it cannot detect the occurrence of an abnormality quickly. Nor can it identify the cause of the abnormality.

As a monitoring method that detects abnormalities quickly, there is a method that directly monitors the address data on the address bus of the computer system, detects an abnormality when the address falls in an address space to which nothing is assigned, and interrupts the CPU (see, for example, Patent Document 1).
Patent Document 1: JP-A-8-278902
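The address-bus check attributed to Patent Document 1 amounts to a range lookup on every observed address. The sketch below assumes a hypothetical memory map `ASSIGNED_RANGES` and a caller-supplied `raise_interrupt` callback; it illustrates the idea only and is not taken from that document.

```python
from bisect import bisect_right

# Hypothetical memory map: sorted, non-overlapping (start, end_exclusive)
# ranges that are actually assigned to memory or devices.
ASSIGNED_RANGES = [
    (0x0000_0000, 0x0010_0000),  # e.g. boot ROM
    (0x2000_0000, 0x2004_0000),  # e.g. main RAM
    (0x4000_0000, 0x4000_1000),  # e.g. peripheral registers
]
_STARTS = [start for start, _ in ASSIGNED_RANGES]


def is_unassigned(addr: int) -> bool:
    """True if addr falls outside every assigned range."""
    i = bisect_right(_STARTS, addr) - 1  # last range starting at or below addr
    return i < 0 or addr >= ASSIGNED_RANGES[i][1]


def watch_bus(addresses, raise_interrupt):
    """Check each address seen on the bus; interrupt the CPU on the first bad one."""
    for addr in addresses:
        if is_unassigned(addr):
            raise_interrupt(addr)
            return addr
    return None
```

Because the check only looks at addresses, an application stuck in a loop entirely inside valid code never touches an unassigned address and goes unnoticed.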

As described above, the watchdog monitoring method cannot detect the occurrence of an abnormality quickly. Moreover, although the watchdog function interrupts the CPU and restarts the system on timeout, it can detect an abnormality only when the application fails to send a timer-update notification to the watchdog function. It is also difficult to retain a record that an abnormality occurred.

In this respect, the monitoring method of Patent Document 1 can detect the occurrence of an abnormality quickly without increasing the CPU load, and it can retain a record that an abnormality occurred, but it can monitor only address-space abnormalities. It may therefore fail to detect abnormal operation of the OS or of the application itself, and it cannot identify which application is abnormal.

An object of the present invention is to provide an abnormality monitoring apparatus for a computer that can detect abnormal operation of an OS or an application at an early stage and can identify the application that operated abnormally while reducing the CPU load of the system.

To solve the above problems, the present invention monitors for abnormalities either with a time period determined individually for each application or through an individual operation check for each application, and performs this monitoring on a local CPU. It is characterized by the following configurations.

(1) An abnormality monitoring apparatus for monitoring abnormal operation of a computer system equipped with a plurality of applications, wherein:
each application includes means for issuing, as a request to enable a watchdog function, application information containing information that identifies the application itself and a monitoring timeout period specific to that application;
a FIFO memory receives the application information through a device driver of an operating system and stores it in order; and
a local CPU reads the application information from the FIFO memory in order, performs watchdog monitoring with the timeout period of each application, and detects an abnormality of an application when no next notification arrives from that application within its timeout period.

(2) An abnormality monitoring apparatus for monitoring abnormal operation of a computer system equipped with a plurality of applications, wherein:
each application includes means for issuing a “request to enable the monitoring function” carrying information that identifies the application itself, and thereafter issuing an “in operation” notification at a fixed period determined for each application;
a monitoring agent receives the “request to enable the monitoring function” and the “in operation” notifications issued by the applications and forwards them to a local CPU through the operating system 2; and
the local CPU includes means for starting abnormality monitoring of an application when it receives the “request to enable the monitoring function” for that application from the monitoring agent, issuing a “confirmation notification” to the operating system and the monitoring agent when an “in operation” notification arrives, and determining that the operating system, the applications, and the monitoring agent are operating normally when the operating system and the monitoring agent return a “response notification” to the “confirmation notification”.

(3) The local CPU includes means for storing information to that effect in a nonvolatile memory when it determines that an application or the operating system is abnormal.

As described above, according to the present invention, abnormalities are monitored either with a time period determined individually for each application or through an individual operation check for each application, and this monitoring is performed on a local CPU. Abnormal operation of the OS or an application can therefore be detected at an early stage, and the application that operated abnormally can be identified while reducing the CPU load of the system.

(Embodiment 1)
FIG. 1 is a block diagram of the main part of an abnormality monitoring apparatus according to an embodiment of the present invention. The computer system is assumed to consist of a plurality of applications 1A to 1C and an operating system (OS) 2. In this embodiment, the abnormality monitoring function for this computer system is implemented by a FIFO (First In, First Out) memory 4, a local CPU 5, and a nonvolatile memory 6.

The abnormality monitoring process according to this embodiment is described below.

(S1) At a suitable time, such as when it starts executing, each of the applications 1A to 1C issues a request to enable the watchdog function, attaching application information that contains information identifying the application (here, an ID) and a monitoring timeout period determined individually for that application.
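The source does not define how the application information of step S1 is encoded. As a hypothetical illustration only, the record below packs the application ID and its monitoring timeout into a fixed six-byte layout using Python's `struct` module; the field widths, byte order, and millisecond unit are assumptions.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout (not given in the source): little-endian
# uint16 application ID followed by uint32 monitoring timeout in ms.
_RECORD = struct.Struct("<HI")


@dataclass(frozen=True)
class AppInfo:
    app_id: int      # information identifying the application (the "ID")
    timeout_ms: int  # monitoring timeout period set for this application

    def pack(self) -> bytes:
        """Encode as the payload of an enable-watchdog request (S1)."""
        return _RECORD.pack(self.app_id, self.timeout_ms)

    @classmethod
    def unpack(cls, raw: bytes) -> "AppInfo":
        """Decode a payload read back from the FIFO."""
        app_id, timeout_ms = _RECORD.unpack(raw)
        return cls(app_id, timeout_ms)
```

For example (values hypothetical), a frequently started application might register `AppInfo(0x1A, 200)` while a slow background application registers a timeout of several seconds; both records have the same size, so the FIFO can hold fixed-length entries.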

(S2) The operating system 2 writes the request notification (with its application information) into the FIFO memory 4 through the device driver 2A. Each time one of the applications 1A to 1C issues a request, the notification is appended to the FIFO memory 4 in order.
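Step S2 decouples the writers (the OS, through the device driver) from the reader (the local CPU). A minimal sketch, assuming a fixed FIFO depth and a reject-on-full policy that sets an overflow flag; `FifoMemory` and `driver_write_request` are hypothetical names, and real FIFO hardware may behave differently when full.

```python
from collections import deque


class FifoMemory:
    """Bounded first-in, first-out buffer standing in for FIFO memory 4.

    Hypothetical sketch: the depth and the overflow behaviour are assumptions.
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.overflowed = False
        self._buf = deque()

    def write(self, record) -> bool:
        # Writer side: OS 2 through device driver 2A (S2).
        if len(self._buf) >= self.depth:
            self.overflowed = True  # like a hardware "FIFO full" flag
            return False
        self._buf.append(record)
        return True

    def read(self):
        # Reader side: local CPU 5 (S3); None when the FIFO is empty.
        return self._buf.popleft() if self._buf else None


def driver_write_request(fifo: FifoMemory, app_id: int, timeout_ms: int) -> bool:
    """Hypothetical device-driver entry point used by the OS for each request."""
    return fifo.write((app_id, timeout_ms))
```

Records come out in exactly the order the requests were issued, and a write never waits for the local CPU.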

(S3) The local CPU 5 reads the application information from the FIFO memory 4 in order and, for each ID, performs watchdog monitoring with the time period determined for that application. If no next notification arrives from an application within the time period set for its ID, the local CPU 5 detects an abnormality of the application with that ID, writes information to that effect to the nonvolatile memory 6, and requests a system restart or similar action, for example by issuing an NMI interrupt to the operating system 2.
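Step S3 is essentially one watchdog timer per application ID. The sketch below keeps a deadline per ID, re-arms it whenever a new record for that ID is read from the FIFO, and reports every ID whose own timeout has passed. Assumptions: records are `(app_id, timeout_ms)` tuples, the notification time is taken when the local CPU reads the FIFO, a Python list stands in for nonvolatile memory 6, and `request_nmi` is a caller-supplied callback standing in for the NMI to the operating system 2.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Record = Tuple[int, int]  # (application ID, timeout in ms); hypothetical layout


class LocalCpuMonitor:
    """Per-application watchdog run on local CPU 5 (S3). Hypothetical sketch."""

    def __init__(self, fifo: Deque[Record], nv_log: List[str],
                 request_nmi: Callable[[int], None]) -> None:
        self.fifo = fifo                # FIFO memory 4 (e.g. a collections.deque)
        self.nv_log = nv_log            # stands in for nonvolatile memory 6
        self.request_nmi = request_nmi  # e.g. raise an NMI to OS 2
        self.deadline_ms: Dict[int, int] = {}

    def poll(self, now_ms: int) -> List[int]:
        """Drain the FIFO, then return the IDs whose own timeout has expired."""
        while self.fifo:
            app_id, timeout_ms = self.fifo.popleft()
            # Each notification (re)arms the timer for that ID only.
            self.deadline_ms[app_id] = now_ms + timeout_ms
        expired = sorted(a for a, d in self.deadline_ms.items() if now_ms > d)
        for app_id in expired:
            self.nv_log.append(f"t={now_ms}ms app={app_id:#x}: watchdog timeout")
            del self.deadline_ms[app_id]  # stop watching until it re-registers
            self.request_nmi(app_id)
        return expired
```

With application 0x1A on a 100 ms timeout (last notified at 50 ms) and 0x1B on 1000 ms, 0x1A is reported at 151 ms while 0x1B keeps running, and the log entry names the failing application, which the single-timer scheme cannot do.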

Therefore, according to this embodiment, the local CPU 5 can monitor each application individually, identifying the application and using the time period (watchdog timer) determined for it, and can detect and record which application has become abnormal. Recording the abnormal state in this detail also makes it possible to analyze the abnormality.

In addition, for an application that is started frequently, setting a shorter watchdog timer period allows abnormalities to be detected quickly and the system to be protected. Furthermore, because abnormality monitoring is performed by the local CPU, the load on the main CPU on the operating system side can be reduced.

Also, because the notification from each application passes through the operating system 2, abnormal operation of the operating system 2 can be monitored as well.

(Embodiment 2)
FIG. 2 is a block diagram of an abnormality monitoring apparatus according to an embodiment of the present invention. The computer system is assumed to consist of a plurality of applications 1A and 1B and an operating system (OS) 2. In this embodiment, the abnormality monitoring function for this computer system is implemented by a monitoring agent (program) 7, a local CPU 8, and a nonvolatile memory 9.

(S11) At a suitable time, such as when it starts executing, each of the applications 1A and 1B issues to the monitoring agent 7 a request to enable the monitoring function, together with information identifying the application (here, an ID).

(S12) On receiving the request to enable the monitoring function (with the application ID), the monitoring agent 7 forwards it to the local CPU 8 through the operating system 2 and the device driver 2A.

(S13) The local CPU 8 records the request to enable the monitoring function under the application's ID.

(S14) After issuing the request to enable the monitoring function, each application sends an “in operation” notification, together with its ID, to the monitoring agent 7 at a fixed period (a time determined for each application) to indicate that it is running.

(S15) When the local CPU 8 receives an application's “in operation” notification from the monitoring agent via the operating system 2, it issues a “confirmation notification” to the operating system 2 at a fixed period.
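Steps S11 to S15 can be modelled as messages that the monitoring agent relays to the local CPU. A minimal sketch, assuming messages are `(kind, app_id)` tuples with the hypothetical kinds `"enable"`, `"in_operation"` and `"confirm"`, simplifying S15 to one confirmation per received “in operation” notification, and reducing the hop through the OS and device driver to a direct function call.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, int]  # (kind, application ID); hypothetical encoding


class LocalCpuRegistry:
    """Local CPU 8 side of steps S13 and S15. Hypothetical sketch."""

    def __init__(self) -> None:
        # app ID -> time (ms) of the latest enable request or "in operation" notice
        self.registered: Dict[int, int] = {}

    def on_message(self, msg: Message, now_ms: int) -> List[Message]:
        kind, app_id = msg
        if kind == "enable":  # S13: remember the request under its ID
            self.registered[app_id] = now_ms
            return []
        if kind == "in_operation" and app_id in self.registered:
            self.registered[app_id] = now_ms
            return [("confirm", app_id)]  # S15: confirmation notification to OS 2
        return []  # unknown kind or unregistered ID: ignored


def monitoring_agent_forward(msg: Message, local_cpu: LocalCpuRegistry,
                             now_ms: int) -> List[Message]:
    """Monitoring agent 7 relays application messages to local CPU 8 (S12, S14)."""
    # In the real system this passes through OS 2 and device driver 2A.
    return local_cpu.on_message(msg, now_ms)
```

Notifications from an ID that never sent an enable request are ignored, so only applications that asked to be monitored (S13) produce confirmations.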

(S16) The operating system 2 immediately returns an “acknowledgment” of the “confirmation notification” to the local CPU 8. From this acknowledgment, the local CPU 8 determines that the operating system 2 is running normally. If the operating system 2 is not running, the local CPU 8 writes information to that effect to the nonvolatile memory 9.
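The OS-level check of step S16 is a request/acknowledgment exchange with a deadline. The sketch assumes a sequence number on each confirmation and an explicit acknowledgment deadline `ack_deadline_ms`; the source only says the OS replies immediately, so both are illustrative assumptions, and a Python list stands in for nonvolatile memory 9. The class name `OsLivenessCheck` is hypothetical.

```python
from typing import List, Optional, Tuple


class OsLivenessCheck:
    """Local CPU 8 checks that OS 2 acknowledges each confirmation (S15-S16).

    Hypothetical sketch: the acknowledgment deadline is an assumed parameter.
    """

    def __init__(self, ack_deadline_ms: int, nv_log: List[str]) -> None:
        self.ack_deadline_ms = ack_deadline_ms
        self.nv_log = nv_log  # stands in for nonvolatile memory 9
        self._seq = 0
        self._pending: Optional[Tuple[int, int]] = None  # (sequence no., sent at ms)

    def send_confirmation(self, now_ms: int) -> int:
        self._seq += 1
        self._pending = (self._seq, now_ms)
        return self._seq

    def on_acknowledgment(self, seq: int, now_ms: int) -> bool:
        # Accept only the acknowledgment for the outstanding confirmation, in time.
        pending = self._pending
        if (pending is not None and pending[0] == seq
                and now_ms - pending[1] <= self.ack_deadline_ms):
            self._pending = None
            return True
        return False

    def os_running(self, now_ms: int) -> bool:
        # An unanswered confirmation past its deadline means OS 2 is not running.
        pending = self._pending
        if pending is not None and now_ms - pending[1] > self.ack_deadline_ms:
            self.nv_log.append(f"t={now_ms}ms: no acknowledgment from OS (seq {pending[0]})")
            self._pending = None
            return False
        return True
```

Matching the sequence number keeps a late acknowledgment for an earlier confirmation from being mistaken for a live OS.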

(S17) The operating system 2 passes the “confirmation notification” from the local CPU 8 on to the monitoring agent 7.

(S18) When the monitoring agent 7 receives the “confirmation notification” from the local CPU 8 via the operating system 2, it returns a “response notification”, together with operating information for each application ID, to the local CPU 8 through the operating system 2.

(S19) From the presence of the “response notification”, the local CPU 8 determines that the operating system 2, the applications, and the monitoring agent 7 are in a normally operable state.

If an application or the monitoring agent 7 is not operating normally, the local CPU 8 stores information to that effect in the nonvolatile memory 9. The local CPU 8 also checks the operating state of each application and, if an application is not running, stores information to that effect in the nonvolatile memory 9.
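Steps S18 and S19, together with the recording rule above, reduce to checking the agent's per-ID report against the set of registered applications. In this sketch `response` is a hypothetical dictionary from application ID to a running flag (`None` meaning no response notification arrived), an ID missing from the report counts as not running, and a list stands in for nonvolatile memory 9; `evaluate_response` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def evaluate_response(registered_ids: Iterable[int],
                      response: Optional[Dict[int, bool]],
                      nv_log: List[str]) -> Tuple[bool, Optional[List[int]]]:
    """Judge the monitoring agent's response notification (S18-S19).

    Returns (agent_ok, IDs of applications that are not running);
    the ID list is None when no response arrived and app states are unknown.
    """
    if response is None:
        # No response: the monitoring agent (or the path to it) is not working.
        nv_log.append("monitoring agent: no response notification")
        return False, None
    # An ID missing from the report is treated as not running (assumption).
    failed = sorted(i for i in registered_ids if not response.get(i, False))
    for app_id in failed:
        nv_log.append(f"application {app_id:#x}: not running")
    return True, failed
```

Separating the two failure cases matters: a missing response implicates the monitoring agent (or the path through the OS), whereas a response with a false flag pinpoints the individual application.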

(S20) The local CPU 8 reports the monitored states and information to other CPUs over some communication path.

Therefore, according to this embodiment, abnormality monitoring is performed by exchanging “confirmation notifications” and “response notifications” among the applications, the operating system 2, and the local CPU. This makes it possible to detect and record the operating state at the OS level, at the application level (monitoring agent), and for each individual application program. The detailed information obtained also makes it possible to analyze abnormal states.

As in Embodiment 1, for an application that is started frequently, shortening its notification period allows abnormalities to be detected quickly and the system to be protected. Furthermore, because abnormality monitoring is performed by the local CPU, the load on the main CPU on the operating system side can be reduced.

FIG. 1 is a block diagram of the main part of an abnormality monitoring apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram of the main part of an abnormality monitoring apparatus according to Embodiment 2 of the present invention.
FIG. 3 illustrates a monitoring method using a watchdog function.

Explanation of symbols

1A to 1C Applications
2 Operating system
2A Device driver
4 FIFO memory
5 Local CPU
6 Nonvolatile memory
7 Monitoring agent
8 Local CPU

Claims

1. An abnormality monitoring apparatus for a computer, which monitors abnormal operation of a computer system equipped with a plurality of applications, comprising:
means, provided in each application, for issuing, as a request to enable a watchdog function, application information containing information that identifies the application itself and a monitoring timeout period specific to that application;
a FIFO memory that receives the application information through a device driver of an operating system and stores it in order; and
a local CPU that reads the application information from the FIFO memory in order, performs watchdog monitoring with the timeout period of each application, and detects an abnormality of an application when no next notification arrives from that application within that period.

2. An abnormality monitoring apparatus for a computer, which monitors abnormal operation of a computer system equipped with a plurality of applications, comprising:
means, provided in each application, for issuing a “request to enable the monitoring function” carrying information that identifies the application itself, and thereafter issuing an “in operation” notification at a fixed period determined for each application;
a monitoring agent that receives the “request to enable the monitoring function” and the “in operation” notifications issued by the applications and forwards them to a local CPU through the operating system 2; and
means, provided in the local CPU, for starting abnormality monitoring of an application when the local CPU receives the “request to enable the monitoring function” for that application from the monitoring agent, issuing a “confirmation notification” to the operating system and the monitoring agent when an “in operation” notification arrives, and determining that the operating system, the applications, and the monitoring agent are operating normally when the operating system and the monitoring agent return a “response notification” to the “confirmation notification”.

3. The abnormality monitoring apparatus for a computer according to claim 1, wherein the local CPU comprises means for storing information to that effect in a nonvolatile memory when it determines that an application or the operating system is abnormal.