JP2018160009A

JP2018160009A - Failure information processing program, computer, failure notification method, and computer system

Info

Publication number: JP2018160009A
Application number: JP2017055756A
Authority: JP
Inventors: 正吉小佐野; Masayoshi Osano
Original assignee: NEC Platforms Ltd
Current assignee: NEC Platforms Ltd
Priority date: 2017-03-22
Filing date: 2017-03-22
Publication date: 2018-10-11
Anticipated expiration: 2037-03-22
Also published as: JP7063445B2

Abstract

PROBLEM TO BE SOLVED: To provide a failure information processing program capable of preventing apparent performance deterioration when a failure has occurred. SOLUTION: By executing a failure information processing program 03, a CPU 100 functions as a first log acquisition unit, an analysis unit, a reduction unit, and a second log recording unit. A log collection unit implemented by a BIOS 01 reads information from a register capable of identifying a failed section of hardware, and generates first log data on the basis of the information. A first log recording unit records the first log data in a first storage area. The first log acquisition unit acquires information stored in the first storage area. The analysis unit analyzes the acquired first log data. The reduction unit generates second log data with a data volume smaller than that of the first log data on the basis of the analysis result of the analysis unit. The second log recording unit records the second log data in a second storage area. A transmission unit transmits the second log data acquired by a second log acquisition unit to a management device. SELECTED DRAWING: Figure 2

Description

The present invention relates to a failure information processing program, a computer, a failure notification method, and a computer system.

When a failure occurs in a computer, the computer hands over processing from the OS (Operating System) to the BIOS (Basic Input Output System), the BIOS collects log data, and the log data is transmitted to the BMC (Baseboard Management Controller). Handover here means transferring control of the CPU.
Patent Document 1 discloses that, when a communication request between the BIOS and the BMC arises due to a failure or the like, the OS is made to resume processing, and the processing associated with the communication request is divided into parts that are transmitted and executed periodically.

JP 2011-164971 A

From the time a failure occurs in the computer until the transmission of the log data is complete, the OS does not have control. That is, OS processing is stopped during this period. As a result, when a failure occurs in the computer, its apparent processing performance is degraded.
An object of the present invention is to provide a failure information processing program, a computer, a failure notification method, and a computer system that solve the above-described problem.

According to a first aspect of the present invention, a failure information processing program causes a computer to execute: an acquisition step of acquiring first failure information, generated by the BIOS of the computer, relating to a failure that has occurred in the computer itself; and a reduction step of generating, based on the first failure information, second failure information, which is failure information to be transmitted to a management device, by reducing the data amount of the first failure information.

According to a second aspect of the present invention, a computer includes: a CPU; a first storage device that stores a BIOS including a program for causing the CPU to execute a generation step of generating first failure information relating to a failure that has occurred in the computer itself and a transmission step of transmitting, to a management device, second failure information generated based on the first failure information; and a second storage device that stores the failure information processing program according to the above aspect.

According to a third aspect of the present invention, a failure notification method includes: an acquisition step in which a computer, by executing a BIOS, acquires first failure information relating to a failure that has occurred in the computer; a reduction step of generating, based on the first failure information, second failure information, which is failure information obtained by reducing the data amount of the first failure information; and a transmission step of transmitting the second failure information to a management device.

According to a fourth aspect of the present invention, a computer system includes: a computer that generates second failure information by analyzing first failure information relating to a failure that has occurred in the computer itself; and a management device that analyzes the second failure information.

According to at least one of the above aspects, the computer can prevent an apparent decrease in processing performance when a failure occurs.

FIG. 1 is a schematic block diagram showing the hardware configuration of the computer according to the first embodiment.
FIG. 2 is a schematic block diagram showing the software configuration of the computer according to the first embodiment.
FIG. 3 is a flowchart showing log collection processing by the computer according to the first embodiment.
FIG. 4 is a flowchart showing failure notification processing by the computer according to the first embodiment.
FIG. 5 is a schematic block diagram showing the basic configuration of a computer.
FIG. 6 is a schematic block diagram showing the basic configuration of a computer system.

Hereinafter, embodiments will be described in detail with reference to the drawings.
FIG. 1 is a schematic block diagram illustrating the hardware configuration of a computer according to the first embodiment.
The computer 1 includes a CPU 100, a main memory 200, a nonvolatile memory 300, a storage 400, and an interface 500.
The CPU 100 reads a program from the nonvolatile memory 300 or the storage 400, loads it into the main memory 200, and executes processing according to the program.
The nonvolatile memory 300 stores the BIOS 01. Examples of the nonvolatile memory 300 include an EEPROM (Electrically Erasable Programmable Read-Only Memory) and a flash memory.
The storage 400 stores the OS 02 and a failure information processing program 03, which is an application program that runs on the OS 02. Examples of the storage 400 include an HDD (Hard Disk Drive), an SSD (Solid State Drive), a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), and a semiconductor memory. The storage 400 may be an internal medium directly connected to the bus of the computer 1, or an external medium connected to the computer 1 via the interface 500 or a communication line.
The main memory 200, the nonvolatile memory 300, and the storage 400 are examples of storage devices.

Further, in accordance with the BIOS 01, the CPU 100 secures a first storage area M1 and a second storage area M2 in the main memory 200.
The first storage area M1 stores first log data, which is log data generated by the BIOS 01, together with the failure detection date and time and an analyzed flag. The analyzed flag is a bit indicating whether analysis of the associated first log data has been completed: 0 means not analyzed, and 1 means analyzed.
The second storage area M2 stores second log data, which is log data generated by the failure information processing program 03, together with the failure detection date and time and a reported flag. The reported flag is a bit indicating whether the associated second log data has been reported to the BMC: 0 means not reported, and 1 means reported.
Both the first storage area M1 and the second storage area M2 are referenced by both the BIOS 01 and the failure information processing program 03. The CPU 100 therefore secures areas of a predetermined size, at predetermined addresses in a memory block of the main memory 200 whose address does not change (fixed memory), as the first storage area M1 and the second storage area M2.
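Because both the BIOS 01 and the failure information processing program 03 must find the same records at the same place, the two sides effectively share a fixed binary layout. The following is a minimal sketch of such a layout, assuming a hypothetical record format (1-byte flag, 8-byte detection timestamp, 2-byte payload length, 64-byte payload) and a hypothetical slot count; none of these sizes come from the source, and a `bytearray` stands in for the fixed memory block.

```python
import struct

# Hypothetical record layout (sizes are assumptions, not from the source):
#   B   flag (analyzed flag in M1, reported flag in M2): 0 or 1
#   Q   failure detection time as an integer timestamp
#   H   number of valid payload bytes
#   64s log payload, zero-padded
RECORD = struct.Struct("<BQH64s")
SLOTS = 4  # assumed number of records per storage area


def new_area() -> bytearray:
    """Stand-in for a storage area of predetermined size at a fixed address."""
    return bytearray(RECORD.size * SLOTS)


def write_record(area: bytearray, slot: int, flag: int,
                 detected_at: int, payload: bytes) -> None:
    """Write one record at a slot offset that both sides can compute."""
    if len(payload) > 64:
        raise ValueError("payload exceeds the assumed 64-byte slot")
    RECORD.pack_into(area, slot * RECORD.size,
                     flag, detected_at, len(payload), payload)


def read_record(area: bytearray, slot: int) -> tuple[int, int, bytes]:
    """Read back (flag, detected_at, payload) from a slot."""
    flag, detected_at, length, raw = RECORD.unpack_from(area, slot * RECORD.size)
    return flag, detected_at, raw[:length]
```

Since every offset is derived from the slot index and the fixed record size, a writer and a reader that agree on `RECORD` and the base address need no other coordination, which is the property the fixed-memory placement provides.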

FIG. 2 is a schematic block diagram illustrating the software configuration of the computer according to the first embodiment.
By executing the BIOS 01, the CPU 100 functions as a log collection unit 101, a first log recording unit 102, a second log acquisition unit 103, and a transmission unit 104.

The log collection unit 101 reads a register that holds information identifying the failed part of the hardware in which the failure was detected, and generates first log data based on the information read.

The first log recording unit 102 records the first log data generated by the log collection unit 101 in the first storage area M1 in association with an analyzed flag. When recording the first log data, the first log recording unit 102 sets the analyzed flag to 0.

The second log acquisition unit 103 acquires information stored in the second storage area M2. Because the second storage area M2 is secured in a fixed-memory block, the second log acquisition unit 103 can acquire the second log data generated by the failure information processing program 03 by referring to the second storage area M2. The second log acquisition unit 103 acquires the second log data associated with a reported flag indicating 0.

The transmission unit 104 transmits the second log data acquired by the second log acquisition unit 103 to the BMC. Among the reported flags stored in the second storage area M2, the transmission unit 104 sets to 1 the flag associated with second log data whose transmission has completed.

By executing the failure information processing program 03, the CPU 100 functions as a first log acquisition unit 105, an analysis unit 106, a reduction unit 107, and a second log recording unit 108.

The first log acquisition unit 105 acquires information stored in the first storage area M1. Because the first storage area M1 is secured in a fixed-memory block, the first log acquisition unit 105 can acquire the first log data generated by the BIOS 01 by referring to the first storage area M1. The first log acquisition unit 105 acquires the first log data associated with an analyzed flag indicating 0.

The analysis unit 106 analyzes the acquired first log data. That is, the CPU 100 performs a primary analysis on the OS 02 before the analysis by the BMC. Among the analyzed flags recorded in the first storage area M1, the analysis unit 106 sets to 1 the flag associated with first log data whose analysis has completed.

The reduction unit 107 generates, based on the result of the analysis by the analysis unit 106, second log data whose data amount is smaller than that of the first log data. For example, the reduction unit 107 reduces the data amount by deleting, from the first log data, event logs that have little relation to the failure. Because the reduction unit 107 generates the second log data after the analysis unit 106 finishes its analysis, the analyzed flag can also be regarded as information indicating whether second log data has been created from the first log data.
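The analysis-then-reduction idea can be sketched as follows. This is only an illustration under stated assumptions: the `Event` type, the `analyze` rule (treating any source that logged a message containing "error" as related to the failure), and the function names are hypothetical, since the source does not specify how relevance is judged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One event log entry inside the first log data (hypothetical shape)."""
    source: str
    message: str


def analyze(events: list[Event]) -> set[str]:
    """Toy analysis rule (an assumption): a hardware source is related to the
    failure if any of its messages mentions an error."""
    return {e.source for e in events if "error" in e.message.lower()}


def reduce_log(events: list[Event]) -> list[Event]:
    """Build second log data by dropping events from unrelated sources."""
    related = analyze(events)
    return [e for e in events if e.source in related]
```

With this rule, an event such as a nominal fan reading is dropped, while every entry from the failing component is kept, so the second log data is never larger than the first.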

The second log recording unit 108 records the second log data in the second storage area M2 in association with a reported flag. When recording the second log data, the second log recording unit 108 sets the reported flag to 0.

FIG. 3 is a flowchart showing log collection processing by the computer according to the first embodiment.
When a failure occurs in the computer 1 and the hardware that detected the failure issues an interrupt request to the CPU 100, the log collection processing starts. First, the CPU 100 transfers control from the OS 02 to the BIOS 01 by handover (step S001). By executing the BIOS 01, the log collection unit 101 of the CPU 100 reads information from a register that holds information capable of identifying the failed part of the hardware, and generates first log data based on that information (step S002).

Next, the first log recording unit 102 determines whether the first storage area M1 has free space (step S003). If the first storage area M1 has no free space (step S003: NO), the first log recording unit 102 deletes the oldest first log data stored in the first storage area M1 (step S004). If the first storage area M1 has free space (step S003: YES), or after first log data has been deleted from the first storage area M1, the first log recording unit 102 records the first log data generated in step S002 in the first storage area M1 (step S005). At this time, the first log recording unit 102 sets the analyzed flag associated with the first log data to 0. The first log recording unit 102 records the first log data at the end of the free area of the first storage area M1.

Next, the first log recording unit 102 transfers control from the BIOS 01 to the OS 02 by handover (step S006), and the log collection processing ends. In this way, in the computer 1, the OS 02 regains control before the log data is transmitted to the BMC.
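Steps S003 to S005 amount to a bounded log with oldest-first eviction. The sketch below models only that recording logic; the class and field names are hypothetical, the capacity is counted in entries rather than bytes for simplicity, and the handovers in steps S001 and S006 are outside its scope.

```python
from dataclasses import dataclass, field


@dataclass
class FirstLogEntry:
    data: bytes          # first log data generated in step S002
    detected_at: int     # failure detection time (assumed integer timestamp)
    analyzed: int = 0    # analyzed flag, set to 0 on recording


@dataclass
class FirstStorageArea:
    capacity: int                                  # assumed: max number of entries
    entries: list[FirstLogEntry] = field(default_factory=list)

    def record(self, data: bytes, detected_at: int) -> None:
        # S003: is there free space?
        if len(self.entries) >= self.capacity:
            # S004: no free space, so delete the oldest first log data.
            self.entries.pop(0)
        # S005: record the new first log data with analyzed flag 0.
        self.entries.append(FirstLogEntry(data, detected_at))
```

Entries are appended in arrival order, so the element at index 0 is always the oldest one and is the one evicted when the area is full.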

FIG. 4 is a flowchart showing failure notification processing by the computer according to the first embodiment.
The CPU 100 executes the failure information processing program 03 at timings determined by a predetermined cycle, and starts the failure notification processing. First, the first log acquisition unit 105 refers to the information stored in the first storage area M1 and determines whether there is an analyzed flag indicating 0 (step S101). If there is no analyzed flag indicating 0 (step S101: NO), the computer 1 ends the failure notification processing.

On the other hand, if there is an analyzed flag indicating 0 (step S101: YES), the first log acquisition unit 105 acquires the first log data associated with the analyzed flag indicating 0 (step S102). Next, the analysis unit 106 analyzes the acquired first log data (step S103). Based on the result of the analysis by the analysis unit 106, the reduction unit 107 generates second log data in which the data amount of the first log data is reduced (step S104). The analysis unit 106 sets to 1 the analyzed flag associated with the first log data that was analyzed (step S105).

Next, the second log recording unit 108 determines whether the second storage area M2 has free space (step S106). If the second storage area M2 has no free space (step S106: NO), the second log recording unit 108 deletes the oldest second log data stored in the second storage area M2 (step S107). If the second storage area M2 has free space (step S106: YES), or after second log data has been deleted from the second storage area M2, the second log recording unit 108 records the second log data generated in step S104 in the second storage area M2 (step S108). At this time, the second log recording unit 108 sets the reported flag associated with the second log data to 0. The second log recording unit 108 records the second log data at the end of the free area of the second storage area M2.

Next, the CPU 100 transfers control from the OS 02 to the BIOS 01 by handover (step S109). By executing the BIOS 01, the transmission unit 104 refers to the information stored in the second storage area M2 and determines whether there is a reported flag indicating 0 (step S110). If there is no reported flag indicating 0 (step S110: NO), the CPU 100 transfers control from the BIOS 01 to the OS 02 by handover (step S111), and the failure notification processing ends.

On the other hand, if there is a reported flag indicating 0 (step S110: YES), the second log acquisition unit 103 acquires the second log data associated with the reported flag indicating 0 (step S112). Next, the transmission unit 104 transmits the acquired second log data to the BMC (step S113). Because the second log data is smaller than the first log data, transmitting the second log data takes less time than transmitting the first log data would. When transmission of the second log data completes, the transmission unit 104 sets to 1 the reported flag associated with that second log data (step S114). The CPU 100 then transfers control from the BIOS 01 to the OS 02 by handover (step S111), and the failure notification processing ends.
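The failure notification flow of steps S101 to S114 can be summarized in code. The sketch below is an illustration, not the patented implementation: the storage areas are modeled as lists of dictionaries, `analyze_and_reduce` and `send_to_bmc` are hypothetical callables supplied by the caller, and the handovers (S109, S111) appear only as comments. It also processes every pending entry in one pass, which is an assumption; the source does not state how many entries one invocation handles.

```python
from typing import Callable


def failure_notification(
    m1: list[dict],
    m2: list[dict],
    m2_capacity: int,
    analyze_and_reduce: Callable[[bytes], bytes],
    send_to_bmc: Callable[[bytes], None],
) -> int:
    """Run one cycle of the notification flow; return how many logs were sent."""
    # S101: is there any first log data whose analyzed flag is 0?
    pending = [e for e in m1 if e["analyzed"] == 0]
    if not pending:
        return 0  # S101: NO, end the processing

    for entry in pending:                              # S102
        second = analyze_and_reduce(entry["data"])     # S103, S104
        entry["analyzed"] = 1                          # S105
        if len(m2) >= m2_capacity:                     # S106
            m2.pop(0)                                  # S107: drop the oldest
        m2.append({"data": second,                     # S108
                   "detected_at": entry["detected_at"],
                   "reported": 0})

    # S109: control would be handed over from the OS to the BIOS here.
    sent = 0
    for entry in m2:
        if entry["reported"] == 0:                     # S110, S112
            send_to_bmc(entry["data"])                 # S113
            entry["reported"] = 1                      # S114
            sent += 1
    # S111: control would be handed back from the BIOS to the OS here.
    return sent
```

Only the second loop corresponds to time spent under BIOS control, and it touches only the already-reduced data; the analysis and reduction run while the OS still holds control.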

As described above, according to the first embodiment, when a failure occurs in the computer 1, the CPU 100 executes the failure information processing program 03 to reduce the data amount of the first log data and generate the second log data. The inventor has found that, in failure handling on a typical computer, most of the processing time is spent transmitting log data. It follows that the computer 1 according to the first embodiment, by reducing the amount of log data to be transmitted to the BMC, shortens the time during which the BIOS 01 holds control.
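The relationship between data amount and BIOS occupancy can be made concrete with a simple linear model. The sketch below assumes that the time under BIOS control is a fixed overhead plus the data size divided by a constant transfer rate; the model, the function name, and every number used in the usage note are illustrative assumptions, not measurements from the source.

```python
def bios_hold_time_ms(size_bytes: int, bytes_per_ms: int, overhead_ms: int) -> float:
    """Estimated time the BIOS holds control while sending one log.

    Assumed model: fixed handover overhead plus size / transfer rate.
    """
    if bytes_per_ms <= 0:
        raise ValueError("transfer rate must be positive")
    return overhead_ms + size_bytes / bytes_per_ms
```

For instance, with a made-up rate of 64 bytes per millisecond and 5 ms of overhead, calling `bios_hold_time_ms(65536, 64, 5)` and `bios_hold_time_ms(4096, 64, 5)` compares a hypothetical 64 KiB first log against a 4 KiB second log; under this model the hold time shrinks almost in proportion to the data amount because the transfer term dominates.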

Further, according to the first embodiment, the computer 1 executes the failure information processing program 03 to analyze the first log data and generates the second log data based on that analysis. That is, the computer 1 performs a primary analysis of the first log data and has the BMC analyze the second log data, which is the result of that analysis. The computer 1 can thereby shorten the time during which the BIOS 01 holds control while maintaining the accuracy of the log data analysis.

Although one embodiment has been described in detail above with reference to the drawings, the specific configuration is not limited to the one described, and various design changes and the like can be made.
The computer 1 according to the first embodiment functions as the first log acquisition unit 105, the analysis unit 106, the reduction unit 107, and the second log recording unit 108 by means of the failure information processing program 03 running on the OS 02, but the configuration is not limited to this. For example, in another embodiment, the BIOS 01 may include a program for realizing at least some of these functions. In that case, no handover to the OS 02 takes place after the first log data is generated; however, because the amount of data to be transmitted is reduced, degradation of the performance of the OS 02 can be suppressed compared with the case where the BIOS 01 transmits the first log data. As another example, in another embodiment, the OS 02 may include a program for realizing at least some of these functions.

Further, in the computer 1 according to the first embodiment, the BIOS 01 transmits the second log data to the BMC, but the configuration is not limited to this. For example, the computer 1 according to another embodiment may transmit the second log data by means of the failure information processing program 03 or the OS 02.

Further, the failure information processing program 03 according to the first embodiment causes the computer 1 to analyze the first log data, but the configuration is not limited to this. For example, the failure information processing program 03 according to another embodiment may cause the computer 1 to reduce the data amount without analysis, for example by thinning out the first log data.
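Thinning without analysis can be as simple as keeping every n-th record. The sketch below shows one such scheme; the function name and the choice of uniform decimation are assumptions, since the source mentions thinning only as an example and does not say how it is done.

```python
def thin(records: list, keep_every: int) -> list:
    """Keep every `keep_every`-th record, starting with the first.

    No record is inspected, so no analysis is involved.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return records[::keep_every]
```

The trade-off against the analysis-based reduction is that thinning is cheap and content-agnostic, but it may discard records that matter for diagnosing the failure.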

Further, in the first embodiment, the first storage area M1 and the second storage area M2 are secured in the main memory 200, but the configuration is not limited to this. For example, in another embodiment, the first storage area M1 and the second storage area M2 may be secured in the nonvolatile memory 300. In that case, the log data is retained even if the service of the computer 1 stops due to a shutdown or the like.

Further, the computer 1 according to the first embodiment transmits log data to the BMC, but the configuration is not limited to this. For example, the computer 1 according to another embodiment may transmit failure information other than log data (for example, a failure analysis result) to the BMC. In that case, the computer 1 generates second failure information by reducing the data amount of that failure information (the first failure information).

《Basic configuration of the failure information processing program》
FIG. 5 is a schematic block diagram showing the basic configuration of a computer.
In the embodiment described above, the configuration shown in FIG. 2 was described as one embodiment of a computer that executes the failure information processing program. The basic configuration of a computer that executes the failure information processing program is as shown in FIG. 5.
That is, the computer 9 includes a CPU 91, a first storage device 92, and a second storage device 93.
The first storage device 92 stores a BIOS 001 including a program for causing the CPU 91 to execute a generation step of generating first failure information relating to a failure that has occurred in the computer itself, and a transmission step of transmitting, to a management device, second failure information generated based on the first failure information.
The second storage device 93 stores a failure information processing program 002 for executing an acquisition step of acquiring the first failure information generated by the BIOS 001, and a reduction step of generating second failure information, which is failure information to be transmitted to the management device, by reducing the data amount of the first failure information.
The computer 9 can thereby prevent an apparent decrease in processing performance when a failure occurs.

《Basic configuration of the computer system》
FIG. 6 is a schematic block diagram showing the basic configuration of a computer system.
The computer system 2 includes a computer 9 and a management device 10.
The computer 9 generates second failure information by analyzing first failure information relating to a failure that has occurred in the computer itself. The management device 10 analyzes the second failure information. The management device 10 is, for example, a BMC.
The computer system 2 can thereby analyze a failure that has occurred in the computer 9 in a distributed manner between the computer 9 and the management device 10.

《Appendix》
Part or all of the above embodiment can also be described as in the following appendices, but is not limited to them.

(Appendix 1)
A failure information processing program for causing a computer to execute:
an acquisition step of acquiring first failure information, generated by the BIOS of the computer, relating to a failure that has occurred in the computer itself; and
a reduction step of generating second failure information, which is failure information to be transmitted to a management device, by reducing the data amount of the first failure information.

(Appendix 2)
The failure information processing program according to Appendix 1, further causing the computer to execute an analysis step of analyzing the first failure information,
wherein the reduction step is a step in which the computer generates the second failure information based on the result of the analysis of the first failure information in the analysis step.

(Appendix 3)
The failure information processing program according to Appendix 1 or Appendix 2,
wherein the first failure information generated by the BIOS is recorded in a storage area of a storage device of the computer, the storage area being defined by a predetermined address and data size, and
the acquisition step is a step in which the computer acquires the information stored in the storage area of the storage device.

(Appendix 4)
The failure information processing program according to appendix 3, wherein the first failure information generated by the BIOS associates log data related to a failure with information indicating whether or not the second failure information has been created based on that log data, and
the reduction step is a step of creating the second failure information based on log data, among the first failure information, for which the second failure information has not yet been created.
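As a non-limiting illustration, the association in appendix 4 between log data and the information indicating whether the second failure information has been created may be sketched as follows. The `LogEntry` type, the `reduced` flag name, and the placeholder reduction `first_four_bytes` are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    log_data: bytes        # log data related to one failure
    reduced: bool = False  # True once second failure information was created


def reduce_pending(entries, reduce_fn):
    """Create second failure information only for entries not yet reduced."""
    results = []
    for entry in entries:
        if entry.reduced:
            continue  # second failure information already exists for this log
        results.append(reduce_fn(entry.log_data))
        entry.reduced = True  # mark the entry so it is skipped next time
    return results


def first_four_bytes(log_data: bytes) -> bytes:
    """Placeholder reduction (an assumption): keep only a short prefix."""
    return log_data[:4]


# Storage area holding one already-processed log and one new log.
area = [LogEntry(b"old failure log", reduced=True), LogEntry(b"new failure log")]
created = reduce_pending(area, first_four_bytes)
```

Because each entry is marked after it is processed, a second call on the same storage area produces no further second failure information, so the same failure is not reported twice.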

(Appendix 5)
A computer comprising:
a CPU;
a first storage device that stores a BIOS including a program for causing the CPU to execute a generation step of generating first failure information related to a failure that has occurred in the computer itself, and a transmission step of transmitting, to a management device, second failure information generated based on the first failure information; and
a second storage device that stores the failure information processing program according to any one of appendices 1 to 4.

(Appendix 6)
The computer according to appendix 5, wherein the second storage device stores an OS capable of executing the failure information processing program.

(Appendix 7)
The computer according to appendix 6, wherein the CPU transfers control to the BIOS when the failure is detected, and transfers control from the BIOS to the OS after execution of the generation step.

(Appendix 8)
The computer according to appendix 7, further comprising a third storage device,
wherein the CPU:
records, in the generation step, the first failure information in a first storage area, defined by a predetermined address and data size, of the third storage device; and
reads, in the reduction step, the information stored in the first storage area of the third storage device, and generates the second failure information based on that information.

(Appendix 9)
The computer according to appendix 8, wherein the CPU:
records, in the reduction step, the second failure information in a second storage area, defined by a predetermined address and data size, of the third storage device; and
reads, in the transmission step, the information stored in the second storage area of the third storage device, and transmits the second failure information thus read.
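As a non-limiting illustration, the exchange of failure information through the first and second storage areas of appendices 8 and 9 may be sketched as follows. The addresses and sizes of `M1` and `M2`, the leading length byte, the `nvram` buffer standing in for the third storage device, and the placeholder reduction that keeps the first eight bytes are all assumptions introduced for illustration.

```python
# Hypothetical layout: (address, size) of each area in the third storage device.
M1 = (0x000, 64)  # first storage area: first failure information
M2 = (0x040, 16)  # second storage area: second failure information

nvram = bytearray(128)  # stands in for the third storage device


def write_area(area, payload: bytes) -> None:
    """Record a payload in an area as a length byte plus zero-padded data."""
    address, size = area
    if len(payload) > size - 1:
        raise ValueError("payload does not fit in the storage area")
    record = bytes([len(payload)]) + payload.ljust(size - 1, b"\x00")
    nvram[address:address + size] = record


def read_area(area) -> bytes:
    """Read back the payload previously recorded in an area."""
    address, _ = area
    length = nvram[address]
    return bytes(nvram[address + 1:address + 1 + length])


def generation_step(raw_log: bytes) -> None:
    """Executed by the BIOS: record the first failure information in M1."""
    write_area(M1, raw_log)


def reduction_step() -> None:
    """Executed on the OS: read M1 and record reduced information in M2."""
    write_area(M2, read_area(M1)[:8])  # placeholder reduction


def transmission_step(send) -> None:
    """Executed by the BIOS: read M2 and send it to the management device."""
    send(read_area(M2))


sent = []  # stands in for the management device
generation_step(b"MEMORY ECC ERROR bank=3 row=0x1f")
reduction_step()
transmission_step(sent.append)
```

Since both areas sit at predetermined addresses with predetermined sizes, the BIOS and the program running on the OS need no other interface to hand the failure information to each other.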

(Appendix 10)
A failure notification method comprising:
an acquisition step in which a computer acquires, by executing a BIOS, first failure information related to a failure that has occurred in the computer;
a reduction step of generating second failure information, which is failure information obtained by reducing the data amount of the first failure information; and
a transmission step of transmitting the second failure information to a management device.

(Appendix 11)
A computer system comprising:
a computer that executes an acquisition step of acquiring first failure information related to a failure that has occurred in the computer itself, and an analysis step of generating second failure information by analyzing a data amount of the first failure information; and
a management device that analyzes the second failure information.

1 Computer
01 BIOS
02 OS
03 Failure information processing program
100 CPU
101 Log collection unit
102 First log recording unit
103 Second log acquisition unit
104 Transmission unit
105 First log acquisition unit
106 Analysis unit
107 Reduction unit
108 Second log recording unit
200 Main memory
300 Nonvolatile memory
400 Storage
M1 First storage area
M2 Second storage area

Claims

A failure information processing program for causing a computer to execute:
an acquisition step of acquiring first failure information, generated by the BIOS of the computer, related to a failure that has occurred in the computer itself; and
a reduction step of generating second failure information, which is failure information to be transmitted to a management device, by reducing the data amount of the first failure information.

The failure information processing program according to claim 1, further causing the computer to execute an analysis step of analyzing the first failure information,
wherein the reduction step is a step in which the computer generates the second failure information based on a result of the analysis of the first failure information in the analysis step.

The failure information processing program according to claim 1, wherein the first failure information generated by the BIOS is recorded in a storage area, defined by a predetermined address and data size, of a storage device of the computer, and
the acquisition step is a step in which the computer acquires the information stored in the storage area of the storage device.

The failure information processing program according to claim 3, wherein the first failure information generated by the BIOS associates log data related to a failure with information indicating whether or not the second failure information has been created based on that log data, and
the reduction step is a step of creating the second failure information based on log data, among the first failure information, for which the second failure information has not yet been created.

A computer comprising:
a CPU;
a first storage device that stores a BIOS including a program for causing the CPU to execute a generation step of generating first failure information related to a failure that has occurred in the computer itself, and a transmission step of transmitting, to a management device, second failure information generated based on the first failure information; and
a second storage device that stores the failure information processing program according to any one of claims 1 to 4.

The computer according to claim 5, wherein the second storage device stores an OS capable of executing the failure information processing program.

The computer according to claim 6, wherein the CPU transfers the control right to the BIOS when the failure is detected, and transfers the control right from the BIOS to the OS after execution of the generation step.

The computer according to claim 7, further comprising a third storage device,
wherein the CPU:
records, in the generation step, the first failure information in a first storage area, defined by a predetermined address and data size, of the third storage device; and
reads, in the reduction step, the information stored in the first storage area of the third storage device, and generates the second failure information based on that information.

A failure notification method comprising:
an acquisition step in which a computer acquires, by executing a BIOS, first failure information related to a failure that has occurred in the computer;
a reduction step of generating second failure information, which is failure information obtained by reducing the data amount of the first failure information; and
a transmission step of transmitting the second failure information to a management device.

A computer system comprising:
a computer that generates second failure information by analyzing first failure information related to a failure that has occurred in the computer itself; and
a management device that analyzes the second failure information.