WO2009110326A1 - Error analysis device, error analysis method, and recording medium

Info

Publication number: WO2009110326A1
Application number: PCT/JP2009/052992
Authority: WO
Inventors: 慎二中台
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2008-03-07
Filing date: 2009-02-20
Publication date: 2009-09-11
Anticipated expiration: 2010-09-07
Also published as: JP2009217381A

Abstract

Provided is an error analysis method including steps of: successively receiving error degree information including a plurality of index values indicating an error degree of a device to be monitored together with an identifier of the error degree information; comparing the received error degree information to a predetermined judgment reference; classifying the error degree information into a corresponding type according to the comparison result; correlating each identifier of the error degree information with the corresponding type of the classified error degree information for output; receiving information indicating the true type for each of the identifiers of the error degree information; storing the identifiers of the respective error degree information while correlating them with the true types; receiving an input of a setting parameter used for updating the judgment reference; and updating the judgment reference according to the received respective error degree information, the information indicating the true types stored while being correlated with the identifiers of the respective error degree information, and the setting parameter.

Description

Fault analysis apparatus, fault analysis method, and recording medium

The present invention relates to a failure analysis apparatus, a failure analysis method, and a recording medium, and in particular to a failure analysis apparatus, a failure analysis method, and a recording medium that can detect and classify system failures without requiring rules or thresholds to be set.

FIG. 1 is a diagram showing an example of a failure analysis apparatus, namely the one disclosed in Japanese Patent No. 3581934.

As shown in FIG. 1, this failure analysis apparatus 100 comprises an abnormal call volume monitoring unit 101, such as an operation measurement record (OM) transfer unit or a failure record transfer unit, a threshold determination unit 115, and a determination result display unit 116.

The failure analysis apparatus 100 configured as described above operates as follows.

The abnormal call volume monitoring unit 101 monitors the monitoring target devices 131 and 132 for logs indicating that an abnormality has occurred and, when such logs exist, counts the call volume, that is, the traffic volume per unit time, for each type of abnormality. When the call volume within a fixed period reaches or exceeds a predetermined threshold, the threshold determination unit 115 notifies the maintenance operator of the abnormality as a failure through the determination result display unit 116.
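
The threshold-based behavior of this prior-art apparatus can be illustrated with a minimal sketch. The log format (a timestamp paired with an abnormality type), the function name `detect_failures`, and the sample values are assumptions made for illustration only; they do not come from the cited patent.

```python
from collections import Counter

def detect_failures(log_entries, window_start, window_end, threshold):
    """Return the abnormality types whose count within the window
    reaches or exceeds the threshold (hypothetical log format)."""
    counts = Counter(
        kind
        for timestamp, kind in log_entries
        if window_start <= timestamp < window_end
    )
    return sorted(kind for kind, n in counts.items() if n >= threshold)

# Hypothetical abnormality logs: (seconds since start, abnormality type)
logs = [
    (0, "call_drop"), (10, "call_drop"), (20, "setup_fail"),
    (30, "call_drop"), (70, "call_drop"),
]

alarms = detect_failures(logs, window_start=0, window_end=60, threshold=3)
print(alarms)
```

The point of the sketch is that the threshold is a fixed number someone must choose in advance, which is the burden the later embodiments try to remove.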

Through this operation, the failure analysis apparatus 100 shown in FIG. 1 can detect failures automatically.

FIG. 2 is a diagram showing another example of a failure analysis apparatus, namely the one disclosed in Jing Wu, Jian-Guo Zhou, Pu-Liu Yan, and Ming Wu, "A Study on Network Fault Knowledge Acquisition Based on Support Vector Machine", Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005.

As shown in FIG. 2, in order to manage a monitoring target system 230 made up of monitoring target devices 231 to 234, this failure analysis apparatus 200 comprises an abnormality degree monitoring unit 201, an abnormality degree storage unit 210, a failure case registration unit 211, a case storage unit 212, a pattern learning unit 213, a knowledge storage unit 214, a pattern determination unit 215, a determination result display unit 216, and a determination correction input unit 217.

The failure analysis apparatus 200 configured as described above collects, from the results of monitoring the monitoring target devices 231 to 234, abnormality degrees, which are indices expressing the possibility of a fault for each device or line.

FIG. 3 is a diagram showing the abnormality degree values used in the failure analysis apparatus 200 shown in FIG. 2.

As shown in FIG. 3, the abnormality degrees used in the failure analysis apparatus 200 shown in FIG. 2 include values such as whether a link is down, the error rate, the congestion rate, the rejection rate, and the utilization rate.

Using the knowledge information stored in the knowledge storage unit 214, the pattern determination unit 215 determines from the combination of abnormality degrees obtained whether a failure has occurred in the monitoring target system 230, and presents the determination result to the maintenance operator through the determination result display unit 216.

The knowledge information stored in the knowledge storage unit 214 is generated by the following procedure.

First, the maintenance operator uses the failure case registration unit 211 to register past failure cases in the case storage unit 212.

The pattern learning unit 213 generates knowledge information from the failure cases stored in the case storage unit 212 and the combinations of abnormality degrees stored in the abnormality degree storage unit 210, and stores it in the knowledge storage unit 214. Here, a failure case is information indicating when, where, and what kind of failure occurred. The pattern learning unit 213 generates the knowledge information by pattern learning using a pattern classifier called a Support Vector Machine (SVM).

The SVM is described in detail in Hideki Aso, Koji Tsuda, and Noboru Murata, "Statistics of Pattern Recognition and Learning", Iwanami Shoten, pp. 107-123, 2005. In general, pattern learning estimates a one-dimensional class (pattern) from multidimensional variables. The variables used as these multidimensional variables are called features, and the d-dimensional space spanned by d features is called the feature space R^d. If the input variable is a feature variable x (∈ R^d) in this feature space and the output variable is a class y (∈ {1, -1}), then y changes when x crosses out of a certain region of the feature space. The boundary of the region at which this change occurs is called a hyperplane.

Given output values y_i for n input values x_i (i = 1, 2, ..., n), this hyperplane can be generated by pattern learning. In pattern learning, the distance between input values having different output values y is called the margin.
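
For orientation, the linear case of these definitions can be written out as follows. This is the textbook linear SVM form, not a formula taken from this document; the weight vector $w$ and offset $b$ are symbols introduced here for illustration, and the margin is given in the usual normalized sense rather than the informal wording used above.

```latex
% Decision function and predicted class for a feature vector x in R^d
f(x) = w^{\top} x + b, \qquad y = \operatorname{sign}\bigl(f(x)\bigr) \in \{1, -1\}

% The hyperplane is the set of points where the predicted class changes
\{\, x \in \mathbb{R}^{d} : w^{\top} x + b = 0 \,\}

% If every training pair satisfies y_i (w^{\top} x_i + b) \ge 1,
% the gap between the two classes (the margin) has width
\frac{2}{\lVert w \rVert}
```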

The knowledge information obtained by the pattern learning unit 213 is the threshold for detecting and classifying failures; in the feature space formed by the combinations of abnormality degrees, it is a hyperplane that separates multiple classes.

When a failure determination result shown to the maintenance operator by the determination result display unit 216 was not actually a failure, that fact is entered into the case storage unit 212 using the determination correction input unit 217.

Through this operation, the failure analysis apparatus 200 shown in FIG. 2, unlike the failure analysis apparatus 100 shown in FIG. 1, can detect failures without thresholds for failure detection and classification having to be set.

However, the failure analysis apparatuses described above do not reflect the failure detection sensitivity that the maintenance operator wants when generating a failure detection threshold from cases. Consequently, even if the maintenance operator's policy is to reduce overlooked failures at the expense of sometimes misdetecting a normal state as a failure, the generated threshold may turn out to be one that yields few false detections but many overlooked failures.

The present invention has been made in view of the above problem, and its object is to provide a failure analysis apparatus, a failure analysis method, and a recording medium capable of failure detection or classification that reflects the failure detection sensitivity desired by the maintenance operator.

To achieve the above object, the present invention comprises:
abnormality degree information receiving means for sequentially receiving abnormality degree information and its identification information from a monitoring target device that sequentially outputs abnormality degree information, including a plurality of index values indicating the degree of abnormality of the monitoring target device, together with identification information of that abnormality degree information;
type determination means for comparing each item of abnormality degree information received by the abnormality degree information receiving means with a predetermined determination criterion and classifying each item by type based on the comparison result;
determination result output means for outputting the identification information of each item of abnormality degree information in association with information indicating the type into which that item was classified;
failure case registration means for receiving input of information indicating the true type for the identification information of each item of abnormality degree information;
a case storage unit that stores the identification information of each item of abnormality degree information in association with the true type;
detection sensitivity input means for receiving input of a setting parameter for updating the determination criterion; and
pattern learning means for updating the determination criterion based on each item of abnormality degree information received by the abnormality degree information receiving means, the information indicating the true type stored in association with the identification information of each item, and the setting parameter.

The present invention is also a failure analysis method using an information processing apparatus, comprising:
a step in which the information processing apparatus sequentially receives abnormality degree information and its identification information from a monitoring target device that sequentially outputs abnormality degree information, including a plurality of index values indicating the degree of abnormality of the monitoring target device, together with identification information of that abnormality degree information;
a step in which the information processing apparatus compares each received item of abnormality degree information with a predetermined determination criterion and classifies each item by type based on the comparison result;
a step in which the information processing apparatus outputs the identification information of each item of abnormality degree information in association with information indicating the type into which that item was classified;
a step in which the information processing apparatus accepts input of information indicating the true type for the identification information of each item of abnormality degree information;
a step in which the information processing apparatus stores the identification information of each item of abnormality degree information in association with the true type;
a step in which the information processing apparatus accepts input of a setting parameter for updating the determination criterion; and
a step in which the information processing apparatus updates the determination criterion based on each received item of abnormality degree information, the information indicating the true type stored in association with the identification information of each item, and the setting parameter.

The present invention is also a recording medium on which a program for operating a computer is written, the program causing the computer to execute:
a procedure of sequentially receiving abnormality degree information and its identification information from a monitoring target device that sequentially outputs abnormality degree information, including a plurality of index values indicating the degree of abnormality of the monitoring target device, together with identification information of that abnormality degree information;
a procedure of comparing each received item of abnormality degree information with a predetermined determination criterion and classifying each item by type based on the comparison result;
a procedure of outputting the identification information of each item of abnormality degree information in association with information indicating the type into which that item was classified;
a procedure of accepting input of information indicating the true type for the identification information of each item of abnormality degree information;
a procedure of storing the identification information of each item of abnormality degree information in association with the true type;
a procedure of accepting input of a setting parameter for updating the determination criterion; and
a procedure of updating the determination criterion based on each received item of abnormality degree information, the information indicating the true type stored in association with the identification information of each item, and the setting parameter.

The present invention enables failure detection or classification that reflects the failure detection sensitivity desired by the maintenance operator.

FIG. 1 is a diagram showing an example of a failure analysis apparatus.
FIG. 2 is a diagram showing another example of a failure analysis apparatus.
FIG. 3 is a diagram showing the abnormality degree values used in the failure analysis apparatus shown in FIG. 2.
FIG. 4 is a block diagram showing an embodiment of the failure analysis apparatus of the present invention.
FIG. 5 is a diagram showing the table in the case storage unit shown in FIG. 4.
FIG. 6 is a flowchart for explaining the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 7 is a flowchart for explaining the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 8 is a flowchart for explaining the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 9 is a flowchart for explaining the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 10 is a configuration diagram of a monitoring target for explaining one example of the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 11 is a diagram showing a feature space for explaining one example of the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 12 is a diagram showing a feature space for explaining one example of the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 13 is a diagram showing a feature space for explaining one example of the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 14 is a diagram showing a feature space for explaining another example of the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 15 is a diagram showing a feature space for explaining another example of the operation of the failure analysis apparatus shown in FIG. 4.
FIG. 16 is a diagram showing a feature space for explaining another example of the operation of the failure analysis apparatus shown in FIG. 4.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 4 is a block diagram showing an embodiment of the failure analysis apparatus of the present invention.

As shown in FIG. 4, this embodiment is a computer 300 (comprising at least a central processing unit, a processor, and a data processing device), an information processing apparatus that operates under program control and is communicably connected to a system 330 comprising monitoring target devices 331 to 334.

The computer 300 includes a failure case registration unit 311, a case storage unit 312, an abnormality degree monitoring unit 301 serving as the abnormality degree information receiving means, an abnormality degree storage unit 310, a pattern learning unit 313, a knowledge storage unit 314, a pattern determination unit 315 serving as the type determination means, a determination result display unit 316 serving as the determination result output means, a determination correction input unit 317, a detection sensitivity input unit 318, and a detection sensitivity storage unit 319.

The failure case registration unit 311 is connected to the case storage unit 312. The case storage unit 312 is connected to the failure case registration unit 311 and the pattern learning unit 313. The detection sensitivity input unit 318 is connected to the detection sensitivity storage unit 319, which in turn is connected to the detection sensitivity input unit 318 and the pattern learning unit 313. The pattern learning unit 313 is connected to the abnormality degree storage unit 310, the case storage unit 312, the detection sensitivity storage unit 319, and the knowledge storage unit 314. The abnormality degree storage unit 310 is connected to the pattern learning unit 313 and the abnormality degree monitoring unit 301. The knowledge storage unit 314 is connected to the pattern learning unit 313 and the pattern determination unit 315. The abnormality degree monitoring unit 301 is connected to the abnormality degree storage unit 310 and the pattern determination unit 315. The pattern determination unit 315 is connected to the knowledge storage unit 314, the abnormality degree monitoring unit 301, and the determination result display unit 316. The determination result display unit 316 is connected to the pattern determination unit 315.

In this embodiment, the knowledge information, the threshold, the boundary surface, and the hyperplane all refer to the same thing and correspond to the determination criterion of the present invention. A feature in this embodiment corresponds to an index value of the present invention. The detection sensitivity entered by the operator corresponds to the cost shown in the tables of FIGS. 11 to 16. The detection sensitivity is a setting parameter for changing the threshold (determination criterion) and is a cost set for each of the cases described later; it corresponds to the setting parameter of the present invention.

The components described above operate roughly as follows.

The failure case registration unit 311 accepts input of a failure occurrence time and location from a terminal (not shown) used by the maintenance operator, who is the operator in the present invention. This pair of failure occurrence time and location is called a case; it may also include the type of failure and the location of the root cause. A case is information that associates the failure occurrence time and location described above, or a time and location at which operation was normal. Both the time and the location stored as a case may have extent, such as a period or a range. Cases include failure cases, which record that a failure actually occurred, and normal cases, which record that operation was actually normal. A failure case contains the failure occurrence time and location, and a normal case contains the time and location at which operation was normal. A case may also contain the case type (which corresponds to a class or pattern, and to the true type in the present invention). The case type is information indicating that the case is normal, or information including the type of failure. In that situation, a failure case contains the failure occurrence time, the location, and the type of failure, and a normal case contains the time and location at which operation was normal together with information indicating that the case is normal. Alternatively, the case type may be held as information independent of the case. This embodiment is described on the assumption that the case does not contain the case type, although the case may of course contain it.

The failure case registration unit 311 may accept input of the case type together with the case. The location may be an identifier that identifies each of the monitoring target devices 331 to 334, or anything that can pinpoint where the failure occurred, such as a line name or an address. The failure occurrence time and location are included in the identification information of the abnormality degree information of the present invention. In this embodiment, the identification information of the abnormality degree information corresponds to a case. The identification information need only contain information by which the abnormality degree information can be identified, for example a uniquely assigned identifier.

The case storage unit 312 receives cases from the failure case registration unit 311 or from the determination correction input unit 317 described later, and stores them.

FIG. 5 is a diagram showing the table in the case storage unit 312 shown in FIG. 4.

As shown in FIG. 5, the case storage unit 312 stores case numbers, times, locations, and patterns in association with one another. The case number, time, and location are identification information of the abnormality degree information, and the pattern is the case type. The case number, time, and location are not each required; at least one item of information that can identify the abnormality degree information is sufficient.
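
One way to picture such a table is the sketch below. The class names `Case` and `CaseStore`, the field names, and the sample rows are hypothetical; the actual contents of FIG. 5 are not reproduced in this text. The check in `__post_init__` mirrors the statement that at least one identifying field must be present.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Case:
    """One row of a hypothetical case table: identifying fields plus pattern."""
    case_number: Optional[int]
    time: Optional[str]
    location: Optional[str]
    pattern: str  # case type, e.g. "normal" or a failure type

    def __post_init__(self):
        # At least one identifying field is needed to find the matching
        # abnormality degree information later.
        if self.case_number is None and self.time is None and self.location is None:
            raise ValueError("a case needs at least one identifying field")

class CaseStore:
    """In-memory stand-in for the case storage unit."""
    def __init__(self) -> None:
        self._cases: List[Case] = []

    def add(self, case: Case) -> None:
        self._cases.append(case)

    def find(self, time: Optional[str] = None, location: Optional[str] = None) -> List[Case]:
        return [
            c for c in self._cases
            if (time is None or c.time == time)
            and (location is None or c.location == location)
        ]

store = CaseStore()
store.add(Case(1, "2008-03-07T10:00", "device-331", "link_failure"))
store.add(Case(2, "2008-03-07T11:00", "device-332", "normal"))
print([c.pattern for c in store.find(location="device-331")])
```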

The abnormality degree monitoring unit 301 acquires abnormality degree information, which includes abnormality degrees, from the monitoring target devices 331 to 334 in the monitoring target system 330 and stores it in the abnormality degree storage unit 310. The abnormality degree monitoring unit 301 also passes to the pattern determination unit 315 either the time information contained in the abnormality degree information or information indicating the time at which the abnormality degree monitoring unit 301 received the abnormality degree information.

The abnormality degree storage unit 310 stores the abnormality degree, time, location, and value contained in abnormality degree information previously received by the abnormality degree monitoring unit 301, in association with one another. It may, for example, store the information so that abnormality degree information identified by a time and a location can be returned.

The pattern learning unit 313 runs either when the maintenance operator makes an input to the failure case registration unit 311 or the determination correction input unit 317, or periodically. It reads from the abnormality degree storage unit 310 the abnormality degree information associated with each case stored in the case storage unit 312. The abnormality degrees (features) contained in the abnormality degree information read out make up the feature space used by the pattern learning unit 313. The pattern learning unit 313 also reads from the detection sensitivity storage unit 319, described later, the detection sensitivity for each label (corresponding to the type in the present invention), such as a type of failure case or normal. Based on the abnormality degree information read from the abnormality degree storage unit 310 and the detection sensitivity read from the detection sensitivity storage unit 319, it then generates a threshold (hyperplane) for detecting and classifying failures and stores it in the knowledge storage unit 314.
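
The data-gathering part of this step can be sketched as follows, assuming the abnormality degree store is keyed by (time, location) and the detection sensitivities are kept per label. The function name `build_training_set`, the dictionary layouts, and all sample values are illustrative assumptions; the learning itself is left out.

```python
def build_training_set(cases, abnormality_store, sensitivities):
    """Join each case with its stored abnormality degrees and attach the
    per-label cost (detection sensitivity) used to weight it in learning."""
    features, labels, costs = [], [], []
    for time, location, label in cases:
        vector = abnormality_store.get((time, location))
        if vector is None:
            continue  # no abnormality degrees were stored for this case
        features.append(vector)
        labels.append(label)
        costs.append(sensitivities[label])
    return features, labels, costs

# Hypothetical stored abnormality degrees: (time, location) -> feature vector
abnormality_store = {
    ("t1", "device-331"): [0.9, 0.4],
    ("t2", "device-332"): [0.1, 0.2],
}
# Hypothetical cases: (time, location, true label)
cases = [
    ("t1", "device-331", "failure"),
    ("t2", "device-332", "normal"),
    ("t3", "device-333", "failure"),  # nothing stored, so it is skipped
]
# Hypothetical detection sensitivities (costs) per label
sensitivities = {"failure": 10.0, "normal": 1.0}

X, y, c = build_training_set(cases, abnormality_store, sensitivities)
print(X, y, c)
```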

Here, following Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: a library for support vector machines", 2001 (software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm), a concrete example of pattern learning is given to illustrate how the failure detection sensitivity is reflected.

The hyperplane is derived by performing the optimization of Equation 1 under the constraints of Equation 2 in the feature space R^d. Here, ξ_i, described as a slack variable in Aso, Tsuda, and Murata, "Statistics of Pattern Recognition and Learning", Iwanami Shoten, pp. 107-123, 2005, expresses the degree to which case i is learned beyond the hyperplane. Because ξ_i is weighted by the cost C_{y_i} determined according to the label y_i of case i, the learned hyperplane reflects the ratio of the costs C_y between labels. This cost C_{y_i} is the detection sensitivity.
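
Equations 1 and 2 themselves are not reproduced in this text. As a reading aid only, the standard class-weighted soft-margin SVM problem, which matches the description of ξ_i and C_{y_i} given here, is shown below. It is an assumption that the original equations take this form; the symbols $w$, $b$, and the feature map $\phi$ are introduced for illustration.

```latex
% Objective (plausibly corresponding to Equation 1): margin term plus
% slack penalties weighted by the per-label cost C_{y_i}
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\, w^{\top} w \;+\; \sum_{i=1}^{n} C_{y_i}\, \xi_i

% Constraints (plausibly corresponding to Equation 2)
\text{subject to} \quad y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \;\ge\; 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

In this form, raising $C_{1}$ relative to $C_{-1}$ makes violations by class $1$ cases more expensive, so the learned hyperplane moves to misclassify fewer of them.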

Although this example shows only two-class classification, multi-class classification, such as distinguishing several failure patterns, can be handled in the same way.

The SVM provided with the above-mentioned document "Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm" allows C_{y_i} to be set as a weight. However, conventional failure detection systems based on pattern learning, such as the one described in that document, make no mention of using C_{y_i} to make the failure detection sensitivity variable.

The knowledge storage unit 314 stores the threshold generated by the pattern learning unit 313.

The pattern determination unit 315 receives abnormality degree information from the abnormality degree acquisition unit 301. It reads the threshold stored in the knowledge storage unit 314 and determines whether the received abnormality degree information indicates a failure or a normal state. If it indicates a failure, the unit further determines which type of failure it is, and passes the identification information of the abnormality degree information together with the determination result to the determination result display unit 316.
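
The determination step can be sketched as follows. This is a minimal illustration, assuming that the stored threshold is one linear hyperplane (weights and bias) per failure label and that a case is normal when no hyperplane gives a positive value. The names `decision_value` and `determine_pattern` and the data layout are hypothetical and do not appear in the original.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout of the stored knowledge: label -> (weights, bias).
Hyperplane = Tuple[List[float], float]


def decision_value(plane: Hyperplane, features: Sequence[float]) -> float:
    """Signed value of a feature vector relative to one hyperplane."""
    weights, bias = plane
    return sum(w * x for w, x in zip(weights, features)) + bias


def determine_pattern(knowledge: Dict[str, Hyperplane],
                      case_id: str,
                      features: Sequence[float]) -> Tuple[str, str]:
    """Return (case_id, label); label is 'normal' if no failure plane fires."""
    best_label, best_value = "normal", 0.0
    for label, plane in knowledge.items():
        value = decision_value(plane, features)
        if value > best_value:  # keep the most strongly indicated failure
            best_label, best_value = label, value
    return case_id, best_label
```

The returned pair corresponds to the identification information and the determination result that are handed to the display unit.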

The determination result display unit 316 displays to the maintenance operator the determination result received from the pattern determination unit 315 (the pattern, or case type, corresponding to a type in the present invention) together with the identification information (case) of the abnormality degree information.

When a determination result (pattern, or case type, corresponding to a type in the present invention) presented to the maintenance operator by the determination result display unit 316 is wrong, the determination correction input unit 317 registers in the case storage unit 312 the case together with the case type that the maintenance operator considers correct (corresponding to the true type in the present invention). For example, the time and place (the case) and the case type (the true type) may be added to the case storage unit 312, or a case already stored in the case storage unit 312 may be corrected to what the maintenance operator considers correct.
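
The add-or-correct behavior described above can be sketched with a small in-memory store. This is only an illustration: the class name `CaseStore`, the use of a (time, place) pair as the key, and the string labels are assumptions, not details given in the original.

```python
from typing import Dict, Optional, Tuple

CaseKey = Tuple[str, str]  # (time, place); hypothetical key for a case


class CaseStore:
    """Keeps the operator-confirmed true type for each case."""

    def __init__(self) -> None:
        self._true_type: Dict[CaseKey, str] = {}

    def register(self, time: str, place: str, true_type: str) -> None:
        """Add a new case, or overwrite the type of an existing one."""
        self._true_type[(time, place)] = true_type

    def true_type(self, time: str, place: str) -> Optional[str]:
        """Return the stored true type, or None if the case is unknown."""
        return self._true_type.get((time, place))
```

Registering the same time and place twice models a correction: the later entry replaces the earlier one.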

The detection sensitivity input unit 318 accepts input of a detection sensitivity from a terminal (not shown) used by the maintenance operator. The input may be accepted with the detection sensitivity associated with a true type.

The detection sensitivity storage unit 319 receives the detection sensitivity from the detection sensitivity input unit 318 and stores it. The detection sensitivity storage unit 319 may receive a true type along with the detection sensitivity and store the two in association with each other.

Next, the overall operation of this embodiment will be described in detail with reference to the flowcharts of FIGS. 6 to 9.

FIGS. 6 to 9 are flowcharts for explaining the operation of the failure analysis apparatus 300 shown in FIG. 4.

First, the abnormality degree monitoring unit 301 acquires abnormality degree information, including abnormality degrees, from the monitoring target system 330 (step 401) and passes it to the pattern determination unit 315.

Using the threshold (hyperplane) held in the knowledge storage unit 314, the pattern determination unit 315 determines the case type in the monitoring target system 330 from the abnormality degree information received from the abnormality degree monitoring unit 301, and passes the determination result (case type, that is, the type) and the identification information (case) of that abnormality degree information to the determination result display unit 316 (step 402).

Next, if the pattern determination unit 315 determined a failure in step 402, the determination result display unit 316 displays to the maintenance operator the pattern (type) and the identification information of the abnormality degree received from the pattern determination unit 315 (step 403).

Next, the maintenance operator inputs to the failure case registration unit 311 or the determination correction input unit 317, as the case and its true type, the time at which a failure occurred or at which the system was normal, the place, and the case type. The failure case registration unit 311 or the determination correction input unit 317 stores the input case in the case storage unit 312 (step 601). The maintenance operator also sets the detection sensitivity for each type in the detection sensitivity storage unit 319 (step 500) and inputs the set detection sensitivity for each type via the detection sensitivity input unit 318 (step 501). Because a high detection sensitivity for normal is equivalent to making failures in general harder to detect, the input information may consist of a detection sensitivity for each type and a detection sensitivity common to all types.

Next, the pattern learning unit 313 generates, by pattern learning, a threshold for failure determination (step 602). This step may instead be executed on a separate instruction from the maintenance operator.

To generate the threshold for failure determination from the cases, the pattern learning unit 313 acquires, for every case in the case storage unit 312, the system information associated with the time or place of that case from the situation storage unit 310 (steps 701 and 702).

Using feature vectors composed of the abnormality degrees and situation information contained in the system information associated with each case obtained from the case storage unit 312, the pattern learning unit 313 learns a hyperplane that classifies each piece of system information into its pattern, that is, its case type (step 703), and thereby generates the hyperplane. In doing so, the pattern learning unit 313 reads the detection sensitivities stored in the detection sensitivity storage unit 319 and uses each one as the weight of the cost assigned when a case lies beyond the hyperplane.
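
The learning step can be sketched in pure Python as follows. This is a minimal illustration, not the solver used by LIBSVM: it assumes a linear two-class problem with labels +1 (failure) and -1 (normal) and minimizes the class-weighted hinge objective by plain subgradient descent. The function names, the step count, and the learning rate are hypothetical choices.

```python
from typing import Dict, List, Sequence, Tuple

Case = Tuple[Sequence[float], int]  # (feature vector, label +1 or -1)


def objective(w: Sequence[float], b: float,
              cases: Sequence[Case], cost: Dict[int, float]) -> float:
    """0.5*||w||^2 plus the hinge slack of each case weighted by its label cost."""
    total = 0.5 * sum(wj * wj for wj in w)
    for x, y in cases:
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        total += cost[y] * max(0.0, 1.0 - margin)
    return total


def learn_hyperplane(cases: Sequence[Case], cost: Dict[int, float],
                     steps: int = 500, lr: float = 0.05
                     ) -> Tuple[List[float], float]:
    """Subgradient descent; returns the best (w, b) seen during the run."""
    dim = len(cases[0][0])
    w, b = [0.0] * dim, 0.0
    best = (objective(w, b, cases, cost), list(w), b)
    for _ in range(steps):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for x, y in cases:
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # case has slack: add its weighted subgradient
                for j in range(dim):
                    grad_w[j] -= cost[y] * y * x[j]
                grad_b -= cost[y] * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
        value = objective(w, b, cases, cost)
        if value < best[0]:
            best = (value, list(w), b)
    return best[1], best[2]
```

Here `cost` plays the role of the detection sensitivities: `cost[1]` weights failure cases that lie beyond the hyperplane and `cost[-1]` weights normal cases.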

The pattern learning unit 313 stores the learned hyperplane in the knowledge storage unit 314, and the pattern determination unit 315 uses the stored hyperplane to classify the pattern of each piece of abnormality degree information received from the abnormality degree monitoring unit 301 (step 704).

Next, the effects of this embodiment will be described.

In this embodiment, the maintenance operator's view of the detection sensitivity for each failure type and for failures in general is supplied as the cost incurred when a case labeled with a given failure or as normal lies beyond the hyperplane in the feature space. The threshold represented by the generated hyperplane therefore reflects the operator's failure detection policy. This makes it possible to perform failure detection and classification that produces many false detections but few oversights or, conversely, few false detections but more oversights.

Further, according to this embodiment, when hyperplanes separating failures of several types and places are generated in the feature space, the detection sensitivity value is increased according to the severity of each failure as input by the maintenance operator. The larger the value, the larger the cost of crossing the hyperplane that separates that failure from cases of other failure types and from normal cases, so the region that the generated hyperplane judges to be a failure comes to contain more failure cases. Applying the threshold formed by this hyperplane to system monitoring data for failure detection then makes a failure determination more likely. Conversely, the smaller the input failure detection sensitivity, the smaller the cost, and the less likely a failure determination becomes.

The operation of the failure analysis apparatus 300 described above is explained below using specific examples.

FIG. 10 is a configuration diagram of a monitoring target, used to explain one example of the operation of the failure analysis apparatus 300 shown in FIG. 4. FIGS. 11 to 13 are diagrams showing the feature space, used to explain the same example.

As shown in FIG. 10, in this example the monitoring target system 330 contains monitoring target devices 901 and 902, which communicate with each other. The management system 300 of the present invention acquires as abnormality degrees the call loss rate 904 of communication with the monitoring target device 902, obtained from the monitoring target device 901, and the CPU usage rate 905 of the monitoring target device 902, obtained from the monitoring target device 902. The pattern determination unit 315 uses these as the feature space to identify the type of failure.

At this time, the maintenance operator registers detection sensitivities through the detection sensitivity input unit 318, and from this information a hyperplane representing the detection threshold is generated in the feature space.

When the maintenance operator sets the same detection sensitivity value for normal and for failure, as shown by the setting value 1010 in FIG. 11, the generated hyperplane 1003 is such that the proportion of failure cases lying beyond the hyperplane 1003 on the normal region 1005 side is about the same as the proportion of normal cases lying beyond it on the failure region side.

When failures are detected with the threshold represented by the hyperplane 1003, monitoring results near the hyperplane 1003 are sometimes determined to be normal and sometimes determined to be abnormal.

Next, suppose the maintenance operator finds it burdensome that monitoring results that could be regarded as normal, such as those near the hyperplane 1003, are frequently detected as failures, because confirming whether a failure has occurred takes time. The operator decides to lower the detection sensitivity so that only apparently serious cases are detected, and sets the detection sensitivity for normal high, as shown by the setting value 1110 in FIG. 12.

The detection threshold represented by the resulting hyperplane 1103 judges all data with features near the hyperplane 1003 of FIG. 11 to be normal, which reduces the number of failure detections.

Conversely, suppose the maintenance operator wants to detect anything that is even slightly suspected of being a failure, and raises the detection sensitivity by setting the detection sensitivity for failure high, as shown by the setting value 1210 in FIG. 13.

The detection threshold represented by the resulting hyperplane 1203 judges all data with features near the hyperplane 1203 of FIG. 13 to be failures, so that every case even slightly suspected of being a failure is detected.
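
The effect of the three settings in FIGS. 11 to 13 can be illustrated with a deliberately simplified sketch. It assumes a single abnormality degree instead of a two-dimensional feature space, and it picks the threshold that minimizes the weighted count of misclassified cases rather than solving the SVM problem. The function name and the sample values are hypothetical.

```python
from typing import List, Sequence


def weighted_threshold(normal: Sequence[float], failure: Sequence[float],
                       cost_normal: float, cost_failure: float) -> float:
    """Return t such that 'value >= t means failure' has minimal weighted cost."""
    values: List[float] = sorted(set(normal) | set(failure))
    # Final candidate lies above every sample: "never report a failure".
    candidates = values + [values[-1] + 1.0]

    def total_cost(t: float) -> float:
        false_alarms = sum(1 for v in normal if v >= t)  # normal judged failure
        misses = sum(1 for v in failure if v < t)        # failure judged normal
        return cost_normal * false_alarms + cost_failure * misses

    return min(candidates, key=total_cost)


normal = [0.1, 0.2, 0.35, 0.4, 0.6]
failure = [0.3, 0.5, 0.55, 0.7, 0.8]
balanced = weighted_threshold(normal, failure, 1.0, 1.0)       # cf. FIG. 11
fewer_alarms = weighted_threshold(normal, failure, 5.0, 1.0)   # cf. FIG. 12
fewer_misses = weighted_threshold(normal, failure, 1.0, 5.0)   # cf. FIG. 13
```

With these sample values, weighting normal cases more heavily moves the threshold up (fewer detections), and weighting failure cases more heavily moves it down (fewer oversights).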

Next, another example of the operation of the failure analysis apparatus 300 shown in FIG. 4 will be described.

FIGS. 14 to 16 are diagrams showing the feature space, used to explain this other example of the operation of the failure analysis apparatus 300 shown in FIG. 4.

In this example, a system like that of FIG. 10 is monitored, but here there are assumed to be two types of failure that raise the CPU usage rate and the call loss rate.

If the maintenance operator sets detection sensitivities such as the setting value 1310 in FIG. 14, the region determined to be failure 1 is the region enclosed by the hyperplane 1303, and the region determined to be failure 2 is the region enclosed by the hyperplane 1304.

Next, suppose the maintenance operator judges failure 1 to be a serious failure and sets its detection sensitivity high, as shown by the setting value 1410 in FIG. 15. The resulting hyperplane 1403 then also classifies monitoring results near the boundary of the hyperplane 1303 of FIG. 14 as failure 1, reducing the rate at which this failure is overlooked.

In this case, the detection sensitivity for the other failures hardly changes.

Conversely, suppose the maintenance operator judges failure 2 to be an important failure and sets its detection sensitivity high, as shown by the setting value 1510 in FIG. 16. The resulting hyperplane 1504 then also classifies monitoring results near the boundary of the hyperplane 1304 of FIG. 14 as failure 2, reducing the rate at which this failure is overlooked.

Here too, the detection sensitivity for the other failures hardly changes.

The present invention can be applied to uses such as monitoring a system composed of computers, network equipment, and communication devices, and detecting and classifying its failures.

In the present invention, the processing in the failure analysis apparatus may be realized not only by the dedicated hardware described above but also by recording a program that implements its functions on a recording medium readable by the failure analysis apparatus, and having the apparatus read and execute the program recorded on that medium. A recording medium readable by the failure analysis apparatus refers to a removable recording medium such as an IC card, memory card, floppy disk (registered trademark), magneto-optical disk, DVD, or CD, as well as an HDD or the like built into the failure analysis apparatus. The program recorded on the recording medium is read by, for example, a control block, and the same processing as described above is performed under the control of the control block.

The present invention has been described above with reference to examples, but it is not limited to those examples. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

This application claims priority based on Japanese Patent Application No. 2008-058440 filed on March 7, 2008, the entire disclosure of which is incorporated herein.

Claims

A failure analysis apparatus comprising:
abnormality degree information receiving means for sequentially receiving abnormality degree information and identification information of the abnormality degree information from a monitoring target apparatus that sequentially outputs the abnormality degree information, which includes a plurality of index values indicating the degree of abnormality of the monitoring target apparatus, together with the identification information of the abnormality degree information;
type determination means for comparing each piece of abnormality degree information received by the abnormality degree information receiving means with a predetermined determination criterion and classifying each piece of abnormality degree information by type based on the comparison result;
determination result output means for outputting the identification information of each piece of abnormality degree information in association with information indicating the type into which that abnormality degree information was classified;
failure case registration means for accepting input of information indicating the true type for the identification information of each piece of abnormality degree information;
case storage means for storing the identification information of each piece of abnormality degree information in association with the true type;
detection sensitivity input means for accepting input of a setting parameter for updating the determination criterion; and
pattern learning means for updating the determination criterion based on each piece of abnormality degree information received by the abnormality degree information receiving means, the information indicating the true type stored in association with the identification information of each piece of abnormality degree information, and the setting parameter.

The failure analysis apparatus according to claim 1,
wherein the information indicating the true type is information indicating whether the monitoring target apparatus is normal or abnormal.

The failure analysis apparatus according to claim 1,
wherein the failure case registration means receives the information indicating the true type from a terminal operated by an operator.

The failure analysis apparatus according to claim 1,
wherein the detection sensitivity input means receives the detection sensitivity from a terminal operated by an operator.

A failure analysis method using an information processing apparatus, comprising the steps of:
the information processing apparatus sequentially receiving abnormality degree information and identification information of the abnormality degree information from a monitoring target apparatus that sequentially outputs the abnormality degree information, which includes a plurality of index values indicating the degree of abnormality of the monitoring target apparatus, together with the identification information of the abnormality degree information;
the information processing apparatus comparing each piece of received abnormality degree information with a predetermined determination criterion and classifying each piece of abnormality degree information by type based on the comparison result;
the information processing apparatus outputting the identification information of each piece of abnormality degree information in association with information indicating the type into which that abnormality degree information was classified;
the information processing apparatus accepting input of information indicating the true type for the identification information of each piece of abnormality degree information;
the information processing apparatus storing the identification information of each piece of abnormality degree information in association with the true type;
the information processing apparatus accepting input of a setting parameter for updating the determination criterion; and
the information processing apparatus updating the determination criterion based on each piece of received abnormality degree information, the information indicating the true type stored in association with the identification information of each piece of abnormality degree information, and the setting parameter.

A recording medium on which is recorded a program that causes a computer to execute:
a procedure for sequentially receiving abnormality degree information and identification information of the abnormality degree information from a monitoring target apparatus that sequentially outputs the abnormality degree information, which includes a plurality of index values indicating the degree of abnormality of the monitoring target apparatus, together with the identification information of the abnormality degree information;
a procedure for comparing each piece of received abnormality degree information with a predetermined determination criterion and classifying each piece of abnormality degree information by type based on the comparison result;
a procedure for outputting the identification information of each piece of abnormality degree information in association with information indicating the type into which that abnormality degree information was classified;
a procedure for accepting input of information indicating the true type for the identification information of each piece of abnormality degree information;
a procedure for storing the identification information of each piece of abnormality degree information in association with the true type;
a procedure for accepting input of a setting parameter for updating the determination criterion; and
a procedure for updating the determination criterion based on each piece of received abnormality degree information, the information indicating the true type stored in association with the identification information of each piece of abnormality degree information, and the setting parameter.