JP2000250833A

JP2000250833A - Operating information acquisition method in multiple server operation management and recording medium recording the program

Info

Publication number: JP2000250833A
Application number: JP11049422A
Authority: JP
Inventors: Kazuaki Takahashi; 一昭高橋; Takemasa Kajiwara; 孟正梶原
Original assignee: Hitachi Information Systems Ltd
Current assignee: Hitachi Systems Ltd
Priority date: 1999-02-26
Filing date: 1999-02-26
Publication date: 2000-09-14

Abstract

(57) [Summary]

[Problem] To provide management as fine-grained as when an operation manager is assigned to each server, while reducing the load on the network and the management server, thereby improving the efficiency and greatly improving the reliability of operation management.

[Solution] In a centralized management system for a plurality of servers, management items that specify the type of log data are held on each of the servers 20, 30, and 40 concerned, so that the agent modules 20, 30, and 40 can check them locally, and only the information found necessary by collation in the filtering mechanisms 22, 32, and 42 is transmitted from the communication mechanisms 23, 33, and 43 to the management server 10. The management server 10 distributes the defined management items and their conditions to the managed servers 20, 30, and 40, and also acquires the log data corresponding to those management items and stores it in the storage device 110 together with the identifier of the originating server.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for centrally managing the operation of a plurality of server computers. In particular, it relates to a method for acquiring operation information in multiple-server operation management, in which management items are held for each server so that management becomes as fine-grained as if an operation manager were assigned to each server, and to a recording medium on which the corresponding program is recorded.

[0002]

[Prior Art] In recent years, with the progress of downsizing and open systems in information processing, enterprise information systems are increasingly built as client/server systems (hereinafter abbreviated as C/S systems). As a result, operation management of distributed systems has become an important issue. Meanwhile, software technology for centrally managing system operation is already in practical use. One known example of this kind of conventional technology is the one published in "Information Processing Society of Japan (IPSJ) research report 98-DSM-10". That technology is described below.

The overall system consists of a plurality of server computers to be managed (hereinafter abbreviated as servers) and a monitoring server connected to those servers via communication lines. Each managed server has a module called an agent, which collects monitoring data and communicates with the manager described below. The management server has a module called a manager, which aggregates the data from the agents by communicating with them. The agent collects data such as failure information generated on its server, configuration information (for example, the CPU model and the I/O configuration), and operation logging information, and transmits it to the manager. In addition to receiving data from the agents, the manager checks at fixed intervals whether each server is alive by using the PING command (Packet Internet Groper: a command used on TCP/IP networks such as the Internet to send a small packet to another computer and to check the status of that computer and of the communication line from the round-trip time).

[0003] As a conventional centralized management scheme for devices and the like, there is, for example, the network management system described in Japanese Unexamined Patent Publication No. H9-167126. In order to determine, when a failure occurs, which client failed and what caused the failure, this scheme registers on the server the identifiers of the applications executed on the clients of a LAN/WAN network, registers those identifiers on each client, and additionally acquires logs. However, because this scheme collects all information on the management server, it generates excessive traffic.

[0004]

[Problems to Be Solved by the Invention]

(a) A conventional centralized monitoring system uses the network to aggregate, on the management server, the logging information and the like issued by the OS (Operating System) and the APs (Application Programs) of the running servers. To achieve fine-grained operation management, a large amount of information (all of it) must be collected on the management server, which causes excessive traffic, increases the effort needed to process the information, and can lead to line failures and similar problems.

(b) The PING command can confirm whether a server is physically alive, but it cannot confirm whether the AP providing the service is alive.

(c) Even when the service AP is alive, the managed server has no means of checking the free space of the files, such as a DB (database), used by that AP.

(d) When the managed servers each run a different business operation, management and monitoring work is needed for each operation. This leads to the following problems: a management and monitoring PC or terminal is required for each operation; because there are many servers and the monitoring terminals are installed in a distributed manner, manual management is difficult; commercial products that support centralized management exist, but they cannot manage, in one place, management items that differ from operation to operation; and excessive traffic may delay the detection of failures.

[0005] Accordingly, an object of the present invention is to solve the above problems of the prior art and to provide a method for acquiring operation information in multiple-server operation management, and a recording medium on which the corresponding program is recorded, that allow a plurality of managed servers to be managed as finely as if an operation manager were assigned to each server, that reduce the load on the network and the management server, that allow failure notification information to be collected, and that improve the efficiency and reliability of operation management.

[0006]

[Means for Solving the Problems] To achieve the above object, the method for acquiring operation information in multiple-server operation management according to the present invention transmits to the management server, efficiently, only the necessary data out of the log data managed on each of the plurality of managed servers.

That is, the first aspect is a method for acquiring operation information in multiple-server operation management for a system consisting of client/server systems, in each of which a plurality of client computers are connected to a server computer, and a management server computer connected to the server computers of the plurality of client/server systems. The management server computer performs a step of defining a management item that specifies the type of log data to be acquired by a server computer, together with a condition of that management item, in association with the identifier of that server computer, and a step of distributing each defined management item and its condition from the management server computer to the corresponding server computer. Each of the plurality of managed server computers performs a step of acquiring, while the server computer is operating, the log data corresponding to the distributed management items, a step of storing the acquired log data, and a step of transmitting the acquired log data to the management server computer, together with the identifier of its own server computer, depending on whether the acquired log data satisfies the condition of the management item.
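The definition and distribution steps performed by the management server can be pictured with a minimal Python sketch. The class names, the field names, and the single numeric `threshold` condition are assumptions made for illustration; the source does not prescribe a data layout, and a real system would send each payload over the communication line rather than return it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ManagementItem:
    """One management item plus its condition (hypothetical layout)."""
    name: str         # which kind of log data to collect, e.g. "CPU"
    threshold: float  # condition: report when the observed value exceeds this


@dataclass
class ManagementServer:
    # Definitions keyed by the identifier of the managed server computer.
    definitions: dict[str, list[ManagementItem]] = field(default_factory=dict)

    def define(self, server_id: str, items: list[ManagementItem]) -> None:
        """Definition step: associate items and conditions with a server identifier."""
        self.definitions[server_id] = list(items)

    def distribute(self) -> dict[str, list[ManagementItem]]:
        """Distribution step: build the payload each managed server would receive."""
        return {sid: list(items) for sid, items in self.definitions.items()}
```

Each managed server receives only its own entry, so servers running different business operations can carry different management items.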

[0007] The second aspect is a recording medium on which a program for realizing the method for acquiring operation information in multiple-server operation management is recorded. The recorded program causes the management server computer to perform a process of defining a management item that specifies the type of log data to be acquired by a server computer, together with a condition of that management item, in association with the identifier of that server computer, and a process of distributing each defined management item and its condition from the management server computer to the corresponding server computer. It causes the server computer to perform a process of acquiring, while the server computer is operating, the log data corresponding to the distributed management items, a process of storing the acquired log data, and a process of transmitting the acquired log data to the management server computer, together with the identifier of its own server computer, depending on whether or not the acquired log data satisfies the condition of the management item.

[0008]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram of a multiple-server centralized management system according to an embodiment of the present invention. In FIG. 1, 10 is a management server that is connected to the client/server systems and manages them, 20 to 40 are the managed servers (A), (B), and (C), and 50 to 70 are the communication lines connecting the management server 10 to the managed servers (A) 20, (B) 30, and (C) 40. The management server 10 has an external storage device 110, and the managed servers 20, 30, and 40 likewise have external storage devices 120, 130, and 140, respectively. 80 and 90 are client computers connected to the managed server 20. A plurality of client computers are also connected to the managed servers 30 and 40, but they are omitted from the figure.

The management server 10 has a management item registration/distribution mechanism 11, a received data storage mechanism 12, and a communication mechanism 13. The management item registration/distribution mechanism 11 registers the name of a managed server (for example, managed server 20, named server (A)) and the management items of that server (A) in the external storage device 120 via the server (A) 20, and at the same time distributes a copy of the registered information to the server (A), that is, to the managed server 20. After this, the management server 10 also stores the management items of the server (A) in its own external storage device 110, and stores the operation information, such as failures, that is subsequently transmitted from the managed servers 20 to 40.

[0009] The managed servers 20 to 40 have monitoring information acquisition mechanisms 21, 31, and 41, filtering mechanisms 22, 32, and 42, and communication mechanisms 23, 33, and 43. The monitoring information acquisition mechanisms 21, 31, and 41 monitor, at fixed intervals, the management items registered on their server (for example, server (A)), that is, the items distributed from the management server 10, and pass the resulting information to the filtering mechanism (22, 32, or 42). Examples of such management items are the names of important APs whose liveness is to be checked, whether the response from an AP is correct, the threshold for the file capacity used by an AP, the thresholds for maintaining performance, and the logging information issued by the OS and the APs. The filtering mechanisms 22, 32, and 42 check whether a threshold or failure notification matches a registered management item. When it does, for example when the free file capacity crosses its threshold, the filtering mechanism asks the communication mechanism (23, 33, or 43) to transmit a notification to the management server 10. The external storage devices 120 to 140 are used to store the management items distributed to the respective servers.

[0010] FIG. 3 shows an example of the management item table distributed from the management server to a managed server. The table 25 shown in FIG. 3 represents the table contents after they have been distributed from the management server 10 to the managed server 20 and stored in the external storage device 120. The same information is also stored, for each managed server, in the external storage device 110 of the management server 10.

In the example of FIG. 3, the management item registration table has the columns management item, range, management content or threshold, and remarks. As the remarks column indicates:

- The first item, prc 26, represents the monitoring of service processes (service APs). The range is the number of service processes (service APs) to be monitored, and the management content column lists the names of that many APs.
- The second item, CPU 27, sets the CPU usage rate, with a threshold of 80%.
- The third item, Memory 28, sets the memory usage rate, with a threshold of 90%.
- The fourth item, Disc 29, sets the free disk capacity: partition 1 has a usage limit of 85%, and partition 2 has a usage limit of 75%.
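The table described above can be mirrored as a small Python structure. This is only a sketch: the `Row` type, the `scope` labels, and the two service AP names are hypothetical, because FIG. 3 itself is not reproduced in the text. The thresholds of 80%, 90%, 85%, and 75% are the values stated in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    item: str        # management item: "prc", "CPU", "Memory", or "Disc"
    scope: str       # "range" column: AP count for prc, partition label for Disc
    content: object  # AP names for prc, usage threshold in percent otherwise


TABLE_25 = [
    Row("prc", "2", ("service_ap_a", "service_ap_b")),  # AP names are placeholders
    Row("CPU", "-", 80),
    Row("Memory", "-", 90),
    Row("Disc", "partition1", 85),
    Row("Disc", "partition2", 75),
]


def threshold_for(item: str, scope: str = "-") -> object:
    """Look up the management content or threshold for one table row."""
    for row in TABLE_25:
        if row.item == item and row.scope == scope:
            return row.content
    raise KeyError((item, scope))
```

For example, `threshold_for("Disc", "partition2")` returns `75`, the usage limit given for partition 2.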

[0011] FIG. 2 is a flowchart of the operation information acquisition according to an embodiment of the present invention. It shows the processing for the case in which the service AP (B) and the monitoring information acquisition mechanism 21 are already running on the managed server 20.

The monitoring information acquisition mechanism 21 reads the distributed management items of the server 20, or reads them from the external storage device 120, and expands them internally as a table (step 101). It then issues a command that checks whether the service AP (B) is alive (step 102). This is a command by which the monitoring information acquisition mechanism 21 has the OS list the APs that are currently running, for example "the RegQueryValueEx function with the argument Process". The mechanism receives the result, including any detected abnormality, and passes that data and control to the filtering mechanism 22 (step 103).

If the service AP (A) is running, the mechanism issues a pseudo command and waits for the response in order to confirm whether the service AP (A) is operating normally (step 104). If, for example, the service AP (A) is a core business AP, the pseudo command has the AP issue an event log periodically and then reads that event log. If the service AP (A) is an e-mail program, the pseudo command causes an e-mail to be sent to the administrator.
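The liveness check of step 102 and the pseudo command of step 104 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the set of running process names is passed in rather than obtained from the OS, the pseudo command is any callable supplied by the caller, and the result labels `"ok"`, `"bad_response"`, and `"timeout"` are invented here.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def check_liveness(required: list[str], running: set[str]) -> dict[str, bool]:
    """Step 102: is each required service AP among the running processes?"""
    return {name: name in running for name in required}


def probe_ap(probe: Callable[[], str], expected: str, timeout_s: float) -> str:
    """Step 104: issue a pseudo command and classify the reply."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        reply = pool.submit(probe).result(timeout=timeout_s)
    except FutureTimeout:
        return "timeout"           # no response within the allowed time
    finally:
        pool.shutdown(wait=False)  # do not block on a probe that is still running
    return "ok" if reply == expected else "bad_response"
```

Both the timeout and the incorrect-response cases are returned as values rather than raised, because the flow hands every outcome, normal or not, to the filtering mechanism.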

[0012] The result information, including the cases in which no response arrives within a fixed time or the response content is incorrect, is passed together with control to the filtering mechanism 22 (step 105). Next, the mechanism checks whether the free space of the files used by the service AP (A) has crossed the threshold (step 106). The files in use and their thresholds have been expanded in the table 25 as management items. The result information, including the free-space information of the files and any case in which the threshold has been crossed, is passed together with control to the filtering mechanism 22 (step 107). All the other management items expanded in the table 25 are then checked in the same way (these steps are omitted from the figure). The mechanism checks whether all the management items have been examined and, when they have, waits for a fixed interval (step 108). The monitoring information acquisition mechanism 21 repeats the above processing at the predetermined time interval. That is, the interrupt raised at the end of the interval causes execution to restart from step 102 (step 109).
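The repeat-at-intervals structure of steps 102 to 109 can be sketched as a simple polling loop. The function and parameter names are hypothetical. The description repeats indefinitely on an interval interrupt; the `cycles` limit and the injectable `sleep` are assumptions added here so that the sketch terminates and can be exercised without waiting.

```python
import time
from typing import Callable, Iterable


def run_monitor(
    checks: Iterable[Callable[[], dict]],
    hand_to_filter: Callable[[dict], None],
    interval_s: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run every management-item check once per cycle, then wait for the interval."""
    checks = list(checks)
    for cycle in range(cycles):
        for check in checks:
            # Steps 102-107: every result, normal or abnormal, goes to the filter.
            hand_to_filter(check())
        if cycle < cycles - 1:
            # Step 108: wait; step 109 restarts the checks after the interval.
            sleep(interval_s)
```

A caller would pass one callable per row of the management item table and the filtering mechanism's entry point as `hand_to_filter`.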

[0013] The filtering mechanism 22 records the information passed to it in the external storage device 120 as logging information (step 201) and determines whether the information is normal or abnormal. If it is normal, the filtering mechanism returns control to the monitoring information acquisition mechanism (step 202). Only when the information is abnormal does it create the data to be reported to the management server 10 (step 203) and transmit it to the management server 10 (step 204). The management server 10 receives the information from the managed server 20 (step 301). It records the received information in the external storage device 110 (step 302) and issues a notification, such as a display on a monitor, a voice report, or an e-mail alert (step 303).
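Steps 201 to 204 amount to "log everything locally, forward only abnormal results". A minimal Python sketch follows. The class name, the `abnormal` flag carried in each result, and the in-memory lists standing in for the external storage device 120 and the communication mechanism 23 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FilteringMechanism:
    server_id: str
    local_log: list = field(default_factory=list)  # stands in for external storage 120
    outbox: list = field(default_factory=list)     # stands in for communication mechanism 23

    def handle(self, result: dict) -> bool:
        """Record the result locally; queue it for the management server only if abnormal."""
        self.local_log.append(result)               # step 201: always record
        if not result["abnormal"]:                  # step 202: normal, nothing to send
            return False
        # Steps 203-204: build the notification, tagged with this server's identifier.
        self.outbox.append({"server_id": self.server_id, **result})
        return True
```

Because normal results never leave the managed server, traffic to the management server scales with the number of abnormal events rather than with the number of checks.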

[0014] FIG. 4 shows the format of the data transmitted to the management server according to an embodiment of the present invention. The data 35 transmitted from the communication mechanism 23 to the management server 10 at the request of the filtering mechanism 22 is, for example, information in the format shown in FIG. 4 (abnormality information). The data format has a remark for each item. As shown in FIG. 4, the server type 36 holds the type of server, such as a mail server or a business server, and the server name 37 holds the name that identifies the server within that server type. The management item event 38 holds a Japanese-language message corresponding to the management item, describing an event such as whether a service-level threshold has been exceeded or a service AP has gone down. Examples of transmitted messages are "The application on Exchange server No. 10 of the mail server group has gone down." and "The disk usage of partition 2 on Exchange server No. 10 of the mail server group has exceeded 76%."
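The three fields of the transmitted data (server type 36, server name 37, management item event 38) can be sketched as a small Python record. The class name, the field names, and the JSON encoding are assumptions; the description names the fields but does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AbnormalityReport:
    server_type: str  # item 36, e.g. "mail server group"
    server_name: str  # item 37, identifies the server within its type
    event: str        # item 38, human-readable description of the event

    def to_wire(self) -> str:
        """Serialize the report; JSON is an illustrative choice, not the patent's format."""
        return json.dumps(asdict(self), ensure_ascii=False)


report = AbnormalityReport(
    server_type="mail server group",
    server_name="Exchange server No. 10",
    event="The disk usage of partition 2 has exceeded 76%.",
)
```

Keeping the server type and name as separate fields lets the management server group incoming reports by business operation before displaying or forwarding them.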

[0015] The processing flows of the embodiment shown in FIG. 2 mainly represent the operation of a program executed on the managed server 20. When the present invention is put into practice, the program that performs this processing is typically recorded on a recording medium such as a CD-ROM, read from the CD-ROM by the CD-ROM drive of the managed server 20, installed on the disk of the managed server 20, and then executed. As a more recent form of distribution, it is also increasingly common to load a program onto the disk of the managed server 20 from another computer connected via a network and then execute it. The same effect as in this embodiment is obtained when the program of the present invention is installed on the managed server 20 in this way and then executed. In either case, once the program is stored on a recording medium, the present invention can be realized by executing it at any place and at any time.

[0016]

[Effects of the Invention] As described above, according to the present invention, in a centralized management system for a plurality of servers, the management items are held on each server concerned and can be checked locally by the agent module, so management becomes as fine-grained as if an operation manager were assigned to each server. In addition, because the filtering mechanism transmits only the necessary information to the management server, the load on the network and the management server is reduced and failure notification information and the like can be collected easily. The efficiency and reliability of operation management can therefore be greatly improved.

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram of an operation information acquisition system in multiple-server operation management according to an embodiment of the present invention.

FIG. 2 is a flowchart of the information checking and collection operation according to an embodiment of the present invention.

FIG. 3 is a diagram of the table format used for management item registration according to an embodiment of the present invention.

FIG. 4 is a diagram of the format of the data transmitted to the management server according to an embodiment of the present invention.

[Explanation of symbols]

10: management server; 11: management item registration/distribution mechanism; 12: received data storage mechanism; 13: communication mechanism; 21, 31, 41: monitoring information acquisition mechanisms; 22, 32, 42: filtering mechanisms; 23, 33, 43: communication mechanisms; 110, 120, 130, 140: external storage devices; 50, 60, 70: communication lines; 80, 90: client computers; 25: management item registration table; 26 to 29: management items; 35: format of the data transmitted to the management server; 36 to 38: transmitted items.

Continuation of the front page. F-terms (reference): 5B042 GA12 JJ03 MC40; 5B045 BB02 BB03 BB12 BB28 BB49 JJ08; 5B089 GA11 GB02 GB08 JA35 JB15 KA06 KA07 KA13 KB03 KC15 KC30 MC03

Claims

[Claims]

1. A method for acquiring operation information in multiple-server operation management for a system comprising client/server systems, in each of which a plurality of client computers are connected to a server computer, and a management server computer connected to the server computers of the plurality of client/server systems, wherein the management server computer performs: a step of defining a management item that specifies the type of log data to be acquired by a server computer, and a condition of the management item, in association with an identifier of that server computer; and a step of distributing each defined management item and the condition of the management item from the management server computer to the corresponding server computer; and wherein each of the server computers performs: a step of acquiring, while the server computer is operating, log data corresponding to the distributed management item; a step of storing the acquired log data; and a step of collating whether the acquired log data satisfies the condition of the management item and transmitting the resulting log data to the management server computer together with the identifier of its own server computer.

2. A recording medium on which a program for realizing a method for acquiring operation information in multiple-server operation management is recorded, the program causing a management server computer to perform: a process of defining a management item that specifies the type of log data to be acquired by a server computer, and a condition of the management item, in association with an identifier of that server computer; and a process of distributing each defined management item and the condition of the management item from the management server computer to the corresponding server computer; and causing the server computer to perform: a process of acquiring, while the server computer is operating, log data corresponding to the distributed management item; a process of storing the acquired log data; and a process of collating whether the acquired log data satisfies the condition of the management item and transmitting the resulting log data to the management server computer together with the identifier of its own server computer.