WO2008007442A1 - System management program, system management device and system management method - Google Patents
- Publication number
- WO2008007442A1 (application PCT/JP2006/314107)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- symptom
- information
- countermeasure
- database
- management target
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
Definitions
- the present invention relates to a system management program, a system management apparatus, and a system management method for identifying a symptom of a problem occurring in a management target and determining a countermeasure for solving the symptom.
- More particularly, the present invention relates to a system management program, a system management apparatus, and a system management method in which information for identifying a symptom can be easily registered by specifying individual management targets, not only types of management target.
- Patent Document 1 discloses a technique for automatically discovering a performance degradation problem in a network system, identifying the cause, and notifying the system administrator of the countermeasure.
- Non-Patent Document 1 discloses a technique in which an autonomous manager refers to a problem solving database and solves problems autonomously when a problem occurs.
- Patent Document 1 Japanese Patent Application Laid-Open No. 2004-145536
- Non-Patent Literature 1: An architectural blueprint for autonomic computing, [Online], IBM Corporation, [Search June 30, 2006], Internet <URL: http://www-03.ibm.com/autonomic/pdfs/AC%20Blueprint%20White%20Paper%20V7.pdf>
Disclosure of Invention
Problems to be solved by the invention
- In the technique of Non-Patent Document 1, however, the conditions for identifying a symptom in a management target could be specified only per resource type, that is, in general units such as server devices in general or applications in general. Specific conditions for identifying symptoms could not be set individually for each server device or service.
- the present invention has been made in view of the above, and an object thereof is to provide a system management program, a system management apparatus, and a system management method in which information for identifying a symptom can be easily registered by specifying an individual management target, not only a type.
- a system management program according to one aspect of the invention identifies a symptom of a problem occurring in a management target and determines a countermeasure for resolving the symptom, and causes a computer to execute the following procedures:
- an information acquisition procedure for acquiring information indicating the status of the management target;
- a symptom identification procedure for identifying a symptom occurring in the management target by collating the information acquired by the information acquisition procedure with a symptom database in which, for each entry, an individual management target or a type of management target is set as the application target, and a symptom that may occur in the application target and a condition for determining the symptom are registered; and
- a countermeasure determination procedure for determining a countermeasure for resolving the symptom occurring in the management target by collating the symptom identified by the symptom identification procedure with a countermeasure database in which symptoms that may occur in the management target are registered in association with countermeasures for resolving them.
- a system management device according to another aspect of the invention identifies a symptom of a problem occurring in a management target and determines a countermeasure for resolving the symptom. The device comprises information acquisition means for acquiring information indicating the status of the management target;
- symptom identifying means for identifying the symptom occurring in the management target by collating the information acquired by the information acquisition means with a symptom database in which, for each entry, an individual management target or a type of management target is set as the application target, and a symptom that may occur in the application target and the judgment condition of the symptom are registered; and a countermeasure determining unit for determining a countermeasure for resolving the symptom occurring in the management target by collating the symptom identified by the symptom identifying means with a countermeasure database in which symptoms that may occur in the management target and countermeasures for resolving the symptoms are registered in association with each other.
- a system management method according to still another aspect of the invention identifies a symptom of a problem occurring in a management target and determines a countermeasure for eliminating the symptom.
- the method includes an information acquisition process for acquiring information indicating the status of the management target, and the information acquired by the information acquisition process is collated with a symptom database in which, for each entry, an individual management target or a type of management target is set as the application target together with a symptom that may occur in the application target.
- According to these aspects of the invention, the symptom database, in which information for identifying the symptom of a problem occurring in the management target is registered, is configured so that each entry can be set to apply either to an individual management target or to all management targets of the same type. Information for identifying symptoms can therefore be registered easily by specifying individual management targets rather than only types.
- According to another aspect of the invention, in the aspect described above, the computer is further caused to execute an information supplement procedure for acquiring the missing information from the management target when the information acquired by the information acquisition procedure is insufficient and the symptom identification procedure therefore cannot identify the symptom occurring in the management target even by collating the information with the symptom database.
- the system management device likewise further comprises information complementing means for acquiring the missing information from the management target when the symptom specifying means cannot identify the symptom occurring in the management target.
- According to these aspects, when the symptom occurring in the management target cannot be specified from the information acquired through a notification event or the like alone, the information necessary for the specification is actively acquired and complemented. Even if little information is acquired through notification events and the like, the symptoms that are actually occurring can be narrowed down and an appropriate countermeasure determined.
- According to another aspect of the invention, when the information acquired by the information acquiring procedure is insufficient and the symptom occurring in the management target cannot be identified even by collating the information with the symptom database, the symptom specifying procedure refers to a configuration management database in which information related to the configuration of the management target is registered, determines the acquisition destination of the missing information, and causes the information supplement procedure to acquire the missing information from the specified acquisition destination.
- Likewise, when the information acquired by the information acquiring unit is insufficient and the symptom occurring in the management target cannot be identified even by collating the information with the symptom database, the symptom specifying unit refers to the configuration management database in which information related to the configuration of the management target is registered, determines the acquisition destination of the missing information, and causes the information complementing means to acquire the missing information from the specified acquisition destination.
- According to these aspects, the acquisition destination of the missing information is determined autonomously with reference to the configuration information, so the content to be registered in the symptom database can be simplified.
- According to another aspect of the invention, when the countermeasure database contains a plurality of countermeasures corresponding to the symptom identified by the symptom identification procedure, the countermeasure determining procedure determines, as the countermeasure for resolving the symptom occurring in the management target, a countermeasure that satisfies the application condition registered in association with it.
- According to this aspect, a plurality of countermeasures and their application conditions are registered in the countermeasure database for one symptom, and a countermeasure that satisfies its application condition is selected. An appropriate countermeasure can therefore be selected according to the situation.
- According to another aspect of the invention, when the information necessary for determining whether the application condition registered in association with a countermeasure is satisfied is insufficient, the information complementing procedure acquires that information from the management target.
- According to another aspect of the invention, when the information necessary for determining whether the application condition registered in association with a countermeasure is satisfied is insufficient, the countermeasure determining procedure identifies where the missing information is to be acquired and has it acquired.
- According to this aspect, the acquisition destination of the missing information is determined autonomously with reference to the configuration information, so the content to be registered in the countermeasure database can be simplified.
- According to the present invention, for each entry either an individual management target or all management targets of the same type can be set as the application target, so information can be registered easily by specifying individual management targets.
- In the countermeasure database, a plurality of countermeasures and their application conditions are registered for one symptom, and a countermeasure that satisfies its application condition is selected.
- the acquisition destination of missing information is determined autonomously with reference to the configuration information, which has the effect that the contents to be registered can be simplified.
- FIG. 1 is a diagram illustrating an example of an information processing system to which a system management method according to the present embodiment is applied.
- FIG. 2 is a functional block diagram showing a configuration of the system management apparatus shown in FIG. 1.
- FIG. 3 is a diagram showing an example of a symptom database.
- FIG. 4 is a diagram showing an example of a countermeasure database.
- FIG. 5 is a diagram showing an example of a performance requirement database.
- FIG. 6 is a diagram showing an example of a configuration management database.
- Fig. 7-1 shows an example of a notification event.
- FIG. 7-2 is a diagram showing another example of the notification event.
- FIG. 7-3 is a diagram showing another example of the notification event.
- FIG. 8 is a flowchart showing a processing procedure of the system management apparatus.
- FIG. 9 is a flowchart showing a processing procedure of countermeasure execution processing.
- FIG. 10 is a functional block diagram illustrating a computer that executes a system management program.
- FIG. 1 is a diagram illustrating an example of an information processing system to which the system management method according to the present embodiment is applied.
- the information processing system shown in the figure is configured by connecting a system management device 100, server devices 201 to 206, and server devices 301 to 303 via a network 10 such as a LAN (Local Area Network).
- the system management device 100 is a device that executes the system management method according to the present embodiment.
- the system management device 100 monitors the server devices 201 to 206 and the server devices 301 to 303. When a problem occurs in these devices or in the services executed on them, the system management device 100 refers to the databases it holds and autonomously executes a series of processes: identifying the symptom, determining a countermeasure for the identified symptom, and executing the countermeasure.
- Server apparatuses 201 to 206 are server apparatuses that execute assigned predetermined services.
- server apparatuses 201 and 202 execute the business A service,
- server apparatus 203 executes the business B service, and
- server apparatuses 204 to 206 execute the business C service.
- the server devices 301 to 303 are server devices to which no specific service is assigned, and belong to the server pool 20.
- the server pool means a set of server devices whose usage is not specified and can be used as needed.
- When the system management device 100 determines that a problem has occurred in any of the server devices 201 to 206 and that the service assigned to that server device needs to be executed by another server device, it causes one of the server devices belonging to the server pool 20 to execute the service.
- Causing the server devices 301 to 303, whose usage is not specified, to execute an arbitrary service can be realized by using, for example, grid technology. Since grid technology is publicly known, detailed description thereof is omitted here.
- a group of server devices that execute various business services is the management target of the system management method according to the present embodiment, but the system management method according to the present embodiment is not limited to this.
- various devices such as client terminals and communication control devices can be managed.
- FIG. 2 is a functional block diagram showing the configuration of the system management apparatus 100 shown in FIG. 1.
- the system management apparatus 100 includes a storage unit 110 and a control unit 120.
- the storage unit 110 is a storage unit that stores various types of information, and includes a symptom database 111, a countermeasure database 112, a performance requirement database 113, and a configuration management database 114.
- the symptom database 111 is a database in which information for identifying the symptom of the problem occurring in the management target is registered.
- An example of the symptom database 111 is shown in FIG. 3.
- the symptom database 111 has the following items: entry number, judgment condition, symptom name, and the category and name of the application target.
- the entry number is an identification number for identifying an entry.
- the judgment condition is a condition for identifying a symptom, and the symptom database 111 is configured so that a plurality of conditions can be set in combination in one entry.
- the symptom name is a symptom identification name specified by the determination condition of the same entry.
- the classification and name of the application target are information for limiting the target to which the determination condition of the same entry is applied, and the classification takes a value of either “type” or “instance”.
- When the value of the category is “type”, the type of the target to which the judgment condition of the same entry is applied is set in the name.
- When the value of the category is “instance”, the name of the specific server device or service to which the judgment condition of the same entry is applied is set in the name.
- the entry identified by the entry number “A001” has “CPU temperature > 80 °C” as the judgment condition, “high fever” as the symptom name, and “type” and “server” as the category and name of the application target. This entry indicates that, for any management target of the type “server”, the symptom identified by the name “high fever” is identified as occurring when the CPU (Central Processing Unit) temperature exceeds 80 °C.
- the entry identified by the entry number “B001” has “CPU usage rate > 70%” and “service response time > 0.5 seconds” as judgment conditions, “Server A high load” as the symptom name, and “instance” and “Server A” as the category and name of the application target. This entry indicates that, for the specific management target named “Server A”, the symptom identified by the name “Server A high load” is identified as occurring when the CPU usage rate exceeds 70% and the service response time exceeds 0.5 seconds.
- the entry identified by the entry number “C001” has “authentication error > 100 times/minute” as the judgment condition, “unauthorized access” as the symptom name, and “type” and “service” as the category and name of the application target. This entry indicates that, for any management target of the type “service”, the symptom identified by the name “unauthorized access” is identified as occurring when authentication errors occur more than 100 times per minute.
- the entry identified by the entry number “D001” has “service response time > 1 second” as the judgment condition, “operation C high load” as the symptom name, and “instance” and “Business C service” as the category and name of the application target. This entry indicates that, for the specific management target named “Business C service”, the symptom identified by the name “operation C high load” is identified as occurring when the response time of the service exceeds 1 second.
- In this way, the symptom database 111 allows both designating a type, such as server device or service, as the application target and setting a judgment condition common to the management targets of that type, and designating an individual server device or service and setting a specific judgment condition for it.
- Since the application target only needs to be set through the category and name of the application target, it can be set easily and setting errors are unlikely to occur.
- When the same symptom has a plurality of determination patterns, an entry can be provided in the symptom database 111 for each determination pattern and its determination condition registered, with the same name set as the symptom name of those entries.
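The type/instance distinction described for the symptom database can be sketched in Python as follows. This is only an illustrative model, not the patented implementation: the class names, the metric keys such as `cpu_temp_c`, and the representation of a judgment condition as a `(metric, threshold)` pair meaning "metric > threshold" are all assumptions made for this example. The four entries mirror the examples given for FIG. 3.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SymptomEntry:
    entry_no: str
    # Hypothetical encoding: each condition holds when metrics[name] > threshold.
    conditions: List[Tuple[str, float]]
    symptom: str
    category: str  # "type" or "instance"
    name: str      # a type name or the name of one specific target


@dataclass
class Target:
    type_name: str      # e.g. "server" or "service"
    instance_name: str  # e.g. "Server A"


SYMPTOM_DB = [
    SymptomEntry("A001", [("cpu_temp_c", 80)], "high fever", "type", "server"),
    SymptomEntry("B001", [("cpu_usage_pct", 70), ("response_time_s", 0.5)],
                 "Server A high load", "instance", "Server A"),
    SymptomEntry("C001", [("auth_errors_per_min", 100)],
                 "unauthorized access", "type", "service"),
    SymptomEntry("D001", [("response_time_s", 1.0)],
                 "operation C high load", "instance", "Business C service"),
]


def applies_to(entry: SymptomEntry, target: Target) -> bool:
    """A "type" entry covers every target of that type; an "instance" entry covers one target."""
    if entry.category == "type":
        return entry.name == target.type_name
    return entry.name == target.instance_name


def identify_symptoms(target: Target, metrics: Dict[str, float]) -> List[str]:
    """Return the symptom names of all applicable entries whose conditions all hold."""
    symptoms = []
    for entry in SYMPTOM_DB:
        if not applies_to(entry, target):
            continue
        # In this sketch a missing metric simply counts as "condition not satisfied".
        if all(m in metrics and metrics[m] > t for m, t in entry.conditions):
            symptoms.append(entry.symptom)
    return symptoms
```

For example, `identify_symptoms(Target("server", "Server A"), {"cpu_temp_c": 85, "cpu_usage_pct": 75, "response_time_s": 0.6})` matches both the type-level entry A001 and the instance-level entry B001, while the same metrics reported for a different server could match only A001.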
- the countermeasure database 112 is a database in which countermeasures for solving the specified symptoms and countermeasure selection rules are registered.
- An example of the countermeasure database 112 is shown in FIG. 4.
- the countermeasure database 112 has the items symptom name, category and name of the application target, countermeasure, application condition, effectiveness, and side effect, and is configured so that a plurality of combinations of countermeasure, application condition, effectiveness, and side effect can be registered for one symptom.
- the symptom name is an identification name indicating the symptom occurring in the management target, and corresponds to the symptom name in the symptom database 111.
- the category and name of the target of application indicate the subject where the symptom occurs, and the same value as the item of the same name in the symptom database 111 is set.
- the countermeasure indicates a measure that can be applied to resolve the symptom and return the management target to normal, and the application condition indicates the condition for applying the countermeasure.
- the effectiveness indicates how effective the countermeasure is, and the side effect indicates the magnitude of the secondary effect produced by the countermeasure.
- the secondary effect means the effect that implementing the countermeasure has on equipment and services other than the target in which the problem occurs.
- When the side effect takes a positive value, the countermeasure produces a favorable effect; when it takes a negative value, the countermeasure produces an undesirable effect.
- the first entry in FIG. 4 indicates that the countermeasure “slow clock” can be applied to the symptom identified by the name “high fever”, that there is no particular condition for applying this countermeasure, and that the effectiveness of this countermeasure is “10” and its side effect is “0”. If no application condition is set, the application condition is interpreted as always satisfied in the process of determining the countermeasure.
- the third entry in the same figure indicates that the countermeasure “add server” and the countermeasure “restrict transactions” can be applied to the symptom identified by the name “Server B high load”.
- there is no particular condition for applying the “add server” countermeasure.
- the effectiveness of this countermeasure is “8” and the side effect is “1”.
- the effectiveness of this countermeasure is “7” and the side effect is “1”.
- the service performance requirement is the performance requirement that a service is required to satisfy in operation, and is registered in the performance requirement database 113 for each service.
- the symptom database 111 and the countermeasure database 112 are configured independently, but it is possible to merge these two databases into one database.
- the performance requirement database 113 is a database in which the performance requirements that services are required to satisfy are registered. An example of the performance requirement database 113 is shown in FIG. 5. As shown in the figure, the performance requirement database 113 has the items service name, service content, and performance requirement.
- the service name is an identification name for identifying the service executed on the server device
- the service content is a comment indicating the content of the service
- the performance requirement is the performance requirement that the service is required to satisfy.
- the first entry in FIG. 5 indicates that the service identified by the name “Business A Service” is a web service and is required, in terms of performance, to process more than 3000 transactions per minute.
- the entry for the service identified by the name “Business B service” indicates that the service is a customer management service and is required, in terms of performance, to respond to a request within 1 second.
- the configuration management database 114 is a database in which information related to the configuration of the management target is registered. An example of the configuration management database 114 is shown in FIG. 6. As shown in the figure, the configuration management database 114 has the items resource name, specification, and usage.
- the resource name is the name of the resource to be managed
- the specification is the specification of the resource
- the usage is the usage of the resource.
- the first entry in FIG. 6 indicates that the resource identified by the name “Server A” has a Type A CPU and 2 gigabytes of memory and is used for “Business A Service”.
- the second entry likewise indicates that the resource identified by the name “Server B” has a Type B CPU and 512 MB of memory and is used for “Business B Service”.
- the name of the server device is set as the resource name
- the name of the service executed on the server device is set as the usage
- the correspondence between the server device and the service is registered in the configuration management database 114.
- Although only the correspondence between server devices and services is shown here, various other types of information related to the configuration of the management target, for example information indicating the connections between devices, can be registered in the configuration management database 114.
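The server-to-service correspondence held in the configuration management database can be illustrated with a small Python sketch. The row layout, the key names, and the two lookup helpers are hypothetical; only the two example rows come from the description of FIG. 6. A lookup of this kind is what would let a component work out which service to query when a notification only describes a server.

```python
from typing import List, Optional

# Hypothetical in-memory rows modelled on the two entries described for FIG. 6.
CONFIG_DB = [
    {"resource": "Server A", "spec": "CPU: Type A, memory: 2 GB", "usage": "Business A Service"},
    {"resource": "Server B", "spec": "CPU: Type B, memory: 512 MB", "usage": "Business B Service"},
]


def service_on(server_name: str) -> Optional[str]:
    """Return the service registered as the usage of the given server, if any."""
    for row in CONFIG_DB:
        if row["resource"] == server_name:
            return row["usage"]
    return None


def servers_for(service_name: str) -> List[str]:
    """Return the names of all servers whose usage is the given service."""
    return [row["resource"] for row in CONFIG_DB if row["usage"] == service_name]
```

With these rows, `service_on("Server A")` yields `"Business A Service"`, and an unknown server yields `None`, which a caller would treat as "no acquisition destination found".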
- the control unit 120 is a control unit that controls the system management apparatus 100 as a whole, and includes an information acquisition unit 121, a symptom identification unit 122, a countermeasure determination unit 123, a countermeasure execution unit 124, an information complementing unit 125, and a configuration information updating unit 126.
- the information acquisition unit 121 is a processing unit that acquires information indicating the status of the management target by receiving a notification event transmitted from the management target.
- Examples of the notification event to be transmitted are shown in FIGS. 7-1 to 7-3.
- the notification event has items such as event ID, management target type, management target name, and phenomenon.
- the event ID is an identification number for identifying the notification event.
- the management target type is the type of management target for which the notification event notifies the status, and the management target name is a specific name of the management target.
- the phenomenon indicates a specific situation occurring in the management target.
- the notification event may be transmitted to the system management apparatus 100 to report the details when a specific phenomenon occurs in the management target, or it may be a regular notification of a predefined event.
- the information acquisition unit 121 may also be configured to actively collect information by inquiring about the status of the management target, or the system administrator may input information corresponding to a notification event to the information acquisition unit 121 via an input device such as a keyboard.
- the symptom identification unit 122 is a processing unit that collates the information indicating the status of the management target acquired by the information acquisition unit 121 with the symptom database 111 and identifies a symptom occurring in the management target.
- the symptom is identified by judging whether the information acquired by the information acquisition unit 121 indicates the status of an application target registered in the symptom database 111 and whether it satisfies the judgment condition registered in the same entry.
- the symptom database 111 is configured so that a symptom can be specified by a combination of a plurality of judgment conditions, and the information included in the notification event alone may not be enough to determine whether some of the conditions in a combination are satisfied.
- When the conformity or nonconformity of part of a combination of judgment conditions cannot be determined from the information acquired by the information acquisition unit 121 alone, the symptom specifying unit 122 requests the information complementing unit 125 to complement the missing information.
- In requesting the supplementation of information, the symptom identification unit 122 refers to the configuration management database 114 as necessary and determines where to obtain the information. For example, if the information acquired by the information acquisition unit 121 indicates the status of a server device and the missing information indicates the status of the service executed on that server device, the symptom identification unit 122 refers to the configuration management database 114, obtains information on the service executed on the server device, and requests the information complementing unit 125 to acquire the status of that service.
- the missing information can sometimes be obtained by referring to the configuration management database 114 itself. In such a case, the symptom specifying unit 122 itself obtains the missing information.
- the countermeasure determining unit 123 is a processing unit that collates the symptom identified by the symptom identifying unit 122 with the countermeasure database 112 to determine a countermeasure for resolving the symptom occurring in the management target.
- the countermeasure determining unit 123 adds the effectiveness and side-effect values of each countermeasure and assigns a higher priority to a larger sum. It then verifies the application condition of each countermeasure in descending order of priority and determines that the first countermeasure found to satisfy its application condition is the one to be implemented.
- Where the application condition requires it, the countermeasure determining unit 123 acquires the requirement set as the performance requirement from the performance requirement database 113 and verifies whether it is satisfied.
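The priority rule described for the countermeasure determining unit 123 (effectiveness plus side effect, then the first candidate whose application condition holds) can be sketched as below. The class, the `choose` function, and the `status` dictionary are assumptions for illustration. The effectiveness and side-effect values follow the “add server” and “restrict transactions” figures given in the text, but the application condition attached to “add server” here is invented for the example and is not taken from FIG. 4.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Countermeasure:
    name: str
    effectiveness: int
    side_effect: int
    # None means no application condition, which is treated as always satisfied.
    condition: Optional[Callable[[dict], bool]] = None

    @property
    def priority(self) -> int:
        # Priority is the sum of effectiveness and side effect; larger is better.
        return self.effectiveness + self.side_effect


def choose(candidates: List[Countermeasure], status: dict) -> Optional[Countermeasure]:
    """Check candidates in descending priority; return the first whose condition holds."""
    for cm in sorted(candidates, key=lambda c: c.priority, reverse=True):
        if cm.condition is None or cm.condition(status):
            return cm
    return None  # no applicable countermeasure


CANDIDATES = [
    Countermeasure("restrict transactions", 7, 1),
    # Hypothetical condition for illustration only: a pooled server must be free.
    Countermeasure("add server", 8, 1, lambda s: s["free_pool_servers"] > 0),
]
```

With a free pooled server, `choose(CANDIDATES, {"free_pool_servers": 2})` returns “add server” (priority 9); with none free, the lower-priority “restrict transactions” (priority 8) is returned instead.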
- When verifying an application condition, the countermeasure determining unit 123 may not be able to determine conformity or nonconformity from the information included in the notification event alone. In such a case, the countermeasure determining unit 123 requests the information complementing unit 125 to supplement the missing information.
- In doing so, the countermeasure determining unit 123 refers to the configuration management database 114 as necessary to determine where to acquire the information. For example, if the information acquired by the information acquisition unit 121 indicates the status of a server device and the missing information indicates the status of the service executed on that server device, the countermeasure determining unit 123 refers to the configuration management database 114, obtains information on the service executed on the server device, and requests the information complementing unit 125 to acquire the status of that service.
- the missing information can sometimes be obtained by referring to the configuration management database 114 itself. In such a case, the countermeasure determining unit 123 itself obtains the missing information.
- the countermeasure execution unit 124 is a processing unit that executes the countermeasure determined by the countermeasure determination unit 123.
- the information complementing unit 125 is a processing unit that queries the management target for the information whose supplementation is requested by the symptom specifying unit 122 or the countermeasure determining unit 123, and thereby dynamically acquires information indicating the status of the management target.
- the configuration information update unit 126 is a processing unit that, when the configuration of the management target is changed by the countermeasure determined by the countermeasure determination unit 123, reflects the change in the configuration management database 114.
- For example, when a server device is made to newly execute a service, the configuration information updating unit 126 sets the name of that service in the usage item of the server device.
- In this way, the information complementing unit 125 actively acquires missing information as necessary, so that even when only a small amount of information has been acquired, the symptoms that are actually occurring can be narrowed down correctly and appropriate measures taken. In addition, since symptoms are identified and countermeasures determined using the minimum necessary information, information collection does not impose a significant load even if the number of management targets increases.
- FIG. 8 is a flowchart showing the processing procedure of the system management apparatus 100. This figure shows a processing procedure after the system management apparatus 100 receives a notification event of the management target power.
- the symptom identification unit 122 reads one entry from the symptom database 111 (step S102). If all entries have already been read (Yes at step S103), it is determined that there is no problem with the management target, and the process ends.
- otherwise, the symptom identification unit 122 verifies whether the management target information acquired by the information acquisition unit 121 indicates the status of the application target of the read entry. If it does not (No at step S104), the process returns to step S102 and proceeds to the next entry.
- next, the symptom specifying unit 122 checks whether the management target information acquired by the information acquisition unit 121 matches all or part of the determination condition of the read entry. If it does not match the determination condition at all (Yes at step S105), the process returns to step S102 and proceeds to the next entry.
- if the management target information acquired by the information acquisition unit 121 matches all or part of the determination condition of the read entry (No at step S105), and information necessary to determine whether the determination condition is satisfied is missing (Yes at step S106), the symptom specifying unit 122 refers to the configuration information in the configuration management database 114 as necessary to determine where to obtain the information (step S107), and instructs the information complementing unit 125 to actively acquire the missing information (step S108).
- when all the necessary information has been collected and it is confirmed that the determination condition is satisfied (Yes at step S109), the symptom identifying unit 122 identifies that the symptom indicated by the symptom name of the entry has occurred in the management target (step S110), and the system management apparatus 100 executes the countermeasure execution processing described later (step S111).
- if all the necessary information has been collected and it is confirmed that the determination condition is not satisfied (No at step S109), the symptom specifying unit 122 returns to step S102 and proceeds to the next entry.
- FIG. 9 is a flowchart showing a processing procedure for countermeasure execution processing.
- the countermeasure determining unit 123 reads the entries of the countermeasure database 112 corresponding to the symptom identified by the symptom identifying unit 122 (step S201), and calculates the priority of each countermeasure based on its effectiveness and side effect values (step S202).
- the countermeasure determining unit 123 then selects, from the countermeasures not yet selected, the one with the highest priority (step S203). If all countermeasures have already been selected (Yes at step S204), the process ends on the assumption that there is no effective countermeasure.
- the countermeasure determining unit 123 causes the countermeasure executing unit 124 to execute the countermeasure (step S209) and, if necessary, causes the configuration information update unit 126 to update the configuration management database 114 (step S210). On the other hand, if it is confirmed that the application condition is not satisfied (No at step S208), the countermeasure determining unit 123 returns to step S203 and proceeds to the next countermeasure.
- the information acquisition unit 121 receives each notification event shown in FIGS. 7-1 to 7-3.
- the notification event shown in FIG. 7-1 indicates that the CPU usage rate of the server device with the host name “Server A” is 73%.
- the symptom identification unit 122 reads an entry in the symptom database 111 and examines the application target and the determination condition.
- the notification event shown in FIG. 7-1 falls within the application target of an entry that applies to all server devices.
- the determination condition of the entry with the entry number “A001” relates to the CPU temperature and has nothing to do with the notified information, so it is judged that the symptom of this entry has not occurred.
- the determination condition of the entry “B001” relates to the CPU usage rate and the service response time, and the notified information satisfies the CPU usage rate condition. Therefore, the symptom specifying unit 122 instructs the information complementing unit 125 to supply the missing information, specifically the response time of the service executed on the server device having the host name “Server A”.
- at this time, the symptom specifying unit 122 checks which service is executed on the server device having the host name “Server A” and designates that service to the information complementing unit 125.
- the symptom specifying unit 122 determines whether or not the determination condition for the entry with the entry number “B001” is satisfied.
- if the determination condition of this entry is completely satisfied, the symptom specifying unit 122 identifies that the symptom corresponding to the symptom name “Server A high load” has occurred in the server device having the host name “Server A”.
- the countermeasure determining unit 123 refers to the countermeasure database 112 and determines the countermeasure.
- in the entry for the symptom name “Server A high load” in the example of the countermeasure database 112 shown in FIG. 4, only the countermeasure “add server” is registered, and no application condition is specified.
- the countermeasure determining unit 123 therefore determines to execute this countermeasure.
- the countermeasure determining unit 123 causes the countermeasure executing unit 124 to execute “add server”, and instructs the configuration information update unit 126 to reflect in the configuration management database 114 that the service is being executed on the added server.
- the notification event shown in FIG. 7-2 indicates that the CPU usage rate of the server device with the host name “Server B” is 88%.
- the symptom identification unit 122 reads an entry in the symptom database 111 and examines the application target and the determination condition.
- the entries whose application target matches the notification event shown in FIG. 7-2 are examined in turn.
- the determination condition of the entry “A001” relates to the CPU temperature and has nothing to do with the notified information, so it is judged that the symptom of this entry has not occurred.
- the determination condition of the entry “B002” relates to the CPU usage rate, and the notified information satisfies the CPU usage rate condition. Therefore, as set in this entry, the symptom identification unit 122 identifies that the symptom corresponding to the symptom name “Server B high load” has occurred in the server device having the host name “Server B”.
- the countermeasure determining unit 123 refers to the countermeasure database 112 and determines the countermeasure.
- in the entry for the symptom name “Server B high load” in the example of the countermeasure database 112 shown in FIG. 4, two countermeasures are registered.
- the sum of the effectiveness and side effect values of the countermeasure “add server” is 5.
- the sum of the effectiveness and side effect values of the countermeasure “limit the transaction” is 6. The countermeasure determining unit 123 therefore determines that the latter countermeasure has the higher priority.
- the countermeasure determining unit 123 next evaluates the application condition of the countermeasure “limit the transaction”.
- the application condition is that the performance requirement of the service is not met. Therefore, the countermeasure determining unit 123 refers to the configuration management database 114 to recognize that the service executed by the server device having the host name “Server B” is “Business B service”, and refers to the performance requirement database 113 to obtain the performance requirement of the service “Business B service”.
- as a result, a performance requirement that bounds the service response time at 1 second is acquired. Since the service response time is not included in the notification event, the countermeasure determining unit 123 instructs the information complementing unit 125 to acquire it. If the acquired service response time does not satisfy the performance requirement, the countermeasure determining unit 123 determines “limit the transaction” as the countermeasure and causes the countermeasure executing unit 124 to execute it.
- the notification event shown in Figure 7-3 shows that the service response time of the service named “Business C service” is 2 seconds.
- the symptom identification unit 122 reads an entry in the symptom database 111 and examines the application target and the determination condition.
- the entries whose application target matches the notification event shown in FIG. 7-3 are the entries with the entry numbers “C001” and “D001”, which apply to the management target having the name “Business C service”.
- the determination condition of the entry “C001” relates to authentication errors and has nothing to do with the notified information, so it is judged that the symptom of this entry has not occurred.
- the determination condition of the entry “D001” relates to the service response time, and the notified information satisfies the service response time condition. Therefore, the symptom identification unit 122 identifies that the symptom corresponding to the symptom name “Business C high load” is occurring in the service named “Business C service”.
- the countermeasure determining unit 123 refers to the countermeasure database 112 to determine the countermeasure. In the entry for the symptom name “Business C high load” in the example of the countermeasure database 112 shown in FIG. 4, only the countermeasure “add server” is registered, and no application condition is specified, so the countermeasure determining unit 123 determines to execute this countermeasure.
- the countermeasure determining unit 123 then causes the countermeasure executing unit 124 to execute “add server”, and instructs the configuration information update unit 126 to reflect in the configuration management database 114 that the service is being executed on the added server.
- in this way, if information is registered in the symptom database 111 for the symptoms of a service executed on a server device, rather than per server, problems that occur in the service can be dealt with appropriately.
- in the above description, the explanation of verifying whether the classification and name of the application target in the countermeasure database 112 match the classification and name of the application target of the identified symptom is omitted.
- however, since the same symptom name may be set for different application targets, it is preferable to verify that they match.
- the configuration of the system management apparatus 100 according to the present embodiment shown in FIG. 2 can be variously changed without departing from the gist of the present invention.
- a function equivalent to that of the system management apparatus 100 can be realized by mounting the function of the control unit 120 of the system management apparatus 100 as software and executing the function by a computer.
- An example of a computer that executes a system management program 1071 in which the function of the control unit 120 is implemented as software is shown below.
- FIG. 10 is a functional block diagram showing the computer 1000 that executes the system management program 1071.
- the computer 1000 is configured by connecting, via a bus 1080, a CPU 1010 that executes various arithmetic processes, an input device 1020 that receives data input from a user, a monitor 1030 that displays various information, a medium reader 1040 that reads programs and the like from a recording medium, a network interface device 1050 that exchanges data with other computers via a network, a RAM (Random Access Memory) 1060 that temporarily stores various information, and a hard disk device 1070.
- the hard disk device 1070 stores a system management program 1071 having the same functions as the control unit 120 shown in FIG. 2, and system management data 1072 corresponding to the various databases stored in the storage unit 110 shown in FIG. 2. Note that the system management data 1072 can be distributed as appropriate and stored in other computers connected via the network.
- the CPU 1010 reads the system management program 1071 from the hard disk device 1070 and expands it in the RAM 1060, whereby the system management program 1071 functions as the system management process 1061. The system management process 1061 then expands information read from the system management data 1072, as appropriate, into the area allocated to it on the RAM 1060, and executes various data processing based on the expanded data.
- the system management program 1071 does not necessarily need to be stored in the hard disk device 1070; the computer 1000 may read and execute the program stored in a storage medium such as a CD-ROM.
- alternatively, the program may be stored in another computer (or server) connected to the computer 1000 via a public line, the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), or the like, and the computer 1000 may read the program from there and execute it.
- the system management program, the system management apparatus, and the system management method according to the present invention autonomously identify a symptom occurring in a management target and determine a countermeasure corresponding to the symptom. This is especially useful when it is necessary to easily register information for identifying symptoms by specifying individual management targets rather than by type.
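The symptom identification procedure of FIG. 8 (steps S102 to S110) can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the dictionary layout of an entry, the `"*"` wildcard for entries that apply to every server, the `complement` callback standing in for the information complementing unit 125, and all numeric thresholds are assumptions made for the example.

```python
def identify_symptom(entries, target, info, complement):
    """Return the name of the first symptom whose condition holds, else None.

    entries:    list of dicts with keys "target", "conditions", "symptom"
    info:       status values carried by the notification event
    complement: callable(target, key) that actively fetches a missing value
    """
    known = dict(info)  # copy so the caller's event data is not modified
    for entry in entries:                                      # S102, S103
        if entry["target"] not in ("*", target):               # S104
            continue
        conds = entry["conditions"]
        # S105: skip the entry unless the notified data meets part of it
        if not any(conds[k](known[k]) for k in conds if k in known):
            continue
        for key in conds:                                      # S106 to S108
            if key not in known:
                known[key] = complement(target, key)
        if all(test(known[key]) for key, test in conds.items()):  # S109
            return entry["symptom"]                            # S110
    return None  # every entry was read without a match


# Hypothetical entries; thresholds and the A001 symptom name are assumed.
ENTRIES = [
    {"target": "*", "symptom": "CPU overheating",
     "conditions": {"cpu_temperature": lambda v: v > 80}},
    {"target": "Server A", "symptom": "Server A high load",
     "conditions": {"cpu_usage": lambda v: v > 70,
                    "service_response_time": lambda v: v > 1.0}},
]
```

With a notification event that carries only `{"cpu_usage": 73}` for "Server A", the first entry is skipped because the event says nothing about temperature, and the second entry triggers exactly one call to `complement` for the missing response time before the full condition is evaluated.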
Abstract
The subject is a system management program and the like that identify a symptom and autonomously determine a countermeasure when a failure occurs in a managed device or the like. The problem to be solved is to make it easy to register information for identifying a symptom by designating an individual management target, not only by type. To solve this problem, the symptom database, in which information for identifying symptoms of management targets is registered, is provided with classification and name items for setting the application target. In addition to designating “type” in the classification item and setting a name representing the type of the application target in the name item, it is possible to designate “instance” in the classification item and set a name representing an individual management target in the name item.
Description
Specification
System management program, system management apparatus, and system management method
Technical field
[0001] The present invention relates to a system management program, a system management apparatus, and a system management method that identify a symptom of a problem occurring in a management target and determine a countermeasure for resolving the symptom. In particular, it relates to a system management program, a system management apparatus, and a system management method in which information for identifying a symptom can be easily registered by designating an individual management target, not only by type.
Background art
[0002] Conventionally, when a problem occurred in an information processing system, a person in charge, such as a system administrator, dealt with it based on his or her own know-how. Such know-how was accumulated by each person in charge through experience in dealing with individual problems and was often not sufficiently shared. As a result, the level of response to failures could differ greatly depending on the person in charge.
[0003] To eliminate this dependence on personal skills and to make problem handling faster and more uniform, techniques have been proposed in which know-how about the various problems that occur in information systems is standardized and registered in a database, and the database is used to identify the symptom of a problem and determine a countermeasure corresponding to the identified symptom.
[0004] For example, Patent Document 1 discloses a technique for automatically discovering a performance degradation problem in a network system, identifying its cause, and notifying the system administrator of a countermeasure. Non-Patent Document 1 discloses a technique in which, when a problem occurs, an autonomic manager refers to a problem-solving database and solves the problem autonomously.
[0005] Patent Document 1: Japanese Patent Application Laid-Open No. 2004-145536
Non-Patent Document 1: "An architectural blueprint for autonomic computing.", [online], IBM Corporation, [retrieved June 30, 2006], Internet <URL: http://www-03.ibm.com/autonomic/pdfs/AC%20Blueprint%20White%20Paper%20V7.pdf>
Disclosure of Invention
Problems to be solved by the invention
[0006] However, with the technique disclosed in Non-Patent Document 1, the conditions for identifying a symptom of a management target can be specified only per resource type, that is, only in coarse units such as server devices in general or applications in general. Conditions for identifying symptoms cannot be set individually and specifically for each server device or service.
[0007] On the other hand, with the technique disclosed in Patent Document 1, conditions for identifying a symptom can be specified for a limited set of targets by adding a constraint such as “server devices under a load balancer”. This is made possible by holding configuration information about the management targets. However, this approach of combining conditions for identifying symptoms with constraints that limit the targets complicates the settings and makes setting errors more likely.
[0008] The present invention has been made in view of the above, and an object thereof is to provide a system management program, a system management apparatus, and a system management method in which information for identifying a symptom can be easily registered by designating an individual management target, not only by type.
Means for solving the problem
[0009] To solve the problems described above and achieve the object, one aspect of the present invention is a system management program that identifies a symptom of a problem occurring in a management target and determines a countermeasure for resolving the symptom, the program causing a computer to execute: an information acquisition procedure of acquiring information indicating the status of the management target; a symptom identification procedure of identifying the symptom occurring in the management target by collating the information acquired by the information acquisition procedure with a symptom database in which, for each entry, an individual management target or a type of management target is set as an application target, and a symptom that can occur in the application target and a determination condition for the symptom are registered; and a countermeasure determination procedure of determining a countermeasure for resolving the symptom occurring in the management target by collating the symptom identified by the symptom identification procedure with a countermeasure database in which symptoms that can occur in the management target and countermeasures for resolving those symptoms are registered in association with each other.
[0010] Another aspect of the present invention is a system management apparatus that identifies a symptom of a problem occurring in a management target and determines a countermeasure for resolving the symptom, the apparatus comprising: information acquisition means for acquiring information indicating the status of the management target; symptom identification means for identifying the symptom occurring in the management target by collating the information acquired by the information acquisition means with a symptom database in which, for each entry, an individual management target or a type of management target is set as an application target, and a symptom that can occur in the application target and a determination condition for the symptom are registered; and countermeasure determination means for determining a countermeasure for resolving the symptom occurring in the management target by collating the symptom identified by the symptom identification means with a countermeasure database in which symptoms that can occur in the management target and countermeasures for resolving those symptoms are registered in association with each other.
[0011] Another aspect of the present invention is a system management method for identifying a symptom of a problem occurring in a management target and determining a countermeasure for resolving the symptom, the method comprising: an information acquisition step of acquiring information indicating the status of the management target; a symptom identification step of identifying the symptom occurring in the management target by collating the information acquired in the information acquisition step with a symptom database in which, for each entry, an individual management target or a type of management target is set as an application target, and a symptom that can occur in the application target and a determination condition for the symptom are registered; and a countermeasure determination step of determining a countermeasure for resolving the symptom occurring in the management target by collating the symptom identified in the symptom identification step with a countermeasure database in which symptoms that can occur in the management target and countermeasures for resolving those symptoms are registered in association with each other.
[0012] According to this aspect of the invention, the symptom database, in which information for identifying the symptom of a problem occurring in a management target is registered, is configured so that each entry can be set to apply either to an individual management target or to all management targets of the same type. Information for identifying a symptom can therefore be easily registered by designating an individual management target, not only by type.
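As a hedged illustration of the per-entry application target described in [0012], the following Python sketch models a symptom database entry whose classification is either "type" or "instance". The class names, the field names (`classification`, `name`, `target_type`), and the sample entries are assumptions made for illustration; the specification defines the database only in prose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedTarget:
    """A management target: its type (e.g. "server") and its own name."""
    target_type: str
    name: str


@dataclass(frozen=True)
class SymptomEntry:
    """One symptom database entry (field names are hypothetical)."""
    entry_id: str
    classification: str  # "type" or "instance"
    name: str            # a type name, or an individual target's name
    symptom: str

    def applies_to(self, target: ManagedTarget) -> bool:
        # A "type" entry covers every target of that type;
        # an "instance" entry covers exactly one named target.
        if self.classification == "type":
            return self.name == target.target_type
        if self.classification == "instance":
            return self.name == target.name
        raise ValueError(f"unknown classification: {self.classification}")


def applicable_entries(entries, target):
    """Return the entries whose application target covers the given target."""
    return [e for e in entries if e.applies_to(target)]


# Illustrative entries only; the symptom name of A001 is assumed.
SAMPLE_ENTRIES = [
    SymptomEntry("A001", "type", "server", "CPU overheating"),
    SymptomEntry("B001", "instance", "Server A", "Server A high load"),
    SymptomEntry("B002", "instance", "Server B", "Server B high load"),
]
```

Under these assumptions, a lookup for `ManagedTarget("server", "Server A")` yields both the type-wide entry and the entry registered for that one server, without any separate constraint expression.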
[0013] In another aspect of the present invention, in the above aspect, the program further causes the computer to execute an information complementing procedure of acquiring missing information from the management target when the symptom identification procedure cannot identify the symptom occurring in the management target, even by collating the information acquired by the information acquisition procedure with the symptom database, because that information is insufficient.
[0014] In another aspect of the present invention, in the above aspect, the apparatus further comprises information complementing means for acquiring missing information from the management target when the symptom identification means cannot identify the symptom occurring in the management target, even by collating the information acquired by the information acquisition means with the symptom database, because that information is insufficient.
[0015] According to this aspect of the invention, when the symptom occurring in the management target cannot be identified from the information acquired through a notification event or the like alone, the information needed for identification is actively acquired and supplemented. Therefore, even if only a small amount of information is acquired through a notification event or the like, the symptoms that are actually occurring can be correctly narrowed down and an appropriate countermeasure can be determined.
[0016] In another aspect of the present invention, in the above aspect, when the symptom occurring in the management target cannot be identified even by collating the information acquired by the information acquisition procedure with the symptom database because that information is insufficient, the symptom identification procedure refers to a configuration management database in which information about the configuration of the management target is registered, identifies where the missing information is to be obtained, and causes the information complementing procedure to acquire the missing information by designating that acquisition destination.
[0017] In another aspect of the present invention, in the above aspect, when the symptom occurring in the management target cannot be identified even by collating the information acquired by the information acquisition means with the symptom database because that information is insufficient, the symptom identification means refers to a configuration management database in which information about the configuration of the management target is registered, identifies where the missing information is to be obtained, and causes the information complementing means to acquire the missing information by designating that acquisition destination.
[0018] According to this aspect of the invention, the acquisition destination of the missing information is determined autonomously by referring to the configuration information, so the content registered in the symptom database can be simplified.
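A minimal sketch of the idea in [0018], assuming the configuration management database can be modelled as a mapping from host name to the service executed on that host. The function name `resolve_source`, the item keys, and the rule that only the response time belongs to the service are hypothetical; only the pairing of "Server B" with "Business B service" comes from the embodiment.

```python
# Hypothetical stand-in for the configuration management database:
# host name -> name of the service executed on that host.
CONFIG_DB = {
    "Server B": "Business B service",
}


def resolve_source(config_db, host_name, item):
    """Decide which management target should be queried for a missing item.

    Returns a (kind, name) pair naming the acquisition destination.
    """
    if item == "service_response_time":
        # Response time is a property of the service, not of the host,
        # so look up which service the host executes.
        service = config_db.get(host_name)
        if service is None:
            raise LookupError(f"no service registered for {host_name!r}")
        return ("service", service)
    # Any other item is assumed to be answerable by the host itself.
    return ("server", host_name)
```

Because the lookup is derived from configuration data, a symptom entry only needs to say that it depends on the response time; it does not have to name the service that must be queried.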
[0019] In another aspect of the present invention, in the above aspect, when the countermeasure database contains a plurality of countermeasures corresponding to the symptom identified by the symptom identification procedure, the countermeasure determination procedure determines, as the countermeasure for resolving the symptom occurring in the management target, a countermeasure whose application condition, registered in association with that countermeasure, is satisfied.
[0020] According to this aspect of the invention, a plurality of countermeasures and their application conditions are registered for one symptom in the countermeasure database, and a countermeasure whose application condition is satisfied is selected, so an appropriate countermeasure can be selected according to the situation.
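The selection described in [0020], combined with the priority rule of the embodiment (the sum of the effectiveness and side effect values, higher first), can be sketched as below. This is an illustration under stated assumptions: the individual effectiveness and side effect values are invented so that the sums match the embodiment's 5 and 6, and reading the performance requirement as "response time above 1 second is a violation" is likewise assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Countermeasure:
    name: str
    effectiveness: int
    side_effect: int
    # None means that no application condition is registered.
    condition: Optional[Callable[[dict], bool]] = None

    @property
    def priority(self) -> int:
        # Priority is the sum of the two registered values.
        return self.effectiveness + self.side_effect


def choose_countermeasure(candidates, status):
    """Try candidates in descending priority; return the first that applies."""
    for cm in sorted(candidates, key=lambda c: c.priority, reverse=True):
        if cm.condition is None or cm.condition(status):
            return cm
    return None  # no effective countermeasure


def requirement_violated(status):
    # Assumed reading of the requirement: a response time above 1 second.
    return status["service_response_time"] > 1.0


# Sums (5 and 6) follow the embodiment; the split into two values is assumed.
CANDIDATES = [
    Countermeasure("add server", 3, 2),
    Countermeasure("limit the transaction", 4, 2, requirement_violated),
]
```

With a measured response time of 2 seconds the higher-priority "limit the transaction" is chosen; if the requirement is met, the procedure falls through to the unconditional "add server".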
[0021] In another aspect of the present invention, in the above aspect, the information complementing procedure acquires the missing information from the management target when information needed to determine whether the application condition registered in association with the countermeasure is satisfied is missing.
[0022] According to this aspect of the invention, when a countermeasure cannot be selected from the information acquired through a notification event or the like alone, the information needed for selection is actively acquired and supplemented. Therefore, even if only a small amount of information is acquired through a notification event or the like, an appropriate countermeasure can be determined.
[0023] In another aspect of the present invention, in the above aspect, when information needed to determine whether the application condition registered in association with the countermeasure is satisfied is missing, the countermeasure determination procedure identifies where the missing information is to be obtained and causes the information complementing procedure to acquire the missing information by designating that acquisition destination.
[0024] According to this aspect of the invention, the acquisition destination of the missing information is determined autonomously by referring to the configuration information, so the content registered in the countermeasure database can be simplified.
Effects of the Invention
[0025] According to one aspect of the present invention, the symptom database, in which information for identifying the symptom of a problem occurring in a management target is registered, is configured so that each entry can be set to apply either to an individual management target or to all management targets of the same type. This has the effect that information for identifying symptoms can be registered easily by specifying individual management targets, not only by type.
[0026] According to another aspect of the present invention, when the symptom occurring in the management target cannot be identified from the information acquired through notification events or the like alone, the information needed for the identification is actively acquired and supplemented. This has the effect that, even if only a small amount of information has been acquired through notification events or the like, the symptoms actually occurring can be narrowed down correctly and an appropriate countermeasure can be determined.
[0027] According to another aspect of the present invention, the source of the missing information is determined autonomously by referring to the configuration information, and the information is acquired from that source. This has the effect that the content registered in the symptom database can be simplified.
[0028] According to another aspect of the present invention, a plurality of countermeasures and their application conditions are registered for a single symptom in the countermeasure database, and a countermeasure whose application condition is satisfied is selected. This has the effect that an appropriate countermeasure can be selected according to the situation.
[0029] According to another aspect of the present invention, when a countermeasure cannot be selected from the information acquired through notification events or the like alone, the information needed for the selection is actively acquired and supplemented. This has the effect that an appropriate countermeasure can be determined even if only a small amount of information has been acquired through notification events or the like.
[0030] According to another aspect of the present invention, the source of the missing information is determined autonomously by referring to the configuration information, and the information is acquired from that source. This has the effect that the content registered in the countermeasure database can be simplified.
Brief Description of Drawings
[0031] [FIG. 1] FIG. 1 is a diagram showing an example of an information processing system to which the system management method according to the present embodiment is applied.
[FIG. 2] FIG. 2 is a functional block diagram showing the configuration of the system management device shown in FIG. 1.
[FIG. 3] FIG. 3 is a diagram showing an example of the symptom database.
[FIG. 4] FIG. 4 is a diagram showing an example of the countermeasure database.
[FIG. 5] FIG. 5 is a diagram showing an example of the performance requirement database.
[FIG. 6] FIG. 6 is a diagram showing an example of the configuration management database.
[FIG. 7-1] FIG. 7-1 is a diagram showing an example of a notification event.
[FIG. 7-2] FIG. 7-2 is a diagram showing another example of a notification event.
[FIG. 7-3] FIG. 7-3 is a diagram showing another example of a notification event.
[FIG. 8] FIG. 8 is a flowchart showing the processing procedure of the system management device.
[FIG. 9] FIG. 9 is a flowchart showing the processing procedure of the countermeasure execution processing.
[FIG. 10] FIG. 10 is a functional block diagram showing a computer that executes the system management program.
Explanation of Reference Numerals
10 Network
20 Server pool
100 System management device
110 Storage unit
111 Symptom database
112 Countermeasure database
113 Performance requirement database
114 Configuration management database
120 Control unit
121 Information acquisition unit
122 Symptom identification unit
123 Countermeasure determination unit
124 Countermeasure execution unit
125 Information complementing unit
126 Configuration information update unit
201 to 206, 301 to 303 Server device
1000 Computer
1010 CPU
1020 Input device
1030 Monitor
1040 Medium reading device
1050 Network interface device
1060 RAM
1061 System management process
1070 Hard disk device
1071 System management program
1072 System management data
1080 Bus
Best Mode for Carrying Out the Invention
[0033] Embodiments of the system management program, system management device, and system management method according to the present invention are described in detail below with reference to the drawings. The invention is not limited by these embodiments.
Embodiment
[0034] First, an example of an information processing system to which the system management method according to the present embodiment is applied is described. FIG. 1 is a diagram showing an example of such an information processing system.
[0035] The information processing system shown in the figure is configured by connecting a system management device 100, server devices 201 to 206, and server devices 301 to 303 via a network 10 such as a LAN (Local Area Network).
[0036] The system management device 100 is a device that executes the system management method according to the present embodiment. The system management device 100 monitors the server devices 201 to 206 and the server devices 301 to 303. When a problem occurs in these devices or in the services running on them, it refers to its own databases and autonomously executes a series of processes: identifying the symptom, determining a countermeasure for the identified symptom, and executing that countermeasure.
[0037] The server devices 201 to 206 are server devices that execute the predetermined services assigned to them. In this example, the server devices 201 and 202 execute the business A service, the server device 203 executes the business B service, and the server devices 204 to 206 execute the business C service.
[0038] The server devices 301 to 303 are server devices to which no specific service is assigned, and they belong to the server pool 20. A server pool is a set of server devices whose use is not fixed and that can be used as needed.
[0039] When some problem occurs in any of the server devices 201 to 206 and the system management device 100 determines that the service assigned to that server device must be executed on another server device, it causes one of the server devices belonging to the server pool 20 to execute the service.
[0040] Causing the server devices 301 to 303, whose use is not fixed, to execute a specified service can be achieved, for example, by using grid technology. Grid technology is well known, so a detailed description is omitted here.
[0041] In the above example, a group of server devices executing various business services is the management target of the system management method according to the present embodiment. The method can, however, also manage various other devices, such as client terminals and communication control devices.
[0042] Next, the configuration of the system management device that executes the system management method according to the present embodiment is described. FIG. 2 is a functional block diagram showing the configuration of the system management device 100 shown in FIG. 1. As shown in the figure, the system management device 100 has a storage unit 110 and a control unit 120.
[0043] The storage unit 110 stores various kinds of information and has a symptom database 111, a countermeasure database 112, a performance requirement database 113, and a configuration management database 114.
[0044] The symptom database 111 is a database in which information for identifying the symptom of a problem occurring in a management target is registered. An example of the symptom database 111 is shown in FIG. 3. As shown in the figure, the symptom database 111 has items such as entry number, determination conditions, symptom name, and the category and name of the application target.
[0045] The entry number is an identification number that identifies the entry. The determination conditions are the conditions used to identify a symptom, and the symptom database 111 is configured so that multiple conditions can be set in combination in a single entry. The symptom name is the identifying name of the symptom identified by the determination conditions of the same entry.
[0046] The category and name of the application target are information that limits the targets to which the determination conditions of the same entry apply. The category takes one of the values "type" or "instance". When the category is "type", the name is set to the type of target to which the entry's determination conditions apply. When the category is "instance", the name is set to the name of the specific server device or service to which the entry's determination conditions apply.
[0047] For example, the entry identified by entry number "A001" has "CPU temperature > 80°C" as its determination condition, "high heat" as its symptom name, and "type" and "server" as the category and name of its application target. This entry indicates that, when the temperature of the CPU (Central Processing Unit) exceeds 80°C in any management target of the type "server", the symptom identified by the name "high heat" is determined to be occurring in that management target.
[0048] The entry identified by entry number "B001" has "CPU usage rate > 70%" and "service response time > 0.5 seconds" as its determination conditions, "server A high load" as its symptom name, and "instance" and "server A" as the category and name of its application target. This entry indicates that, when the CPU usage rate exceeds 70% and the service response time exceeds 0.5 seconds in the specific management target named "server A", the symptom identified by the name "server A high load" is determined to be occurring in that management target.
[0049] The entry identified by entry number "B002" has "CPU usage rate > 80%" as its determination condition, "server B high load" as its symptom name, and "instance" and "server B" as the category and name of its application target. This entry indicates that, when the CPU usage rate exceeds 80% in the specific management target named "server B", the symptom identified by the name "server B high load" is determined to be occurring in that management target.
[0050] The entry identified by entry number "C001" has "authentication errors > 100 per minute" as its determination condition, "unauthorized access" as its symptom name, and "type" and "service" as the category and name of its application target. This entry indicates that, when the number of authentication errors exceeds 100 per minute in any management target of the type "service", the symptom identified by the name "unauthorized access" is determined to be occurring in that management target.
[0051] The entry identified by entry number "D001" has "service response time > 1 second" as its determination condition, "business C high load" as its symptom name, and "instance" and "business C service" as the category and name of its application target. This entry indicates that, when the service response time exceeds 1 second in the specific management target named "business C service", the symptom identified by the name "business C high load" is determined to be occurring in that management target.
[0052] In this way, the symptom database 111 supports two ways of setting determination conditions. A type such as server device or service can be specified as the application target, with determination conditions common to all management targets of that type. Alternatively, a server device or service can be specified individually, with determination conditions specific to it. In addition, because specifying the application target only requires setting its category and name, it is easy to do and setting mistakes are unlikely.
[0053] When there are multiple patterns of determination conditions for identifying the same symptom, an entry can be provided in the symptom database 111 for each pattern and the determination conditions registered there. In this case, the same name is set as the symptom name of multiple entries.
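The entry layout and the type/instance distinction described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the embodiment: the class `SymptomEntry`, the function names, and metric keys such as `cpu_temp_c` are hypothetical, all conditions are modeled as "metric exceeds threshold" as in the FIG. 3 examples, and a metric that is absent from the input is simply treated as not satisfying its condition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SymptomEntry:
    entry_no: str
    conditions: Dict[str, float]  # metric name -> threshold; every one must be exceeded
    symptom: str
    scope: str                    # "type" or "instance"
    name: str                     # type name if scope is "type", else the target's own name


# Entries modeled on A001, B001, B002 and C001 as described for FIG. 3.
# The metric keys are hypothetical names chosen for this sketch.
SYMPTOM_DB: List[SymptomEntry] = [
    SymptomEntry("A001", {"cpu_temp_c": 80}, "high heat", "type", "server"),
    SymptomEntry("B001", {"cpu_usage_pct": 70, "response_s": 0.5},
                 "server A high load", "instance", "server A"),
    SymptomEntry("B002", {"cpu_usage_pct": 80},
                 "server B high load", "instance", "server B"),
    SymptomEntry("C001", {"auth_errors_per_min": 100},
                 "unauthorized access", "type", "service"),
]


def applies_to(entry: SymptomEntry, target_type: str, target_name: str) -> bool:
    """A "type" entry matches every target of that type; an "instance" entry matches one target."""
    if entry.scope == "type":
        return entry.name == target_type
    return entry.name == target_name


def identify_symptoms(target_type: str, target_name: str,
                      metrics: Dict[str, float]) -> List[str]:
    """Return the symptom names of all applicable entries whose conditions all hold."""
    found = []
    for entry in SYMPTOM_DB:
        if not applies_to(entry, target_type, target_name):
            continue
        # Assumption of this sketch: a missing metric counts as "condition not satisfied".
        if all(m in metrics and metrics[m] > limit
               for m, limit in entry.conditions.items()):
            found.append(entry.symptom)
    return found
```

For instance, calling `identify_symptoms("server", "server B", {"cpu_usage_pct": 85})` matches only the instance-scoped entry B002, while the type-scoped entry A001 is evaluated for every target of type "server".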
[0054] The countermeasure database 112 is a database in which countermeasures for resolving identified symptoms and rules for selecting among them are registered. An example of the countermeasure database 112 is shown in FIG. 4. As shown in the figure, the countermeasure database 112 has the items symptom name, category and name of the application target, countermeasure, application condition, effectiveness, and side effect. Multiple combinations of the items from countermeasure through side effect can be registered for a single symptom name.
[0055] The symptom name is the identifying name of the symptom occurring in the management target and corresponds to the symptom name in the symptom database 111. The category and name of the application target indicate the target in which the symptom occurs and are set to the same values as the items of the same name in the symptom database 111. The countermeasure indicates a measure that can be applied to resolve the symptom and return the management target to normal, and the application condition indicates the condition for applying that countermeasure. The effectiveness indicates how effective the countermeasure is, and the side effect indicates the magnitude of the secondary effect the countermeasure produces. A secondary effect is, for example, the effect that executing the countermeasure has on devices and services other than the one in which the problem is occurring.
[0056] A positive side effect value means that the countermeasure produces a favorable secondary effect, and a negative value means that it produces an unfavorable secondary effect. By registering the magnitude of the effect obtained directly from the countermeasure as the effectiveness and separately registering the magnitude of the secondary effect as the side effect, the intent behind the evaluation of each countermeasure is made clear, and evaluation errors become less likely when the evaluation is later reviewed.
[0057] The first entry in FIG. 4 indicates that the countermeasure "lower the clock" can be applied to the symptom identified by the name "high heat", that there is no particular condition for applying it, that its effectiveness is "10", and that its side effect is "0". When no application condition is set, the application condition of the countermeasure is treated as always satisfied during countermeasure determination.
[0058] The third entry in the figure indicates that the countermeasures "add a server" and "restrict transactions" can be applied to the symptom identified by the name "server B high load". There is no particular condition for applying the countermeasure "add a server"; its effectiveness is "8" and its side effect is "-3". Applying the countermeasure "restrict transactions" requires that the performance requirements of the service are not being met; its effectiveness is "7" and its side effect is "1".
[0059] The performance requirements of a service are the performance-related requirements that the service must meet in operation, and they are registered per service in the performance requirement database 113. In FIG. 2, the symptom database 111 and the countermeasure database 112 are shown as independent, but the two can also be merged and configured as a single database.
[0060] The performance requirement database 113 is a database in which the performance-related requirements that services must meet in operation are registered. An example of the performance requirement database 113 is shown in FIG. 5. As shown in the figure, the performance requirement database 113 has items such as service name, service content, and performance requirement.
[0061] The service name is the identifying name of a service executed on a server device, the service content is a comment describing the service, and the performance requirement is the performance-related requirement that the service must meet.
[0062] The first entry in FIG. 5 indicates that the service identified by the name "business A service" is a web service and that it is required to process 3000 or more transactions per minute. The second entry indicates that the service identified by the name "business B service" is a customer management service and that it is required to respond to requests within 1 second.
[0063] In this example only one requirement is set for each service, but multiple requirements can also be set for a single service.
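As a rough illustration of how such per-service requirements might be represented and checked, the following Python sketch stores a list of requirements per service, so a service can carry more than one. The dictionary layout, the metric keys `transactions_per_min` and `response_s`, and the function `requirements_met` are assumptions made for this sketch and do not appear in the original.

```python
import operator
from typing import Dict, List, Tuple

# Comparison operators allowed in a requirement (an assumption of this sketch).
OPS = {">=": operator.ge, "<=": operator.le}

# Service name -> list of (metric, operator, limit), modeled on the FIG. 5 examples.
PERFORMANCE_REQUIREMENTS: Dict[str, List[Tuple[str, str, float]]] = {
    "business A service": [("transactions_per_min", ">=", 3000)],
    "business B service": [("response_s", "<=", 1.0)],
}


def requirements_met(service: str, measured: Dict[str, float]) -> bool:
    """Return True only if every requirement registered for the service holds."""
    return all(OPS[op](measured[metric], limit)
               for metric, op, limit in PERFORMANCE_REQUIREMENTS[service])
```

A countermeasure whose application condition depends on the performance requirements could then call `requirements_met` with the latest measured values for the affected service.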
[0064] The configuration management database 114 is a database in which information about the configuration of the monitored targets is registered. An example of the configuration management database 114 is shown in FIG. 6. As shown in the figure, the configuration management database 114 has items such as resource name, specification, and use.
[0065] The resource name is the name of a managed resource, the specification is the specification of that resource, and the use is what that resource is used for.
[0066] The first entry in FIG. 6 indicates that the resource identified by the name "server A" has a type A CPU and 2 gigabytes of memory and is used for the "business A service". The second entry indicates that the resource identified by the name "server B" has a type B CPU and 512 megabytes of memory and is used for the "business B service".
[0067] The figure shows an example in which the name of a server device is set as the resource name and the name of the service executed on that server device is set as the use, so that the correspondence between server devices and services is registered in the configuration management database 114. Various other kinds of information about the configuration of the management targets, such as the connection relationships between devices, can also be registered in the configuration management database 114.
[0068] The control unit 120 controls the system management device 100 as a whole and has an information acquisition unit 121, a symptom identification unit 122, a countermeasure determination unit 123, a countermeasure execution unit 124, an information complementing unit 125, and a configuration information update unit 126. The information acquisition unit 121 is a processing unit that acquires information indicating the status of a management target by receiving notification events sent from the management target.
[0069] Examples of notification events sent from management targets are shown in FIGS. 7-1 to 7-3. As shown in these figures, a notification event has items such as event ID, management target type, management target name, and phenomenon. The event ID is an identification number that identifies the notification event. The management target type is the type of the management target whose status the notification event reports, and the management target name is the specific name of that management target. The phenomenon indicates the specific situation occurring in the management target.
[0070] If the situation indicated by the phenomenon item relates to a server device, the management target type is set to "server" and the management target name is set to the host name of that server device. If the situation indicated by the phenomenon item relates to a service, the management target type is set to "service" and the management target name is set to the name of that service.
[0071] A management target may send a notification event to the system management device 100 to report an unusual phenomenon when one occurs, or it may report predetermined items periodically. Alternatively, the information acquisition unit 121 may be configured to collect information actively by querying management targets for their status, or a system administrator may enter information equivalent to a notification event into the information acquisition unit 121 through an input device such as a keyboard.
[0072] The symptom identification unit 122 is a processing unit that checks the information indicating the status of the management target acquired by the information acquisition unit 121 against the symptom database 111 and identifies the symptom occurring in the management target. A symptom is identified by determining whether the information acquired by the information acquisition unit 121 indicates the status of an application target registered in the symptom database 111 and whether that information satisfies the determination conditions registered in the same entry.
[0073] As already described, the symptom database 111 is configured so that a symptom can be identified by a combination of multiple determination conditions. With only the information contained in a notification event, it may therefore be impossible to determine whether some of the conditions in such a combination are satisfied.
[0074] Accordingly, when the information acquired by the information acquisition unit 121 alone is not enough to determine whether part of a combination of determination conditions is met, the symptom identification unit 122 requests the information complementing unit 125 to supply the missing information.
[0075] When requesting the missing information, the symptom identification unit 122 refers to the configuration management database 114 as necessary and determines where the information should be acquired from. For example, if the information acquired by the information acquisition unit 121 indicates the status of a server device and the missing information concerns the status of the service running on that server device, the symptom identification unit 122 refers to the configuration management database 114 to find the service running on that server device and requests the information complementing unit 125 to acquire the status of that service.
[0076] Some missing information can be obtained simply by referring to the configuration management database 114. In such cases, the symptom identification unit 122 obtains the missing information itself.
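The complementing step described above can be sketched in Python as follows. This is a simplified illustration under several assumptions: the event is a plain dictionary, `CONFIG_DB` models only the server-to-service correspondence of FIG. 6, `METRIC_OWNER` (which kind of target each metric belongs to) is a hypothetical helper table, and `fetch` is a hypothetical callback standing in for the information complementing unit 125.

```python
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for the configuration management database 114:
# server name -> service running on it (as in the FIG. 6 example).
CONFIG_DB: Dict[str, str] = {
    "server A": "business A service",
    "server B": "business B service",
}

# Assumption of this sketch: which kind of target can report each metric.
METRIC_OWNER: Dict[str, str] = {
    "cpu_usage_pct": "server",
    "response_s": "service",
}


def complete_metrics(event: dict, required: Iterable[str],
                     fetch: Callable[[str, str], float]) -> Dict[str, float]:
    """Fill in metrics missing from the event by asking the right source for them."""
    metrics = dict(event["metrics"])
    for metric in required:
        if metric in metrics:
            continue  # already delivered with the notification event
        owner = METRIC_OWNER[metric]
        if owner == event["target_type"]:
            source = event["target_name"]             # ask the reporting target itself
        elif owner == "service" and event["target_type"] == "server":
            source = CONFIG_DB[event["target_name"]]  # resolve server -> service
        else:
            raise LookupError(f"no known source for metric {metric!r}")
        metrics[metric] = fetch(source, metric)
    return metrics
```

Because the source is resolved from the configuration data at evaluation time, the condition entries themselves only need to name the metric, which is the simplification the text attributes to this design.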
[0077] The countermeasure determination unit 123 is a processing unit that checks the symptom identified by the symptom identification unit 122 against the countermeasure database 112 and determines a countermeasure for resolving the symptom occurring in the management target.
[0078] When a plurality of countermeasures are registered in the countermeasure database 112 for the symptom identified by the symptom identification unit 122, the countermeasure determination unit 123 adds the effectiveness value and the side-effect value of each countermeasure and judges that a countermeasure with a larger sum has a higher priority. The countermeasure determination unit 123 then verifies the application condition of each countermeasure in descending order of priority and determines the first countermeasure whose application condition is satisfied as the one to be carried out.
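The priority rule of paragraph [0078] can be illustrated with a short sketch. The class and field names are assumptions introduced for the example; the text only states that the effectiveness and side-effect values are summed and that a larger sum means a higher priority.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Countermeasure:
    name: str
    effectiveness: int  # hypothetical field name for the effectiveness value
    side_effect: int    # hypothetical field name for the side-effect value

    @property
    def priority(self) -> int:
        # Priority as described in the text: the plain sum of the two values.
        return self.effectiveness + self.side_effect


def rank(candidates: List[Countermeasure]) -> List[Countermeasure]:
    """Order countermeasures from highest to lowest priority."""
    return sorted(candidates, key=lambda c: c.priority, reverse=True)
```

The application conditions would then be checked in the order returned by `rank`, stopping at the first countermeasure whose condition holds.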
[0079] When the application condition registered in the countermeasure database 112 is one that verifies whether a performance requirement of a service is met, the countermeasure determination unit 123 obtains the requirement set for that service from the performance requirement database 113 and verifies whether it is satisfied.
[0080] When verifying an application condition, the countermeasure determination unit 123 may be unable to determine whether the condition is met from the information contained in the notification event alone. In such a case, the countermeasure determination unit 123 requests the information complementing unit 125 to supply the missing information.
[0081] When requesting complementary information, the countermeasure determination unit 123 refers to the configuration management database 114 as necessary and determines where the information should be obtained from. For example, if the information acquired by the information acquisition unit 121 indicates the status of a server device and the missing information indicates the status of a service running on that server device, the countermeasure determination unit 123 refers to the configuration management database 114 to obtain information on the service running on the server device, and requests the information complementing unit 125 to acquire the status of that service.
[0082] Some missing information can be obtained simply by referring to the configuration management database 114. In such cases, the countermeasure determination unit 123 obtains the missing information itself.
[0083] The countermeasure execution unit 124 is a processing unit that executes the countermeasure determined by the countermeasure determination unit 123. The information complementing unit 125 is a processing unit that queries the management target for the information requested by the symptom identification unit 122 or the countermeasure determination unit 123, thereby actively acquiring information that indicates the status of the management target.
[0084] The configuration information update unit 126 is a processing unit that, when the countermeasure determined by the countermeasure determination unit 123 changes the configuration of the management target, reflects the change in the configuration management database 114. For example, when the determined countermeasure adds a server device that runs a service, the configuration information update unit 126 sets the name of the service in the use field of the server device that newly runs it.
[0085] As described above, in the system management method according to this embodiment, the information complementing unit 125 actively acquires missing information as needed. Therefore, even when the information acquired by the information acquisition unit 121 is scant, the symptom that is actually occurring can be narrowed down correctly and an appropriate countermeasure can be taken. Moreover, because symptoms are identified and countermeasures are determined using only the minimum necessary information, information collection does not impose a heavy load even when the number of management targets increases.
[0086] Next, the processing procedure of the system management apparatus 100 shown in FIG. 2 is described. FIG. 8 is a flowchart showing the processing procedure of the system management apparatus 100. The figure shows the procedure performed after the system management apparatus 100 receives a notification event from a management target.
[0087] As shown in the figure, when the information acquisition unit 121 acquires information indicating the status of the management target from the received notification event (step S101), the symptom identification unit 122 reads an entry from the symptom database 111 (step S102). If all entries have already been read (step S103: Yes), the processing ends on the assumption that there is no problem in the management target.
[0088] If an unprocessed entry is read from the symptom database 111 (step S103: No), it is verified whether the management-target information acquired by the information acquisition unit 121 indicates the status of the application target of the read entry. If it does not (step S104: No), the processing returns to step S102 and moves on to the next entry.
[0089] If the management-target information acquired by the information acquisition unit 121 indicates the status of the application target of the read entry (step S104: Yes), the symptom identification unit 122 checks whether that information matches all or part of the determination conditions of the read entry. If it matches none of the determination conditions (step S105: Yes), the processing returns to step S102 and moves on to the next entry.
[0090] On the other hand, if the management-target information acquired by the information acquisition unit 121 matches all or part of the determination conditions of the read entry (step S105: No), and some of the information needed to determine whether the determination conditions are satisfied is missing (step S106: Yes), configuration information is obtained from the configuration management database 114 as necessary to decide where the information is to be acquired from (step S107), and the information complementing unit 125 is instructed to actively acquire the missing information (step S108).
[0091] Then, if it is confirmed, with all the necessary information available, that the determination conditions are satisfied (step S109: Yes), the symptom identification unit 122 identifies that the symptom indicated by the symptom name of the entry is occurring in the management target (step S110), and the system management apparatus 100 executes the countermeasure execution processing described later (step S111).
[0092] If it is confirmed, with all the necessary information available, that the determination conditions are not satisfied (step S109: No), the symptom identification unit 122 returns to step S102 and moves on to the next entry.
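The flow of FIG. 8 described above can be summarized in a short sketch. The entry layout, the key names, and the `complement` callback (standing in for the information complementing unit 125) are assumptions made for illustration. In particular, treating "matches none of the determination conditions" as "no condition is met by the notified information" is one possible reading of step S105.

```python
from typing import Callable, Dict, List, Optional


def identify_symptom(entries: List[dict], target: Dict[str, str],
                     info: Dict[str, float],
                     complement: Callable[[str], float]) -> Optional[str]:
    """Return the symptom name of the first fully satisfied entry, or None."""
    for entry in entries:                                   # S102, S103
        # S104: an entry applies to a whole type (name is None) or to one
        # individually named management target of that type.
        if entry["kind"] != target["kind"]:
            continue
        if entry["name"] is not None and entry["name"] != target["name"]:
            continue
        conditions = entry["conditions"]
        # S105: skip when no condition is met by the notified information.
        if not any(k in info and check(info[k])
                   for k, check in conditions.items()):
            continue
        merged = dict(info)
        for key in conditions:                              # S106 to S108
            if key not in merged:
                merged[key] = complement(key)
        if all(check(merged[k]) for k, check in conditions.items()):  # S109
            return entry["symptom"]                         # S110
    return None  # every entry was read: no problem is assumed
```

Only entries that pass the application-target check and already match part of their conditions trigger a call to `complement`, which mirrors the point that information is fetched only when it is actually needed.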
[0093] FIG. 9 is a flowchart showing the procedure of the countermeasure execution processing. As shown in the figure, the countermeasure determination unit 123 reads the entries of the countermeasure database 112 that correspond to the symptom identified by the symptom identification unit 122 (step S201), and calculates the priority of each countermeasure based on its effectiveness and side-effect values (step S202).
[0094] The countermeasure determination unit 123 then selects the countermeasure with the highest priority among those not yet selected (step S203). If all countermeasures have already been selected (step S204: Yes), the processing ends on the assumption that there is no effective countermeasure.
[0095] If an unselected countermeasure is selected (step S204: No), and information needed to determine whether its application condition is satisfied is missing (step S205: Yes), configuration information is obtained from the configuration management database 114 as necessary to decide where the information is to be acquired from (step S206), and the information complementing unit 125 is instructed to actively acquire the missing information (step S207).
[0096] Then, if it is confirmed, with all the necessary information available, that the application condition is satisfied (step S208: Yes), the countermeasure determination unit 123 has the countermeasure execution unit 124 execute the countermeasure (step S209) and, if necessary, has the configuration information update unit 126 update the configuration management database 114 (step S210). If it is confirmed that the application condition is not satisfied (step S208: No), the countermeasure determination unit 123 returns to step S203 and moves on to the next countermeasure.
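The countermeasure execution processing of FIG. 9 can be sketched in the same style. The record layout and the three callbacks (`complement`, `execute`, `update_config`, standing in for units 125, 124 and 126) are hypothetical and exist only to make the control flow concrete.

```python
from typing import Callable, Dict, List, Optional


def run_countermeasure(candidates: List[dict], info: Dict[str, float],
                       complement: Callable[[str], float],
                       execute: Callable[[str], None],
                       update_config: Callable[[str], None]) -> Optional[str]:
    """Execute the highest-priority applicable countermeasure; return its name."""
    # S201, S202: priority is the sum of effectiveness and side-effect values.
    ranked = sorted(candidates,
                    key=lambda c: c["effectiveness"] + c["side_effect"],
                    reverse=True)
    merged = dict(info)
    for cm in ranked:                                   # S203, S204
        condition = cm.get("condition")                 # None: always applicable
        if condition is not None:
            for key in condition["needs"]:              # S205 to S207
                if key not in merged:
                    merged[key] = complement(key)
            if not condition["check"](merged):          # S208: No
                continue
        execute(cm["name"])                             # S209
        if cm.get("changes_configuration", False):      # S210
            update_config(cm["name"])
        return cm["name"]
    return None  # every countermeasure was tried: none is applicable
```

A countermeasure without a `condition` key models an entry with no application condition, which is executed as soon as it is reached in priority order.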
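[0097] Next, a concrete operation example of the system management apparatus 100 is described, taking as examples the cases where the information acquisition unit 121 receives each of the notification events shown in FIGS. 7-1 to 7-3. The notification event shown in FIG. 7-1 indicates that the CPU usage rate of the server device with the host name "Server A" is 73%.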
[0098] To identify the symptom indicated by this information, the symptom identification unit 122 reads the entries of the symptom database 111 and examines their application targets and determination conditions. Among the entries in the example of the symptom database 111 shown in FIG. 3, those whose application target matches the notification event of FIG. 7-1 are the entry with entry number "A001", which applies to all server devices, and the entry with entry number "B001", which applies to the management target named "Server A".
[0099] Of these two entries, the determination condition of the entry "A001" concerns CPU temperature and has nothing to do with the notified information, so the entry is judged to be unrelated to the symptom of the management target.
[0100] On the other hand, the determination conditions of the entry "B001" concern CPU usage rate and service response time, and the notified information satisfies the CPU usage rate condition. The symptom identification unit 122 therefore instructs the information complementing unit 125 to acquire the missing information, specifically the response time of the service running on the server device with the host name "Server A". At this time, the symptom identification unit 122 refers to the configuration management database 114 to find out which service is running on that server device, and designates that service to the information complementing unit 125.
[0101] Once the missing information has been acquired, the symptom identification unit 122 determines whether the determination conditions of the entry "B001" are satisfied. If the acquired service response time is, for example, 2 seconds, the determination conditions of this entry are fully satisfied, and the symptom identification unit 122 identifies that the symptom corresponding to the symptom name "Server A high load" is occurring in the server device with the host name "Server A".
[0102] After the symptom has been identified in this way, the countermeasure determination unit 123 refers to the countermeasure database 112 and determines a countermeasure. In the example of the countermeasure database 112 shown in FIG. 4, the entry for the symptom name "Server A high load" registers only the countermeasure "add a server" and specifies no application condition, so the countermeasure determination unit 123 decides to execute this countermeasure.
[0103] The countermeasure determination unit 123 then has the countermeasure execution unit 124 execute "add a server", and instructs the configuration information update unit 126 to reflect in the configuration management database 114 the fact that the service is running on the added server.
[0104] As described above, in the system management method according to this embodiment, information on symptoms peculiar to an individual server device or the like, rather than to a type as a whole, can be registered in the symptom database 111, so that problems can be handled appropriately according to the specifications and usage of each individual device.
[0105] The notification event shown in FIG. 7-2 indicates that the CPU usage rate of the server device with the host name "Server B" is 88%.
[0106] To identify the symptom indicated by this information, the symptom identification unit 122 reads the entries of the symptom database 111 and examines their application targets and determination conditions. Among the entries in the example of the symptom database 111 shown in FIG. 3, those whose application target matches the notification event of FIG. 7-2 are the entry with entry number "A001", which applies to all server devices, and the entry with entry number "B002", which applies to the management target named "Server B".
[0107] Of these two entries, the determination condition of the entry "A001" concerns CPU temperature and has nothing to do with the notified information, so the entry is judged to be unrelated to the symptom of the management target.
[0108] On the other hand, the determination condition of the entry "B002" concerns CPU usage rate, and the notified information satisfies that condition. The symptom identification unit 122 therefore identifies, as this entry indicates, that the symptom corresponding to the symptom name "Server B high load" is occurring in the server device with the host name "Server B".
[0109] After the symptom has been identified in this way, the countermeasure determination unit 123 refers to the countermeasure database 112 and determines a countermeasure. In the example of the countermeasure database 112 shown in FIG. 4, two countermeasures are registered in the entry for the symptom name "Server B high load". The sum of the effectiveness and side-effect values is 5 for the countermeasure "add a server" and 6 for the countermeasure "limit transactions", so the countermeasure determination unit 123 judges that the latter has the higher priority.
[0110] The countermeasure determination unit 123 then evaluates the application condition of the countermeasure "limit transactions". Because the application condition is that the performance requirement of the service is not met, the countermeasure determination unit 123 refers to the configuration management database 114 to recognize that the service running on the server device with the host name "Server B" is "Business B Service", and further refers to the performance requirement database 113 to obtain the performance requirement of "Business B Service".
[0111] As a result, the performance requirement "service response time ≤ 1 second" is obtained. Because the service response time is not included in the notification event, the countermeasure determination unit 123 instructs the information complementing unit 125 to acquire it. If the acquired service response time does not meet the performance requirement, the countermeasure determination unit 123 determines "limit transactions" as the countermeasure and has the countermeasure execution unit 124 execute it.
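The lookup chain of paragraphs [0110] and [0111] (host, then service, then performance requirement, then measured value) can be sketched as below. The dictionary layouts, the `use` and `max_response_time_s` keys, and the `measure` callback that stands in for the information complementing unit 125 are assumptions for illustration.

```python
from typing import Callable, Dict


def requirement_unmet(config_db: Dict[str, Dict[str, str]],
                      requirement_db: Dict[str, Dict[str, float]],
                      host: str,
                      measure: Callable[[str], float]) -> bool:
    """Return True when the service on `host` violates its response-time limit."""
    # Configuration management database: which service runs on this host?
    service = config_db[host]["use"]
    # Performance requirement database: upper bound on response time (seconds).
    limit = requirement_db[service]["max_response_time_s"]
    # The response time is not part of the notification event, so it is
    # fetched on demand through the hypothetical `measure` callback.
    return measure(service) > limit
```

With a requirement of 1 second, a measured response time of 2 seconds makes the application condition of "limit transactions" hold, while a response time of exactly 1 second still satisfies the requirement.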
[0112] As described above, in the system management method according to this embodiment, rules for selecting an appropriate countermeasure can be set easily by making use of the information registered in the configuration management database 114 and the performance requirement database 113.
[0113] The notification event shown in FIG. 7-3 indicates that the service response time of the service named "Business C Service" is 2 seconds.
[0114] To identify the symptom indicated by this information, the symptom identification unit 122 reads the entries of the symptom database 111 and examines their application targets and determination conditions. Among the entries in the example of the symptom database 111 shown in FIG. 3, those whose application target matches the notification event of FIG. 7-3 are the entry with entry number "C001", which applies to all services, and the entry with entry number "D001", which applies to the management target named "Business C Service".
[0115] Of these two entries, the determination condition of the entry "C001" concerns authentication errors and has nothing to do with the notified information, so the entry is judged to be unrelated to the symptom of the management target.
[0116] On the other hand, the determination condition of the entry "D001" concerns service response time, and the notified information satisfies that condition. The symptom identification unit 122 therefore identifies that the symptom corresponding to the symptom name "Business C high load" is occurring in the service named "Business C Service".
[0117] After the symptom has been identified in this way, the countermeasure determination unit 123 refers to the countermeasure database 112 and determines a countermeasure. In the example of the countermeasure database 112 shown in FIG. 4, the entry for the symptom name "Business C high load" registers only the countermeasure "add a server" and specifies no application condition, so the countermeasure determination unit 123 decides to execute this countermeasure.
[0118] The countermeasure determination unit 123 then has the countermeasure execution unit 124 execute "add a server", and instructs the configuration information update unit 126 to reflect in the configuration management database 114 the fact that the service is running on the added server.
[0119] As described above, in the system management method according to this embodiment, information on the symptoms of a service running on server devices, rather than of an individual server, can be registered in the symptom database 111, so that problems arising in a service can be handled appropriately.
[0120] In the above description of the processing procedure, the step of verifying, when a countermeasure is determined, whether the classification and name of the application target in the countermeasure database 112 match those of the application target of the identified symptom was omitted for simplicity. However, because the same symptom name may be set for different application targets, it is of course preferable to verify that the classification and name of the application target match when a countermeasure is determined.
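The additional check recommended in paragraph [0120] amounts to matching on the application target as well as on the symptom name. A minimal sketch, assuming a flat list of rows with hypothetical `kind`, `name`, `symptom` and `countermeasure` keys and exact matching on all three fields:

```python
from typing import Dict, List


def find_countermeasures(countermeasure_db: List[Dict[str, str]],
                         symptom: Dict[str, str]) -> List[str]:
    """Return countermeasures whose symptom name AND application target match."""
    return [
        row["countermeasure"]
        for row in countermeasure_db
        if row["symptom"] == symptom["symptom"]
        and row["kind"] == symptom["kind"]   # classification, e.g. server or service
        and row["name"] == symptom["name"]   # name of the application target
    ]
```

If two different targets both carry a symptom called "high load", matching on the symptom name alone would return the countermeasures of both, whereas this lookup returns only those registered for the target on which the symptom was identified.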
[0121] The configuration of the system management apparatus 100 according to this embodiment shown in FIG. 2 can be modified in various ways without departing from the gist of the present invention. For example, functions equivalent to those of the system management apparatus 100 can be realized by implementing the functions of the control unit 120 of the system management apparatus 100 as software and executing that software on a computer. An example of a computer that executes a system management program 1071, in which the functions of the control unit 120 are implemented as software, is shown below.
[0122] FIG. 10 is a functional block diagram showing a computer 1000 that executes the system management program 1071. The computer 1000 is configured by connecting, via a bus 1080, a CPU 1010 that executes various kinds of arithmetic processing, an input device 1020 that accepts data input from a user, a monitor 1030 that displays various kinds of information, a medium reading device 1040 that reads programs and the like from a recording medium, a network interface device 1050 that exchanges data with other computers over a network, a RAM (Random Access Memory) 1060 that temporarily stores various kinds of information, and a hard disk device 1070.
[0123] The hard disk device 1070 stores the system management program 1071, which has the same functions as the control unit 120 shown in FIG. 2, and system management data 1072, which corresponds to the various databases stored in the storage unit 110 shown in FIG. 2. The system management data 1072 may also be distributed as appropriate and stored on other computers connected via the network.
[0124] When the CPU 1010 reads the system management program 1071 from the hard disk device 1070 and loads it into the RAM 1060, the system management program 1071 comes to function as a system management process 1061. The system management process 1061 loads information read from the system management data 1072 and the like, as appropriate, into the area of the RAM 1060 allocated to it, and executes various kinds of data processing based on the loaded data.
[0125] The system management program 1071 does not necessarily have to be stored in the hard disk device 1070. The computer 1000 may read and execute the program from a storage medium such as a CD-ROM. Alternatively, the program may be stored on another computer (or server) connected to the computer 1000 via a public line, the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), or the like, and the computer 1000 may read the program from there and execute it.
Industrial applicability
[0126] As described above, the system management program, system management apparatus, and system management method according to the present invention are useful when the processing of identifying a symptom occurring in a management target and determining a countermeasure for that symptom is to be performed autonomously. They are particularly suitable when information for identifying symptoms needs to be registered easily by designating individual management targets, not only by type.
Claims
[1] A system management program for identifying a symptom of a problem occurring in a management target and determining a countermeasure for resolving the symptom, the program causing a computer to execute:
an information acquisition procedure of acquiring information indicating the status of the management target;
a symptom identification procedure of identifying the symptom occurring in the management target by checking the information acquired by the information acquisition procedure against a symptom database in which, for each entry, an individual management target or a type of management target is registered as an application target together with a symptom that may occur in the application target and a determination condition for that symptom; and
a countermeasure determination procedure of determining a countermeasure for resolving the symptom occurring in the management target by checking the symptom identified by the symptom identification procedure against a countermeasure database in which symptoms that may occur in the management target and countermeasures for resolving those symptoms are registered in association with each other.
[2] The system management program according to claim 1, further causing the computer to execute an information supplement procedure of acquiring missing information from the management target when the symptom identification procedure cannot identify the symptom occurring in the management target, even by collating the information with the symptom database, because the information acquired by the information acquisition procedure is insufficient.
[3] The system management program according to claim 2, wherein, when the symptom occurring in the management target cannot be identified even by collating the information with the symptom database because the information acquired by the information acquisition procedure is insufficient, the symptom identification procedure refers to a configuration management database in which information on the configuration of the management target is registered, identifies the acquisition source of the missing information, and causes the information supplement procedure to acquire the missing information by designating that acquisition source.
[4] The system management program according to claim 2, wherein, when the countermeasure database contains a plurality of countermeasures corresponding to the symptom identified by the symptom identification procedure, the countermeasure determination procedure determines, as the countermeasure for resolving the symptom occurring in the management target, a countermeasure whose application condition, registered in association with that countermeasure, is satisfied.
[5] The system management program according to claim 4, wherein the information supplement procedure acquires missing information from the management target when the information necessary for determining whether the application condition registered in association with the countermeasure is satisfied is insufficient.
[6] The system management program according to claim 5, wherein, when the information necessary for determining whether the application condition registered in association with the countermeasure is satisfied is insufficient, the countermeasure determination procedure identifies the acquisition source of the missing information and causes the information supplement procedure to acquire the missing information by designating that acquisition source.
[7] The system management program according to claim 3, wherein, when the symptom occurring in the management target cannot be identified even by collating the information with the symptom database because the information acquired by the information acquisition procedure is insufficient, the symptom identification procedure acquires the missing information by referring to the configuration management database.
[8] A system management apparatus that identifies a symptom of a problem occurring in a management target and determines a countermeasure for resolving the symptom, the apparatus comprising:
information acquisition means for acquiring information indicating the status of the management target;
symptom identification means for identifying the symptom occurring in the management target by collating the information acquired by the information acquisition means with a symptom database in which each entry has, as its application target, either an individual management target or a type of management target, and registers a symptom that can occur in the application target together with a determination condition for that symptom; and
countermeasure determination means for determining a countermeasure for resolving the symptom occurring in the management target by collating the symptom identified by the symptom identification means with a countermeasure database in which symptoms that can occur in the management target and countermeasures for resolving those symptoms are registered in association with each other.
[9] The system management apparatus according to claim 8, further comprising information supplement means for acquiring missing information from the management target when the symptom identification means cannot identify the symptom occurring in the management target, even by collating the information with the symptom database, because the information acquired by the information acquisition means is insufficient.
[10] The system management apparatus according to claim 9, wherein, when the symptom occurring in the management target cannot be identified even by collating the information with the symptom database because the information acquired by the information acquisition means is insufficient, the symptom identification means refers to a configuration management database in which information on the configuration of the management target is registered, identifies the acquisition source of the missing information, and causes the information supplement means to acquire the missing information by designating that acquisition source.
[11] The system management apparatus according to claim 9, wherein, when the countermeasure database contains a plurality of countermeasures corresponding to the symptom identified by the symptom identification means, the countermeasure determination means determines, as the countermeasure for resolving the symptom occurring in the management target, a countermeasure whose application condition, registered in association with that countermeasure, is satisfied.
[12] The system management apparatus according to claim 11, wherein the information supplement means acquires missing information from the management target when the information necessary for determining whether the application condition registered in association with the countermeasure is satisfied is insufficient.
[13] The system management apparatus according to claim 12, wherein, when the information necessary for determining whether the application condition registered in association with the countermeasure is satisfied is insufficient, the countermeasure determination means identifies the acquisition source of the missing information and causes the information supplement means to acquire the missing information by designating that acquisition source.
[14] A system management method for identifying a symptom of a problem occurring in a management target and determining a countermeasure for resolving the symptom, the method comprising:
an information acquisition step of acquiring information indicating the status of the management target;
a symptom identification step of identifying the symptom occurring in the management target by collating the information acquired in the information acquisition step with a symptom database in which each entry has, as its application target, either an individual management target or a type of management target, and registers a symptom that can occur in the application target together with a determination condition for that symptom; and
a countermeasure determination step of determining a countermeasure for resolving the symptom occurring in the management target by collating the symptom identified in the symptom identification step with a countermeasure database in which symptoms that can occur in the management target and countermeasures for resolving those symptoms are registered in association with each other.
[15] The system management method according to claim 14, further comprising an information supplement step of acquiring missing information from the management target when the symptom identification step cannot identify the symptom occurring in the management target, even by collating the information with the symptom database, because the information acquired in the information acquisition step is insufficient.
[16] The system management method according to claim 15, wherein, when the symptom occurring in the management target cannot be identified even by collating the information with the symptom database because the information acquired in the information acquisition step is insufficient, the symptom identification step refers to a configuration management database in which information on the configuration of the management target is registered, identifies the acquisition source of the missing information, and causes the information supplement step to acquire the missing information by designating that acquisition source.
[17] The system management method according to claim 15, wherein, when the countermeasure database contains a plurality of countermeasures corresponding to the symptom identified in the symptom identification step, the countermeasure determination step determines, as the countermeasure for resolving the symptom occurring in the management target, a countermeasure whose application condition, registered in association with that countermeasure, is satisfied.
[18] The system management method according to claim 17, wherein the information supplement step acquires missing information from the management target when the information necessary for determining whether the application condition registered in association with the countermeasure is satisfied is insufficient.
[19] The system management method according to claim 18, wherein, when the information necessary for determining whether the application condition registered in association with the countermeasure is satisfied is insufficient, the countermeasure determination step identifies the acquisition source of the missing information and causes the information supplement step to acquire the missing information by designating that acquisition source.
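The supplementing behaviour recited in the dependent claims can be sketched as follows, in Python. The sketch assumes that the configuration management database records, for each management target, where each information item can be obtained, and that each countermeasure carries an application condition plus the items that condition needs. The names `CMDB`, `COUNTERMEASURES`, `supplement`, `decide`, and the `fetch` callback are hypothetical stand-ins and do not come from the application.

```python
from typing import Callable, Dict, List, Optional, Tuple

Info = Dict[str, float]

# Hypothetical configuration management database:
# management target -> {information item -> acquisition source}.
CMDB: Dict[str, Dict[str, str]] = {
    "web01": {"cpu_usage": "web01", "db_latency_ms": "db01"},
}

# Hypothetical countermeasure database:
# symptom -> list of (countermeasure, items needed by the condition, application condition).
COUNTERMEASURES: Dict[str, List[Tuple[str, List[str], Callable[[Info], bool]]]] = {
    "slow_response": [
        ("add_web_server", ["cpu_usage"], lambda i: i["cpu_usage"] > 90),
        ("tune_database", ["db_latency_ms"], lambda i: i["db_latency_ms"] > 200),
    ],
}


def supplement(target_id: str, info: Info, needed: List[str],
               fetch: Callable[[str, str], float]) -> Info:
    """Return a copy of info in which every missing needed item has been acquired."""
    completed = dict(info)
    for key in needed:
        if key not in completed:
            source = CMDB[target_id][key]        # identify the acquisition source
            completed[key] = fetch(source, key)  # acquire the missing information
    return completed


def decide(target_id: str, symptom: str, info: Info,
           fetch: Callable[[str, str], float]) -> Optional[str]:
    """Pick the first countermeasure whose application condition is satisfied."""
    for name, needed, condition in COUNTERMEASURES.get(symptom, []):
        info = supplement(target_id, info, needed, fetch)
        if condition(info):
            return name
    return None
```

In this sketch missing items are fetched lazily, only when a candidate countermeasure's condition needs them; the claims do not prescribe an order, so that choice is an assumption made to keep the example short.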
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2006/314107 WO2008007442A1 (en) | 2006-07-14 | 2006-07-14 | System management program, system management device and system management method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2006/314107 WO2008007442A1 (en) | 2006-07-14 | 2006-07-14 | System management program, system management device and system management method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008007442A1 (en) | 2008-01-17 |
Family
ID=38923014
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2006/314107 WO2008007442A1 (en) | 2006-07-14 | 2006-07-14 | System management program, system management device and system management method |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2008007442A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012190378A (en) * | 2011-03-14 | 2012-10-04 | Kddi Corp | Server system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0239336A (en) * | 1988-07-29 | 1990-02-08 | Nippon Telegr & Teleph Corp <Ntt> | Information collecting system |
JPH02159636A (en) * | 1988-12-13 | 1990-06-19 | Nec Corp | Network fault diagnosis method |
JPH08179949A (en) * | 1994-12-27 | 1996-07-12 | Nec Corp | Expert system |
JPH1049219A (en) * | 1996-08-02 | 1998-02-20 | Mitsubishi Electric Corp | Failure avoidance device |
- 2006-07-14 WO PCT/JP2006/314107 patent/WO2008007442A1/en active Application Filing
Similar Documents
Publication | Title |
---|---|
US7840517B2 (en) | Performance evaluating apparatus, method, and computer-readable medium |
US11269718B1 (en) | Root cause detection and corrective action diagnosis system |
JP4760491B2 (en) | Event processing system, event processing method, event processing apparatus, and event processing program |
US8255355B2 (en) | Adaptive method and system with automatic scanner installation |
JP4983795B2 (en) | System management program, system management apparatus, and system management method |
EP2523115B1 (en) | Operation management device, operation management method, and program storage medium |
JP4964220B2 (en) | Realization of security level in virtual machine failover |
JP5422342B2 (en) | Incident management method and operation management server |
US20090100419A1 (en) | Method for determining priority for installing a patch into multiple patch recipients of a network |
US8171060B2 (en) | Storage system and method for operating storage system |
US20090106844A1 (en) | System and method for vulnerability assessment of network based on business model |
CN101535978A (en) | Message Forwarding Backup Manager in Distributed Server System |
JP2004302937A (en) | Program arrangement method, its execution system, and its processing program |
CN110032576B (en) | Service processing method and device |
US7627662B2 (en) | Transaction request processing system and method |
US20070174708A1 (en) | Method for controlling a policy |
CN119211174B (en) | HTTP short message gateway processing method, device, equipment and storage medium |
CN109412838A (en) | Server cluster host node selection method based on hash calculating and Performance Evaluation |
US8370800B2 (en) | Determining application distribution based on application state tracking information |
US20250284553A1 (en) | Priority-Based Load Shedding for Computing Systems |
US20100287016A1 (en) | Method of monitoring a combined workflow with rejection determination function, device and recording medium therefor |
CN118606089B (en) | Smart contract group operation and maintenance information management method and system based on blockchain |
KR102188987B1 (en) | Operation method of cloud computing system for zero client device using cloud server having device for managing server and local server |
WO2008007442A1 (en) | System management program, system management device and system management method |
JP6916096B2 (en) | Instance utilization promotion system |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 06781132; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
NENP | Non-entry into the national phase | Ref country code: RU |
122 | EP: PCT application non-entry in European phase | Ref document number: 06781132; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: JP |