Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for monitoring device faults. The technical solutions are as follows:
In a first aspect, a method for monitoring device faults is provided, the method comprising:
at intervals of the monitoring sleep duration of a target key indicator, monitoring the target key indicator by means of a basic tool included in a toolset script;
if the target key indicator is abnormal, detecting by means of the toolset script whether a fault currently exists, and otherwise adjusting the monitoring sleep duration of the target key indicator based on a first preset duration;
if a fault currently exists, adjusting the monitoring sleep duration of the target key indicator based on a second preset duration, and determining and reporting fault information of the current fault by means of the toolset script;
if no fault currently exists, adjusting the monitoring sleep duration of the target key indicator based on a third preset duration, wherein the second preset duration is greater than the first preset duration, and the first preset duration is greater than the third preset duration.
Optionally, the otherwise adjusting the monitoring sleep duration of the target key indicator based on the first preset duration comprises:
otherwise, counting the consecutive normal count of the target key indicator, and adjusting the monitoring sleep duration of the target key indicator to the product of the consecutive normal count and the first preset duration.
Optionally, the adjusting, if no fault currently exists, the monitoring sleep duration of the target key indicator based on the third preset duration comprises:
if no fault currently exists, counting the consecutive fault-free count, that is, the number of consecutive monitoring rounds in which the target key indicator was abnormal but no fault was found, and adjusting the monitoring sleep duration of the target key indicator to the product of the consecutive fault-free count and the third preset duration.
Optionally, the determining and reporting fault information of the current fault by means of the toolset script comprises:
determining the fault information of the current fault by means of the toolset script, and incrementing the short-term repetition count of the current fault by one;
when the short-term repetition count equals the fault reporting threshold corresponding to the current fault, reporting the fault information of the current fault, and increasing the fault reporting threshold of the current fault according to a preset rule.
Optionally, the determining the fault information of the current fault by means of the toolset script and incrementing the short-term repetition count of the current fault by one comprises:
selecting, by means of the toolset script, the fault cause of the current fault from the preset fault causes corresponding to the target key indicator, and determining the fault features of the fault cause;
if the fault cause is recorded locally, and the similarity between the fault features of the locally recorded fault cause and the fault features determined this time is greater than a preset threshold, incrementing the short-term repetition count of the locally recorded fault cause by one; otherwise, recording the fault cause and fault features determined this time, and setting the short-term repetition count of the fault cause to one.
Optionally, the fault causes are recorded in the form of a linked list, wherein the linked list includes multiple nodes, each node corresponds to one key indicator, each key indicator corresponds to one or more sub-lists, each sub-list includes multiple list heads for recording fault causes, and each list head corresponds to multiple child nodes, the multiple child nodes being used respectively to store the fault features, the short-term repetition count, and the fault reporting threshold of the fault cause.
Optionally, the key indicators include at least one or more of CPU usage, memory usage, load value, I/O wait duration, and the CPU usage of each process.
In a second aspect, an apparatus for monitoring device faults is provided, the apparatus comprising:
a monitoring module, configured to monitor a target key indicator, at intervals of the monitoring sleep duration of the target key indicator, by means of a basic tool included in a toolset script;
an adjustment module, configured to: if the target key indicator is abnormal, detect by means of the toolset script whether a fault currently exists, and otherwise adjust the monitoring sleep duration of the target key indicator based on a first preset duration; if a fault currently exists, adjust the monitoring sleep duration of the target key indicator based on a second preset duration, and determine and report fault information of the current fault by means of the toolset script; and if no fault currently exists, adjust the monitoring sleep duration of the target key indicator based on a third preset duration;
wherein the second preset duration is greater than the first preset duration, and the first preset duration is greater than the third preset duration.
Optionally, the adjustment module is specifically configured to:
otherwise, count the consecutive normal count of the target key indicator, and adjust the monitoring sleep duration of the target key indicator to the product of the consecutive normal count and the first preset duration.
Optionally, the adjustment module is specifically configured to:
if no fault currently exists, count the consecutive fault-free count, that is, the number of consecutive monitoring rounds in which the target key indicator was abnormal but no fault was found, and adjust the monitoring sleep duration of the target key indicator to the product of the consecutive fault-free count and the third preset duration.
Optionally, the adjustment module is specifically configured to:
determine the fault information of the current fault by means of the toolset script, and increment the short-term repetition count of the current fault by one;
when the short-term repetition count equals the fault reporting threshold corresponding to the current fault, report the fault information of the current fault, and increase the fault reporting threshold of the current fault according to a preset rule.
Optionally, the adjustment module is specifically configured to:
select, by means of the toolset script, the fault cause of the current fault from the preset fault causes corresponding to the target key indicator, and determine the fault features of the fault cause;
if the fault cause is recorded locally, and the similarity between the fault features of the locally recorded fault cause and the fault features determined this time is greater than a preset threshold, increment the short-term repetition count of the locally recorded fault cause by one; otherwise, record the fault cause and fault features determined this time, and set the short-term repetition count of the fault cause to one.
Optionally, the fault causes are recorded in the form of a linked list, wherein the linked list includes multiple nodes, each node corresponds to one key indicator, each key indicator corresponds to one or more sub-lists, each sub-list includes multiple list heads for recording fault causes, and each list head corresponds to multiple child nodes, the multiple child nodes being used respectively to store the fault features, the short-term repetition count, and the fault reporting threshold of the fault cause.
Optionally, the key indicators include at least one or more of CPU usage, memory usage, load value, I/O wait duration, and the CPU usage of each process.
In a third aspect, a device is provided. The device includes a processor and a memory, and the memory stores at least one instruction, at least one program segment, a code set, or an instruction set, which is loaded and executed by the processor to implement the method for monitoring device faults according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program segment, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for monitoring device faults according to the first aspect.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
In the embodiments of the present invention, at intervals of the monitoring sleep duration of a target key indicator, the target key indicator is monitored by means of a basic tool included in the toolset script. If the target key indicator is abnormal, the toolset script detects whether a fault currently exists; otherwise, the monitoring sleep duration of the target key indicator is adjusted based on the first preset duration. If a fault currently exists, the monitoring sleep duration is adjusted based on the second preset duration, and the fault information of the current fault is determined and reported by means of the toolset script. If no fault currently exists, the monitoring sleep duration is adjusted based on the third preset duration. The second preset duration is greater than the first preset duration, and the first preset duration is greater than the third preset duration. In this way, when the toolset script is used, different key indicators are given different monitoring sleep durations, the monitoring of the individual key indicators does not interfere with one another, and monitoring sleep durations of different lengths are set and adjusted according to the different monitoring results. This avoids both overly frequent monitoring of the key indicators and frequent repeated reporting of the same fault, while still allowing device faults to be discovered in a timely manner.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
An embodiment of the present invention provides a method for monitoring device faults. The method may be executed by any device capable of running programs, such as a server or a terminal. The device can load and run the toolset script mentioned in the background section, and through the toolset script it can use different monitoring tools to monitor the device's operating state from different angles, so that hardware or software faults occurring during operation are discovered in time. The device may include a processor, a memory, and a transceiver. The processor may perform the processing for monitoring device faults in the procedure below. The memory may store the data required and generated during processing, for example the toolset script and recorded device operating parameters. The transceiver may send and receive data involved in the processing, for example receiving instructions entered by a user and reporting fault information of device faults. The device may support multiple processes running concurrently; a running process occupies CPU processing resources to varying degrees, uses a certain amount of memory, and generates disk I/O.
The processing flow shown in FIG. 1 is described in detail below with reference to specific embodiments. The content may be as follows:
Step 101: at intervals of the monitoring sleep duration of the target key indicator, monitor the target key indicator by means of a basic tool included in the toolset script.
In implementation, after a technician installs the toolset script on the device, the device can load and run it, and can then monitor multiple key indicators with the basic tools included in the toolset script. Here, the key indicators may be preset. Multiple key indicators make it possible to find out simply and promptly whether a fault has occurred on the device, and each key indicator can be monitored in real time by a small number of basic tools that reveal whether the indicator is abnormal. Because only a few basic tools are executed to monitor the key indicators, few device processing resources are consumed and the impact on device performance is small. A monitoring sleep duration can be set separately for each key indicator; that is, once every monitoring sleep duration, the device monitors the corresponding key indicator once with the basic tool included in the toolset script. Furthermore, different key indicators may have different monitoring sleep durations and, accordingly, different monitoring times. Taking the target key indicator as an example, after starting the toolset script the device monitors the target key indicator with the basic tool included in the toolset script at intervals of that indicator's monitoring sleep duration.
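The independent per-indicator schedule described above can be sketched as follows. This is only an illustration under assumptions: the function name due_order, the indicator names, and the durations are hypothetical, and a simulated clock is used instead of real sleeping so that the firing order is easy to inspect.

```python
import heapq


def due_order(sleep_durations, horizon):
    """Return (time, indicator) pairs in the order checks fire up to `horizon`.

    `sleep_durations` maps an indicator name to its own monitoring sleep
    duration in seconds (hypothetical values). Each indicator is rescheduled
    independently of the others.
    """
    # Each heap entry is (next due time, indicator name).
    heap = [(duration, name) for name, duration in sleep_durations.items()]
    heapq.heapify(heap)
    fired = []
    while heap and heap[0][0] <= horizon:
        due, name = heapq.heappop(heap)
        fired.append((due, name))
        # Reschedule only this indicator, using its own sleep duration.
        heapq.heappush(heap, (due + sleep_durations[name], name))
    return fired
```

In a real script the sleep duration pushed back onto the heap would be the adjusted value produced by steps 102 to 104 rather than a fixed one.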
Optionally, the key indicators include at least one or more of CPU usage, memory usage, load value, I/O wait duration, and the CPU usage of each process. It is understood that in other embodiments the key indicators are not limited to those listed above.
In implementation, five indicators can be chosen as the key indicators: CPU usage, memory usage, load value, I/O wait duration, and the CPU usage of each process. Specifically, CPU usage can be checked with the "mpstat" tool; memory usage can be checked by reading the "used" and "free" fields of "free -m"; the load value can be checked by reading the 1-minute load field of the "/proc/loadavg" file; I/O wait duration can be checked with the "mpstat" tool; and the CPU usage of each process can be checked with the "top" tool.
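As one illustration of such a basic check, the sketch below reads the 1-minute load from "/proc/loadavg" and applies a simple threshold judgment. The function names and the threshold are assumptions made for this example; the file layout (the first whitespace-separated field is the 1-minute load average) is that of Linux.

```python
def one_minute_load(loadavg_text: str) -> float:
    """Return the 1-minute load average from the contents of /proc/loadavg."""
    # The first whitespace-separated field is the 1-minute load average.
    return float(loadavg_text.split()[0])


def read_one_minute_load(path: str = "/proc/loadavg") -> float:
    """Read the file (Linux only) and return the 1-minute load average."""
    with open(path, encoding="ascii") as f:
        return one_minute_load(f.read())


def is_abnormal(value: float, threshold: float) -> bool:
    """Threshold judgment: abnormal when the value exceeds the threshold.

    The threshold is a hypothetical, operator-chosen value.
    """
    return value > threshold
```

A check round could then call is_abnormal(read_one_minute_load(), threshold) with a threshold chosen from experience, for example a multiple of the number of logical CPUs.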
Step 102: if the target key indicator is abnormal, detect by means of the toolset script whether a fault currently exists; otherwise, adjust the monitoring sleep duration of the target key indicator based on the first preset duration.
In implementation, when the device monitors the target key indicator with the basic tool in the toolset script, it can judge whether the monitored target key indicator is abnormal by threshold comparison, checking it against empirical data used in routine analysis, and thereby decide whether the subsequent processing needs to be triggered; the specific processing is shown in FIG. 2. If the target key indicator is found to be abnormal, the device can further detect, by means of the toolset script, whether a fault currently exists. Specifically, the toolset script is preset with data collection tools for each key indicator, to be used when that indicator is abnormal. The device first uses these data collection tools to collect the device operating parameters related to the abnormality of the target key indicator, and then uses those operating parameters to confirm whether a fault currently exists. If the target key indicator is not abnormal, the monitoring sleep duration of the target key indicator is adjusted based on the first preset duration.
Optionally, if a key indicator is found to be normal in consecutive monitoring rounds, its monitoring sleep duration can be extended appropriately. Accordingly, part of the processing of step 102 may be as follows: otherwise, count the consecutive normal count of the target key indicator, and adjust the monitoring sleep duration of the target key indicator to the product of the consecutive normal count and the first preset duration.
In implementation, after monitoring the target key indicator, if the device finds that the target key indicator is not abnormal, it counts the consecutive normal count of the target key indicator and then sets the monitoring sleep duration of the target key indicator to the product of the consecutive normal count and the first preset duration. For example, suppose the first preset duration is 1 min. If the target key indicator was abnormal in the previous monitoring round and is normal in this one, the consecutive normal count is 1 and the monitoring sleep duration is set to 1 * 1 min. If the target key indicator was normal in each of the previous N rounds and is also normal in this one, the consecutive normal count is N+1 and the monitoring sleep duration is set to (N+1) * 1 min. In addition, a maximum value can be set for the monitoring sleep duration of the target key indicator, so that whatever the consecutive normal count is, the monitoring sleep duration never exceeds the maximum. This avoids the situation in which the consecutive normal count is large (the target key indicator has been normal for a long time), the monitoring sleep duration becomes too long, and an abnormality of the target key indicator cannot be detected in time.
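The product rule with an upper bound can be written in a few lines. The sketch below uses the 1 min (60 s) first preset duration from the example above; the function name and the 600 s maximum are assumptions, since the text does not give a concrete maximum.

```python
def normal_sleep(consecutive_normal: int, first_preset: float, max_sleep: float) -> float:
    """Sleep duration after a normal check: count * first preset, capped at max_sleep."""
    return min(consecutive_normal * first_preset, max_sleep)
```

With a first preset duration of 60 s and an assumed maximum of 600 s, the sleep duration grows by 60 s per consecutive normal round and stops growing from the tenth round onward.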
Step 103: if a fault currently exists, adjust the monitoring sleep duration of the target key indicator based on the second preset duration, and determine and report the fault information of the current fault by means of the toolset script.
In implementation, if the device confirms from the device operating parameters in step 102 that a fault currently exists, it can adjust the monitoring sleep duration of the target key indicator based on the second preset duration. At the same time, the device can determine and report the fault information of its current fault by means of the toolset script. Here, a technician can anticipate the various faults that may occur on the device and record the parameter features of the device operating parameters when each fault occurs. The parameter features and the corresponding fault information can then be written into the source code of the toolset script. In this way, after collecting the device operating parameters, the device can use this content to determine the fault information corresponding to the collected operating parameters.
Optionally, if the same fault is detected repeatedly within a short period, the corresponding fault information can be reported intermittently. Accordingly, part of the processing of step 103 may be as follows: determine the fault information of the current fault by means of the toolset script, and increment the short-term repetition count of the current fault by one; when the short-term repetition count equals the fault reporting threshold corresponding to the current fault, report the fault information of the current fault, and increase the fault reporting threshold of the current fault according to a preset rule.
In implementation, the device can determine the fault information of the current fault by means of the toolset script and increment the short-term repetition count of the current fault by one. The short-term repetition count reflects how many times the device has repeatedly detected this fault within a short period. The device then compares the incremented short-term repetition count with the fault reporting threshold corresponding to the current fault. If the two are equal, the device reports the fault information of the current fault and increases the fault reporting threshold of the current fault according to the preset rule; otherwise, it does not report the fault information. For example, suppose the rule for increasing the fault reporting threshold (the preset rule) is exponential growth with base 3. The fault reporting threshold then takes the values 1, 3, 9, 27, ... in turn. Combined with the short-term repetition count, this means: the fault is reported the first time its fault information is determined, not reported the second time, reported the third time, not reported from the fourth time through the eighth, reported again the ninth time, and so on.
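The base-3 example can be traced with a small counter object. This is a minimal sketch, assuming the threshold starts at 1 and is multiplied by 3 after each report as in the example; the class and method names are hypothetical.

```python
class ReportThrottle:
    """Tracks one fault's short-term repetition count and reporting threshold."""

    def __init__(self, factor: int = 3):
        self.factor = factor      # growth factor of the preset rule (3 in the example)
        self.repeat_count = 0     # short-term repetition count
        self.threshold = 1        # fault reporting threshold

    def on_fault_detected(self) -> bool:
        """Register one detection; return True if it should be reported."""
        self.repeat_count += 1
        if self.repeat_count == self.threshold:
            # Report now and raise the threshold for the next report.
            self.threshold *= self.factor
            return True
        return False
```

Calling on_fault_detected ten times in a row yields a report on the first, third, and ninth calls, which matches the sequence described above.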
Optionally, the above processing of determining the fault information and updating the short-term repetition count may specifically be as follows: select, by means of the toolset script, the fault cause of the current fault from the preset fault causes corresponding to the target key indicator, and determine the fault features of the fault cause; if the fault cause is recorded locally, and the similarity between the fault features of the locally recorded fault cause and the fault features determined this time is greater than a preset threshold, increment the short-term repetition count of the locally recorded fault cause by one; otherwise, record the fault cause and fault features determined this time, and set the short-term repetition count of the fault cause to one.
In implementation, while determining the fault information of the current fault, the device can, by means of the toolset script, select the fault cause of the current fault from the preset fault causes corresponding to the target key indicator and determine the fault features of that fault cause. Taking the key indicators CPU usage, load value, per-process CPU usage, and I/O wait duration as an example, the specific preset fault causes and the ways of determining the fault features are given in Table 1 below. The device can then check whether the same fault cause has already been recorded locally. If it has, the device determines the similarity between the fault features of the locally recorded fault cause and the fault features determined this time. For example, if there are 4 fault features and only one of them matches, the similarity is 1/4. If the similarity is greater than the preset threshold, the device increments the short-term repetition count of the locally recorded fault cause by one. If the fault cause is not recorded locally, or the similarity is not greater than the preset threshold, the device records the fault cause and fault features determined this time and sets the short-term repetition count of the fault cause to one. Note that a locally recorded fault cause has a limited storage duration; once the storage duration has elapsed, the device automatically deletes the corresponding fault cause and fault features.
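The record-or-increment decision can be sketched as below. Several points are assumptions made for illustration: fault features are modelled as equal-length tuples compared position by position (which reproduces the 1/4 example), the local record is a plain dictionary keyed by fault cause, a dissimilar detection overwrites the stored features, and the cause name and feature values in the usage note are hypothetical. Expiry after the storage duration is left out.

```python
def similarity(recorded, current) -> float:
    """Fraction of feature positions whose values match."""
    if not recorded or len(recorded) != len(current):
        return 0.0
    matches = sum(1 for a, b in zip(recorded, current) if a == b)
    return matches / len(recorded)


def update_record(records: dict, cause: str, features, threshold: float) -> int:
    """Update `records` in place and return the short-term repetition count."""
    entry = records.get(cause)
    if entry is not None and similarity(entry["features"], features) > threshold:
        # Same cause with sufficiently similar features: count a repetition.
        entry["count"] += 1
    else:
        # Cause not recorded, or features too different: record afresh.
        entry = {"features": tuple(features), "count": 1}
        records[cause] = entry
    return entry["count"]
```

For instance, with a preset threshold of 0.5, two detections of a cause "cause_x" with identical features give a count of 2, while a third detection sharing only one of four features (similarity 1/4) resets the record to a count of 1.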
Table 1
Optionally, the fault causes are recorded in the form of a linked list, wherein the linked list includes multiple nodes, each node corresponds to one key indicator, each key indicator corresponds to one or more sub-lists, each sub-list includes multiple list heads for recording fault causes, and each list head corresponds to multiple child nodes, the multiple child nodes being used respectively to store the fault features, the short-term repetition count, and the fault reporting threshold of the fault cause.
In implementation, among the data structures used in programming, a linked list is convenient to traverse, is easy to extend (sub-lists can be nested in a linked list without limit), and is highly compatible across data types, since it can store data of any type. The fault causes can therefore be recorded in the form of a linked list. Again taking the key indicators CPU usage, load value, per-process CPU usage, and I/O wait duration as an example, the linked list is shown in FIG. 3. The trunk of the linked list consists of four nodes: CPU, LOAD, PROCESS, and IO. Each key indicator corresponds to one or more sub-lists. The sub-list of the CPU branch includes as many list heads as there are preset fault causes for abnormal CPU usage. The LOAD branch can be divided into three sub-lists: disk (SDA, SDB, ...), process (PROCESS_A, PROCESS_B, ...), and CPU (CPU0, CPU1, ...). The LOAD-disk sub-list includes as many list heads as the device has disks, the LOAD-process sub-list includes N list heads, and the LOAD-CPU sub-list includes as many list heads as the device has logical CPUs. The sub-list of the PROCESS branch includes N list heads. The sub-list of the IO branch includes as many list heads as the device has disks. Each list head can correspond to multiple child nodes that respectively store the fault features, the short-term repetition count, and the fault reporting threshold of the fault cause.
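One possible in-memory shape for this structure is sketched below. It is a simplification under stated assumptions: the trunk is a singly linked list of indicator nodes, each sub-list is held as a Python list of list heads, and the three child nodes of a list head are modelled as three fields. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CauseHead:
    """List head for one fault cause, holding the three stored values."""
    label: str                         # e.g. "SDA" or "CPU0"
    features: Tuple[str, ...] = ()     # fault features
    repeat_count: int = 0              # short-term repetition count
    report_threshold: int = 1          # fault reporting threshold


@dataclass
class IndicatorNode:
    """Trunk node for one key indicator."""
    name: str
    sub_lists: Dict[str, List[CauseHead]] = field(default_factory=dict)
    next: Optional["IndicatorNode"] = None


def build_trunk(names) -> Optional[IndicatorNode]:
    """Link one trunk node per indicator name, preserving order."""
    head = None
    for name in reversed(names):
        head = IndicatorNode(name, next=head)
    return head


def trunk_names(head: Optional[IndicatorNode]) -> List[str]:
    """Traverse the trunk and collect the indicator names."""
    names = []
    node = head
    while node is not None:
        names.append(node.name)
        node = node.next
    return names
```

Building the trunk from ["CPU", "LOAD", "PROCESS", "IO"] and then attaching, say, a "disk" sub-list with heads SDA and SDB to the LOAD node mirrors the layout described for FIG. 3.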
Step 104: if no fault currently exists, adjust the monitoring sleep duration of the target key indicator based on the third preset duration.
The second preset duration is greater than the first preset duration, and the first preset duration is greater than the third preset duration.
In implementation, if the device confirms from the device operating parameters in step 102 that no fault currently exists, it can adjust the monitoring sleep duration of the target key indicator based on the third preset duration. Note that the second preset duration is greater than the first preset duration, and the first preset duration is greater than the third preset duration. The reasoning is as follows. First, a device fault generally persists for some time before it is repaired, and the corresponding key indicator stays abnormal throughout. So after the target key indicator has been detected as abnormal and the fault information of the current fault has been determined successfully, the target key indicator can be monitored again only after a longer interval, to avoid repeatedly detecting the same fault; the monitoring sleep duration is therefore adjusted based on the longer second preset duration. Second, the probability that a running device fails is relatively small and the device is in a normal state most of the time, so the key indicators need not be monitored frequently; at the same time, so that a fault can be detected promptly once it occurs, the monitoring interval should not be too long either. So if the target key indicator is found to be normal, the moderately long first preset duration is used to adjust the monitoring sleep duration. Third, monitoring the key indicators mainly serves as early fault warning. When the target key indicator is found to be abnormal, the device has very likely already failed. If further detection finds no fault, the fault may well be at an early stage in which some device operating parameters are not yet affected, or the cause may be something else, such as a temporary fluctuation of the target key indicator. In such cases the device state cannot be determined, and the target key indicator needs to be monitored again after a short time; that is, the shorter third preset duration is used to adjust the monitoring sleep duration of the target key indicator.
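The three-way choice and the required ordering of the preset durations can be condensed into one function. This is a sketch only: the function name is hypothetical, and the 300 s second preset duration used in the usage note is an assumed value, since the text gives example values only for the first (1 min) and third (10 s) preset durations.

```python
def base_sleep(abnormal: bool, fault_found: bool,
               first: float, second: float, third: float) -> float:
    """Pick the preset duration that applies to one monitoring result."""
    if not (second > first > third):
        raise ValueError("expected second > first > third")
    if not abnormal:
        return first    # indicator normal: moderate interval
    if fault_found:
        return second   # fault confirmed: long interval avoids re-detecting it
    return third        # abnormal but no fault found: recheck soon
```

With first = 60, second = 300 (assumed), and third = 10, a normal result maps to 60 s, a confirmed fault to 300 s, and an abnormal result without a fault to 10 s.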
Optionally, if a key indicator is found to be abnormal in consecutive monitoring rounds and no fault is found in the further detection each time, its monitoring sleep duration can be extended appropriately. Accordingly, the processing of step 104 may be as follows: if no fault currently exists, count the consecutive fault-free count, that is, the number of consecutive monitoring rounds in which the target key indicator was abnormal but no fault was found, and adjust the monitoring sleep duration of the target key indicator to the product of the consecutive fault-free count and the third preset duration.
In implementation, if the target key indicator is found to be abnormal but no fault is found in the further detection, the device counts the consecutive fault-free count, that is, the number of consecutive monitoring rounds in which the target key indicator was abnormal but no fault was found, and then sets the monitoring sleep duration of the target key indicator to the product of the consecutive fault-free count and the third preset duration. For example, suppose the third preset duration is 10 s. If in the previous monitoring round the target key indicator was normal, or it was abnormal and a device fault was confirmed in the further detection, and in this round the target key indicator is abnormal but no fault is found, the consecutive fault-free count is 1 and the monitoring sleep duration is set to 1 * 10 s. If in each of the previous N rounds the target key indicator was abnormal and no fault was found in the further detection, and in this round it is again abnormal with no fault found, the consecutive fault-free count is N+1 and the monitoring sleep duration is set to (N+1) * 10 s. In addition, a maximum value can be set for the monitoring sleep duration of the target key indicator, so that whatever the consecutive fault-free count is, the monitoring sleep duration never exceeds the maximum.
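The way the two streak counters reset each other can be made concrete with a small state object. This sketch assumes three outcome labels ("normal", "no_fault" for abnormal without a fault, and "fault"), applies the maximum only to the two products, and uses a 300 s second preset duration and a 600 s maximum that are not taken from the text; all names are hypothetical.

```python
class SleepTracker:
    """Per-indicator state that turns each monitoring outcome into a sleep duration."""

    def __init__(self, first: float, second: float, third: float, max_sleep: float):
        self.first, self.second, self.third = first, second, third
        self.max_sleep = max_sleep
        self.normal_streak = 0       # consecutive normal count
        self.fault_free_streak = 0   # consecutive fault-free count

    def update(self, outcome: str) -> float:
        """Record one outcome and return the next monitoring sleep duration."""
        if outcome == "normal":
            self.normal_streak += 1
            self.fault_free_streak = 0
            return min(self.normal_streak * self.first, self.max_sleep)
        if outcome == "no_fault":
            self.fault_free_streak += 1
            self.normal_streak = 0
            return min(self.fault_free_streak * self.third, self.max_sleep)
        if outcome == "fault":
            # A confirmed fault breaks both streaks.
            self.normal_streak = 0
            self.fault_free_streak = 0
            return self.second
        raise ValueError(f"unknown outcome: {outcome!r}")
```

Feeding the tracker two abnormal rounds without a fault, then a confirmed fault, then another abnormal round without a fault gives 10 s, 20 s, the second preset duration, and 10 s again, in line with the counting example above.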
In the embodiments of the present invention, at intervals of the monitoring sleep duration of a target key indicator, the target key indicator is monitored by means of a basic tool included in the toolset script. If the target key indicator is abnormal, the toolset script detects whether a fault currently exists; otherwise, the monitoring sleep duration of the target key indicator is adjusted based on the first preset duration. If a fault currently exists, the monitoring sleep duration is adjusted based on the second preset duration, and the fault information of the current fault is determined and reported by means of the toolset script. If no fault currently exists, the monitoring sleep duration is adjusted based on the third preset duration. The second preset duration is greater than the first preset duration, and the first preset duration is greater than the third preset duration. In this way, when the toolset script is used, different key indicators are given different monitoring sleep durations, the monitoring of the individual key indicators does not interfere with one another, and monitoring sleep durations of different lengths are set and adjusted according to the different monitoring results. This avoids both overly frequent monitoring of the key indicators and frequent repeated reporting of the same fault, while still allowing device faults to be discovered in a timely manner.
Based on the same technical concept, an embodiment of the present invention further provides an apparatus for monitoring device faults. As shown in FIG. 4, the apparatus includes:
a monitoring module 401, configured to monitor, at intervals of the monitoring sleep time of a target critical index, the target critical index by means of the main tool included in the tool set script;
an adjustment module 402, configured to: if the target critical index is abnormal, detect, by means of the tool set script, whether a fault currently exists, and otherwise adjust the monitoring sleep time of the target critical index based on the first preset duration; if a fault currently exists, adjust the monitoring sleep time of the target critical index based on the second preset duration, and determine and report, by means of the tool set script, the fault information of the current fault; and if no fault currently exists, adjust the monitoring sleep time of the target critical index based on the third preset duration;
wherein the second preset duration is greater than the first preset duration, and the first preset duration is greater than the third preset duration.
Optionally, the adjustment module 402 is specifically configured to:
if the target critical index is not abnormal, count the continuous normal count of the target critical index, and adjust the monitoring sleep time of the target critical index to the product of the continuous normal count and the first preset duration.
Optionally, the adjustment module 402 is specifically configured to:
if no fault currently exists, count the continuous fault-free count accumulated while the target critical index has been continuously found abnormal, and adjust the monitoring sleep time of the target critical index to the product of the continuous fault-free count and the third preset duration.
Optionally, the adjustment module 402 is specifically configured to:
determine, by means of the tool set script, the fault information of the current fault, and increment the short-term repetition count of the current fault by one;
when the short-term repetition count equals the fault reporting threshold corresponding to the current fault, report the fault information of the current fault, and increase the fault reporting threshold of the current fault according to a preset rule.
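The threshold-based reporting above can be sketched as follows. The text does not specify the preset rule for raising the threshold; doubling it is assumed here purely for illustration, and the class name, method name, and initial threshold of 1 are likewise hypothetical.

```python
class FaultReporter:
    """Suppresses repeated reports of the same fault (illustrative sketch)."""

    def __init__(self, initial_threshold=1, growth=2):
        self.initial_threshold = initial_threshold
        self.growth = growth  # assumed preset rule: multiply the threshold
        self._records = {}    # fault id -> {"count": int, "threshold": int}

    def observe(self, fault_id):
        """Record one occurrence; return True if it should be reported."""
        rec = self._records.setdefault(
            fault_id, {"count": 0, "threshold": self.initial_threshold}
        )
        rec["count"] += 1  # short-term repetition count plus one
        if rec["count"] == rec["threshold"]:
            # Report now, then raise the threshold so the next report
            # happens only after more repetitions.
            rec["threshold"] *= self.growth
            return True
        return False


reporter = FaultReporter()
decisions = [reporter.observe("cpu:runaway_process") for _ in range(8)]
```

Under these assumptions the same fault is reported on its 1st, 2nd, 4th, and 8th occurrence, so reports become progressively sparser while the fault persists.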
Optionally, the adjustment module 402 is specifically configured to:
select, by means of the tool set script, the fault cause of the current fault from the preset fault causes corresponding to the target critical index, and determine the fault feature of the fault cause;
if the fault cause has been recorded locally, and the similarity between the fault feature of the locally recorded fault cause and the fault feature determined this time is greater than a preset threshold, increment the short-term repetition count of the locally recorded fault cause by one; otherwise, record the fault cause and fault feature determined this time, and set the short-term repetition count of the fault cause to one.
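The matching step above can be sketched as follows. The text does not say how similarity between fault features is computed; this sketch assumes a fault feature is a set of string tokens and uses Jaccard similarity as a stand-in. The function names, the dictionary layout, the 0.8 threshold, and the choice to keep several feature entries per cause are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def record_fault(records, cause, feature, threshold=0.8):
    """Update the local records for one detected fault.

    records maps a fault cause to a list of entries, each holding a
    feature set and a short-term repetition count.
    """
    for entry in records.get(cause, []):
        if jaccard(entry["feature"], feature) > threshold:
            entry["count"] += 1  # same fault seen again
            return entry
    # Cause not recorded yet, or no sufficiently similar feature: new entry.
    entry = {"feature": set(feature), "count": 1}
    records.setdefault(cause, []).append(entry)
    return entry


records = {}
first = record_fault(records, "high_cpu", {"proc=java", "pid=4242"})
again = record_fault(records, "high_cpu", {"proc=java", "pid=4242"})
other = record_fault(records, "high_cpu", {"proc=nginx", "pid=77"})
```

Here the second call matches the first entry and raises its count to two, while the third call has no overlap with it and is recorded as a separate entry with a count of one.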
Optionally, the fault causes are recorded in the form of a linked list, wherein the linked list includes multiple nodes, each node corresponds to one critical index, each critical index corresponds to one or more sub-linked lists, each sub-linked list includes multiple list heads for recording fault causes, each list head corresponds to multiple child nodes, and the child nodes are respectively used to store the fault feature, the short-term repetition count, and the fault reporting threshold of the fault cause.
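One possible in-memory layout for the linked-list record described above is sketched below. It is a simplification made for illustration: the three child nodes of each list head are folded into fields of a single object, and all class, field, and index names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CauseHead:
    """List head for one fault cause; child-node data kept as fields."""
    cause: str
    feature: Optional[str] = None  # fault feature
    repeat_count: int = 0          # short-term repetition count
    report_threshold: int = 1      # fault reporting threshold
    next: Optional["CauseHead"] = None


@dataclass
class IndexNode:
    """Node of the top-level list; one node per critical index."""
    index_name: str
    sub_lists: List[Optional[CauseHead]] = field(default_factory=list)
    next: Optional["IndexNode"] = None


def find_index(head, name):
    """Walk the top-level list and return the node for the given index."""
    node = head
    while node is not None:
        if node.index_name == name:
            return node
        node = node.next
    return None


def find_cause(index_node, cause):
    """Search every sub-linked list of one index for a fault cause."""
    for head in index_node.sub_lists:
        cur = head
        while cur is not None:
            if cur.cause == cause:
                return cur
            cur = cur.next
    return None


mem = IndexNode("memory_usage")
cpu = IndexNode(
    "cpu_usage",
    sub_lists=[CauseHead("runaway_process", feature="proc=java", repeat_count=1,
                         next=CauseHead("too_many_processes"))],
    next=mem,
)
```

Looking up a fault then takes two walks: `find_index(cpu, "cpu_usage")` to reach the index node, followed by `find_cause(...)` within its sub-linked lists.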
Optionally, the critical indexes include at least one or more of: CPU usage, memory usage, load value, I/O wait time, and the CPU usage of each process.
In the embodiments of the present invention, at intervals of the monitoring sleep time of a target critical index, the target critical index is monitored by the main tool included in the tool set script. If the target critical index is abnormal, the tool set script detects whether a fault currently exists; otherwise, the monitoring sleep time of the target critical index is adjusted based on the first preset duration. If a fault currently exists, the monitoring sleep time of the target critical index is adjusted based on the second preset duration, and the tool set script determines and reports the fault information of the current fault. If no fault currently exists, the monitoring sleep time of the target critical index is adjusted based on the third preset duration, wherein the second preset duration is greater than the first preset duration, and the first preset duration is greater than the third preset duration. In this way, when the tool set script is used, different critical indexes are given different monitoring sleep times, the monitoring of multiple critical indexes does not interfere with one another, and monitoring sleep times of different lengths are set and adjusted according to the different monitoring results. This avoids both overly frequent monitoring of the critical indexes and frequent repeated reporting of the same fault, while still allowing equipment faults to be discovered in a relatively timely manner.
It should be noted that, when the apparatus for monitoring equipment faults provided by the above embodiment monitors equipment faults, the division into the above functional modules is used only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. In addition, the apparatus for monitoring equipment faults provided by the above embodiment and the method embodiments for monitoring equipment faults belong to the same concept; for the specific implementation process, see the method embodiments, which is not repeated here.
Fig. 5 is a schematic structural diagram of a device provided by an embodiment of the present invention. The device 500 may vary considerably depending on configuration or performance, and may include one or more central processing units 522 (for example, one or more processors), a memory 532, and one or more storage media 530 (for example, one or more mass storage devices) storing application programs 552 or data 554. The memory 532 and the storage medium 530 may provide transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and to execute, on the device 500, the series of instruction operations in the storage medium 530.
The device 500 may also include one or more power supplies 525, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, one or more keyboards 555, and/or one or more operating systems 551, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like.
The device 500 may include a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the above monitoring of equipment faults.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.