
CN108446184B - Method and system for analyzing fault root cause - Google Patents


Info

Publication number
CN108446184B
Authority
CN
China
Prior art keywords
data
frequent
performance data
item set
alarm
Legal status
Active
Application number
CN201810155161.6A
Other languages
Chinese (zh)
Other versions
CN108446184A
Inventor
张银霞
付铁山
Current Assignee
Beijing Tianyuan Innovation Technology Co ltd
Original Assignee
Beijing Tianyuan Innovation Technology Co ltd
Application filed by Beijing Tianyuan Innovation Technology Co ltd
Priority to CN201810155161.6A
Publication of CN108446184A
Application granted
Publication of CN108446184B


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Fuzzy Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a system for analyzing the root cause of a fault. The method comprises: sorting the data in a data set by the time attribute of each datum, and segmenting the data set by a preset time window to obtain a plurality of sub-data sets; obtaining frequent item sets and association rules from the data set with the Apriori algorithm, where a frequent item set comprises a number of strongly associated data; and sorting the data in each frequent item set by time attribute, then matching the earliest datum, in order, against the accompanying alarm cause data prestored in an alarm cause database. If the match succeeds, the earliest datum is removed and the next datum is compared; this continues until a match fails, and the earliest datum that fails to match is taken as the root cause of the last datum, in time order, in the frequent item set. The invention is suitable for full-dimensional monitoring of IT systems, relieves the workload of operation and maintenance personnel, and lowers the skill level required of them.

Description

Method and system for analyzing fault root cause
Technical Field
The present invention relates to the field of data mining technologies, and in particular, to a method and a system for analyzing a root cause of a fault.
Background
With the trend toward informatization and digitization, IT systems and IT architectures are becoming increasingly complex. Once a service fails, a large or even massive amount of alarm/event information may be reported, so that maintenance personnel cannot accurately locate the fault amid so much alarm/event data. Fault location therefore places high demands on maintenance personnel and requires the involvement of staff with extensive operation, maintenance, and development experience.
Root cause analysis based on association rule mining is an important method for diagnosing and locating faults in IT systems. Most existing fault analysis relies on threshold judgment: a threshold or a threshold range is set on each KPI, and an alarm is reported when the threshold is exceeded. This approach is poorly suited to analyzing performance problems in IT systems. When user responses slow down because of the network or other causes, a large number of response-time alarms are reported; yet most of these faults do not prevent users from using the IT system, so users regard the system as normal and neither notice nor complain. Once users do complain, however, there are so many performance alarms that analysis becomes difficult.
Disclosure of Invention
The present invention provides a method and system for analyzing root causes of faults that overcomes or at least partially addresses the above-mentioned problems.
According to an aspect of the present invention, there is provided a method of analyzing a root cause of a fault, comprising:
S1. sorting the data in a data set by the time attribute of each datum, and segmenting the data set by a preset time window to obtain a plurality of sub-data sets;
S2. obtaining frequent item sets and association rules from the data set with the Apriori algorithm, where a frequent item set comprises a number of strongly associated data;
S3. sorting the data in the frequent item set by time attribute, and matching the earliest datum, in order, against the accompanying alarm cause data prestored in an alarm cause database; if the match succeeds, removing that datum and matching the next one; finally taking the earliest datum that fails to match as the root cause of the last datum, in time order, in the frequent item set;
wherein the data set comprises the alarm data of each domain in the IT system within a preset time range, the error data in logs, and the abnormal performance data in the performance data set.
Preferably, the step S1 is preceded by:
acquiring, through an APM probe, the alarm data and performance data of each domain in the IT system within the preset time range, and acquiring the error data from the log data of the IT system through log collection;
screening abnormal performance data from the performance data using the mean and a multiple of the variance;
and forming the data set from the alarm data of each domain in the IT system, the error data in the logs, and the abnormal performance data within the preset time range.
Preferably, screening abnormal performance data from the performance data using the mean and a multiple of the variance specifically includes:
sorting the performance data in the performance data set according to a preset rule, and taking the median of the performance data as the mean value;
and calculating the variance of the performance data, filtering out the performance data whose values lie within three times the variance of the mean value, and taking the remaining performance data as the abnormal performance data.
Preferably, the step S2 specifically includes:
counting the support of every datum in the data set and sorting from high to low to obtain a candidate 1-item set, then removing the data whose support is below the minimum support to obtain the frequent 1-item set;
using the layer-by-layer search of the Apriori algorithm until a frequent m-item set is obtained that satisfies the following conditions: the frequent m-item set is non-empty and all of its (m-1)-subsets are frequent, m is not greater than the number of data in the sub-data set that contains the most data, and the frequent (m+1)-item set is empty;
listing all items of the frequent m-item set, and generating association rules according to the Apriori algorithm.
Preferably, step S3 is followed by: displaying the association rules and the root cause.
Preferably, the IT system comprises one or more of the following domains: services, networks, applications, databases, external interfaces, containers, virtual machines, and physical storage.
According to another aspect of the present invention, there is also provided a system for analyzing a root cause of a fault, including:
a segmentation module, configured to sort the data in a data set by the time attribute of each datum and segment the data set by a preset time window to obtain a plurality of sub-data sets;
an association module, configured to obtain frequent item sets and association rules from the data set with the Apriori algorithm, where a frequent item set comprises a number of strongly associated data;
a root cause search module, configured to sort the data in the frequent item set by time attribute, match the earliest datum, in order, against the accompanying alarm cause data prestored in an alarm cause database, remove that datum from the data set if the match succeeds and continue with the next one, and finally take the earliest datum that fails to match as the root cause of the last datum, in time order, in the frequent item set;
wherein the data set comprises the alarm data of each domain in the IT system within a preset time range, the error data in logs, and the abnormal performance data in the performance data set.
Preferably, the system further includes a data set obtaining module, where the data set obtaining module specifically includes:
a collection unit, configured to acquire, through the APM probe, the alarm data and performance data of each domain in the IT system within the preset time range, and to acquire the error data from the log data of the IT system through log collection;
a screening unit, configured to screen abnormal performance data from the performance data using the mean and a multiple of the variance;
and an aggregation unit, configured to form the data set from the alarm data of each domain in the IT system, the error data in the logs, and the abnormal performance data within the preset time range.
Preferably, the screening unit is specifically configured to:
sort the performance data in the performance data set according to a preset rule, and take the median as the mean value;
and calculate the variance of the performance data, filter out the performance data whose values lie within three times the variance of the mean value, and take the remaining performance data as the abnormal performance data.
Preferably, the association module is specifically configured to:
count the support of every datum in the data set and sort from high to low to obtain a candidate 1-item set, then remove the data whose support is below the minimum support to obtain the frequent 1-item set;
use the layer-by-layer search of the Apriori algorithm until a frequent m-item set is obtained that satisfies the following conditions: the frequent m-item set is non-empty and all of its (m-1)-subsets are frequent, m is not greater than the number of data in the sub-data set that contains the most data, and the frequent (m+1)-item set is empty;
list all items of the frequent m-item set, and generate association rules according to the Apriori algorithm.
Drawings
FIG. 1 is a flow chart of a method for analyzing a root cause of a fault according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a system for analyzing root causes of faults according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
To overcome the above problems in the prior art, an embodiment of the present invention provides a method for analyzing the root cause of a fault, based on the following design concept. An alarm is caused by some root fault, and that root fault may cause other alarms to be generated along with it; these are called accompanying alarms and produce an alarm storm. The earliest fault data therefore needs to be found in the time sequence. Using association rule mining, several associated fault data that meet the minimum support are obtained as a frequent item set, and the earliest fault data in the frequent item set is compared with a preset alarm cause database: if the comparison succeeds, that fault data is the root cause data; if it fails, the next earliest fault data is compared with the alarm cause database, until a match succeeds. Practical tests show that the method provided by the embodiment can accurately and quickly find valuable alarm association rules and root causes, providing decision support for system maintenance personnel.
Specifically, fig. 1 shows a flowchart of a method for analyzing a root cause of a fault according to an embodiment of the present invention, where as shown in the figure, the method includes:
101. Sorting the data in the data set by the time attribute of each datum, and segmenting the data set by a preset time window to obtain a plurality of sub-data sets; the data set comprises the alarm data of each domain in the IT system within a preset time range, the error data in logs, and the abnormal performance data in the performance data set.
It should be noted that the method first collects fault data within a certain time range, including the alarm data of each domain, the error data in logs, and the abnormal performance data in the performance data set. (In the IT domain, an error is a failure that has occurred; for example, when a network is disconnected, a network error is produced. An alarm is an unhandled error: it means that an error has occurred but the IT system has not handled it.) After data cleaning, the fault data are stored in a database to form a data set for subsequent problem tracing and root cause localization.
As is well known to those skilled in the art, log data have levels, such as info, debug, and error; in the embodiment of the present invention, the log category is determined by this level. Debug-level data is the lowest-level log data and is generally not output while the system is actually running. Info-level log data feeds back the current state of the system to the end user, so the information output at this level should be meaningful to the end user, who should be able to understand what it means. In some sense, Info output can be regarded as part of the software product (like the text in its interactive interfaces). Error-level data, i.e. error data, indicates a problem that may still be repairable, but it cannot be determined that the system will keep working normally: at a later stage the current problem may lead to an unrecoverable error (e.g., downtime), or the system may keep running without serious problems until it is stopped.
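As a minimal sketch of how error data might be separated from the other log levels, the Python below keeps only ERROR-level entries. The one-entry-per-line layout `<timestamp> <LEVEL> <message>` and the helper name `extract_error_entries` are assumptions made for illustration; real log collectors use their own formats.

```python
import re

# Assumed (hypothetical) layout: "<timestamp> <LEVEL> <message>".
LOG_LINE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>DEBUG|INFO|ERROR)\s+(?P<msg>.*)$")


def extract_error_entries(lines):
    """Return (timestamp, message) pairs for ERROR-level lines only.

    DEBUG and INFO lines are skipped; lines that do not match the
    assumed layout are ignored rather than guessed at.
    """
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append((match.group("ts"), match.group("msg")))
    return errors


sample = [
    "2024-01-01T10:03:00 INFO service started",
    "2024-01-01T10:04:12 DEBUG cache warmup details",
    "2024-01-01T10:05:40 ERROR connection to db-01 refused",
]
print(extract_error_entries(sample))
```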
In a computer system, each datum has a time attribute indicating its start time, end time, and so on. Sorting the data set by the start time of each datum gives the order in which the data (i.e., the faults) were generated; the data set is then segmented by time window so that each datum falls into the sub-data set of its window. For example, to analyze the 120 minutes from 10:00 to 12:00 on a given day with a window granularity of 10 minutes, the data are divided into 12 groups, each containing some abnormal performance data and alarm/error data. The idea behind this embodiment is that when performance problems occur, many problems may surface at the same time, but most of them stem from a few core problems, which can be found by probability analysis over the decomposed sets. The data mining method of this embodiment helps find the data with a higher probability of occurrence within a large volume of data.
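The segmentation in step 101 can be sketched as follows, assuming each datum is a `(start_time, item)` pair; `split_into_windows` is a hypothetical helper, not part of the original. Each resulting window becomes one transaction for the Apriori step.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_into_windows(events, start, end, window=timedelta(minutes=10)):
    """Bucket time-stamped items into fixed, non-overlapping windows.

    events: list of (start_time, item) tuples (an assumed layout).
    Returns {window_index: set_of_items}; each set is one "transaction"
    for the Apriori step. Empty windows are simply absent.
    """
    windows = defaultdict(set)
    # Sorting mirrors step 101; the bucketing itself does not need it.
    for ts, item in sorted(events, key=lambda e: e[0]):
        if not (start <= ts < end):
            continue  # outside the analysed time range
        windows[(ts - start) // window].add(item)  # timedelta // timedelta -> int
    return dict(windows)


# Usage: 10:00-12:00 with a 10-minute granularity gives windows 0..11.
day = datetime(2024, 1, 1)
start, end = day.replace(hour=10), day.replace(hour=12)
events = [
    (day.replace(hour=10, minute=5), "CPU high"),
    (day.replace(hour=10, minute=3), "memory high"),
    (day.replace(hour=11, minute=58), "response duration high"),
    (day.replace(hour=12, minute=0), "outside range"),
]
windows = split_into_windows(events, start, end)
print(sorted((i, sorted(items)) for i, items in windows.items()))
# [(0, ['CPU high', 'memory high']), (11, ['response duration high'])]
```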
102. Obtaining frequent item sets and association rules from the data set with the Apriori algorithm, where a frequent item set comprises a number of strongly associated data;
It should be noted that the Apriori algorithm is a representative algorithm for association rule mining and is used to mine frequent item sets for Boolean association rules. As the name suggests, a frequent item set is a set of items that appear together frequently in the data set. The design concept of the embodiment is that strongly associated data are more likely to stand in the relationship of a root alarm and its accompanying alarms.
103. Sorting the data in the frequent item set by time attribute, matching the earliest datum, in order, against the accompanying alarm cause data prestored in the alarm cause database; if the match succeeds, removing that datum and matching the next one; and finally taking the earliest datum that fails to match as the root cause of the last datum, in time order, in the frequent item set.
It should be noted that a frequent item set is a set of several strongly associated data. For example, sorting the frequent item set (high memory usage, high CPU usage) by time attribute shows that high memory usage occurred before high CPU usage; high memory usage is then matched against the possible root causes and the accompanying alarm cause data prestored in the alarm cause database. The alarm cause database is an operation and maintenance management database created in this embodiment; it stores alarm data and the related accompanying alarms, and operation and maintenance personnel can update it according to actual requirements. In this embodiment, the data are sorted by time attribute from earliest to latest, so the first-ranked datum is the earliest. If that datum successfully matches the accompanying alarm cause data, it is removed from the frequent item set and the new first-ranked datum is matched, and so on until a datum that cannot be matched is found.
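A minimal sketch of the matching loop in step 103, assuming the alarm cause database can be reduced to a set of known accompanying alarms; `find_root_cause` and the item names are hypothetical. It follows the procedure as described in the steps above: matched items are discarded, and the first unmatched item in time order is returned.

```python
def find_root_cause(frequent_items, accompanying_alarms):
    """Sketch of step 103 for one frequent item set.

    frequent_items: list of (timestamp, item) pairs.
    accompanying_alarms: set of items that a (hypothetical) alarm cause
        database lists as accompanying alarms.
    Returns (root_cause, last_item); root_cause is None when every item
    matched the database.
    """
    ordered = [item for _, item in sorted(frequent_items, key=lambda p: p[0])]
    if not ordered:
        return None, None
    last_item = ordered[-1]  # the datum whose root cause is sought
    for item in ordered:  # earliest first
        if item in accompanying_alarms:
            continue  # matched: treat as an accompanying alarm and drop it
        return item, last_item  # first unmatched datum in time order
    return None, last_item


print(find_root_cause(
    [(3, "CPU high"), (1, "network jitter"), (2, "memory high")],
    accompanying_alarms={"network jitter"},
))
# ('memory high', 'CPU high')
```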
On the basis of the above embodiment, step 101 further includes a process of acquiring a data set, specifically, the process includes:
001. alarm data and performance data of each domain in the IT system in a preset time range are obtained through the APM probe, and error data of log data in the IT system are obtained through a log collection method.
Table 1 shows the various domains in the IT system and the corresponding performance data that needs to be collected according to an embodiment of the present invention.
Table 1. Domains of the IT system and the corresponding performance data
002. Acquiring the performance data set of the IT system within the preset time range, and screening abnormal performance data from it using the mean and a multiple of the variance.
It should be noted that under normal conditions performance data are roughly linear and stay close to a straight line, whereas when an anomaly occurs the curve becomes uneven. The embodiment of the present invention therefore screens abnormal performance data using the median and a multiple of the variance.
003. Forming the data set from the alarm data of each domain in the IT system, the error data in the logs, and the abnormal performance data within the preset time range.
On the basis of the above embodiment, the step of screening abnormal performance data in the performance data by using the mean and the multiple variance specifically includes:
sorting the performance data in the performance data set according to a preset rule (for example, from largest to smallest), and taking the median as the mean value;
It should be noted that the embodiment uses the median as the mean value because the median is much less sensitive to outliers than the arithmetic mean. For example, for 1, 2, 3, 4, 100 the arithmetic mean is 22, but the median (the middle value) is 3, so the median is clearly the more reasonable reference point for finding outliers.
calculating the variance of the performance data, filtering out the performance data whose values lie within three times the variance of the mean value, and taking the remaining performance data as the abnormal performance data.
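The screening rule can be sketched as below. It follows the text literally (the median as the centre and a band of k times the population variance); note that the conventional 3-sigma rule uses the standard deviation instead, so `statistics.pstdev` may be the intended measure. `screen_abnormal` is a hypothetical name.

```python
import statistics


def screen_abnormal(values, k=3.0):
    """Keep the values lying outside median +/- k * variance.

    The median stands in for the mean, as the text describes. The band
    uses the population variance literally; swap in statistics.pstdev
    for the usual k-sigma rule.
    """
    if len(values) < 2:
        return []  # not enough data to estimate a spread
    centre = statistics.median(values)
    band = k * statistics.pvariance(values)
    return [v for v in values if abs(v - centre) > band]


# Twenty normal readings and one spike: only the spike survives.
print(screen_abnormal([0.5] * 20 + [2.0]))
```

On a sample as small as 1, 2, 3, 4, 100 the single outlier inflates the variance (to 1522) so much that nothing is flagged, and the same happens with the standard deviation, so the rule assumes a reasonably large window of data.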
On the basis of the foregoing embodiment, step 102 specifically includes:
counting the support of every datum in the data set and sorting from high to low to obtain a candidate 1-item set, then removing the data whose support is below the minimum support to obtain the frequent 1-item set;
using the layer-by-layer search of the Apriori algorithm until a frequent m-item set is obtained that satisfies the following conditions: the frequent m-item set is non-empty and all of its (m-1)-subsets are frequent, m is not greater than the number of data in the sub-data set that contains the most data, and the frequent (m+1)-item set is empty;
listing all items of the frequent m-item set, and generating association rules according to the Apriori algorithm.
It should be noted that abnormal performance data are discrete, non-linear data points. In general, the data in most applications are generated by one or more programs that reflect the system's functions; when the underlying application runs abnormally, abnormal performance data are generated, so finding them quickly and efficiently is very valuable. In IT systems, problems cascade: the root-cause problem occurs first and cannot heal itself, which causes other problems to occur along with it and form an alarm storm. Therefore, when performing association analysis on abnormal performance data, the embodiment of the present invention carries out association rule mining as follows:
finding all frequent item sets, i.e., item sets whose frequency of occurrence is not less than the minimum support; and generating strong association rules from the frequent item sets, which must satisfy both the minimum support and the minimum confidence.
Specifically, the probability that memory usage and CPU usage are both above their preset thresholds is calculated, i.e., the number of times both problems occur together in the data set divided by the total number of data records in the data set. For example: support({memory high} → {CPU usage high}) = number of co-occurrences of memory high and CPU usage high / number of data records = 3/5 = 60%.
Finding strong association rules
It should be noted that the previous step has already separated the higher-probability data from the massive performance data and abnormal performance data; the association rules are then strengthened through probability analysis. This embodiment uses conditional probability: for example, it calculates the probability that CPU usage is high given that memory usage is high, and conversely the probability that memory usage is high given that CPU usage is high. For example: number of co-occurrences of memory high and CPU usage high / number of occurrences of memory high = 3/3 = 100%; number of co-occurrences of memory high and CPU usage high / number of occurrences of CPU usage high = 3/4 = 75%.
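The arithmetic in the two examples above can be checked with a few lines of Python. The five records are hypothetical, invented only so that the counts match the text (memory high 3 times, CPU high 4 times, both together 3 times).

```python
from fractions import Fraction

# Hypothetical records (one set of abnormal items per record).
records = [
    {"memory high", "CPU high"},
    {"memory high", "CPU high"},
    {"memory high", "CPU high"},
    {"CPU high"},
    {"response duration high"},
]


def count(*items):
    """Number of records containing all the given items."""
    return sum(1 for r in records if set(items) <= r)


both = count("memory high", "CPU high")
print(Fraction(both, len(records)))          # support: 3/5
print(Fraction(both, count("memory high")))  # P(CPU high | memory high): 1
print(Fraction(both, count("CPU high")))     # P(memory high | CPU high): 3/4
```

Confidence is not symmetric: the same co-occurrence count is divided by a different antecedent count in each direction.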
To better understand the Apriori algorithm employed by embodiments of the present invention, the basic concept of the Apriori algorithm is first explained:
1. Item set and k-item set
Let I = {i1, i2, i3, …, id} be the set of all items (i.e., data) in the data set, and T = {t1, t2, t3, …, tN} be the set of all transactions (i.e., time windows); each transaction ti contains an item set that is a subset of I. In association analysis, a set containing zero or more items is called an item set. An item set containing k items is called a k-item set. The empty set is the item set that contains no items. For example, {CPU usage high, response duration high, memory usage high} is a 3-item set in one example of the invention. Table 2 shows a data set of the embodiment, where TID1 denotes the subset corresponding to the first time window; as Table 2 shows, TID1 contains two items: CPU high and response duration high.
Table 2. Data set
2. Support count
An important property of an item set is its support count, i.e., the number of transactions that contain the item set. Mathematically, the support count σ(X) of an item set X can be expressed as:
$$\sigma(X) = \left|\{\, t_i \mid X \subseteq t_i,\ t_i \in T \,\}\right|$$
where |·| denotes the number of elements in a set. In the example of Table 2, the support count of the item set {latency high, memory usage high, response duration high} is 2, because only transactions 3 and 4 contain all three items.
3. Association rules
An association rule is an implication of the form X → Y, where X and Y are disjoint item sets, i.e., $X \cap Y = \emptyset$.
The strength of an association rule can be measured by its support and confidence. Support determines how often the rule applies to a given data set, while confidence determines how often Y appears in transactions that contain X.
The two measures, support (s) and confidence (c), are formally defined as follows:
s(X→Y)=σ(X∪Y)/N
c(X→Y)=σ(X∪Y)/σ(X)
where σ(X ∪ Y) is the support count of X ∪ Y, N is the total number of transactions, and σ(X) is the support count of X.
Example
In the example of Table 2, consider the rule {response duration high, memory usage high} → {latency high}. Since the support count of the item set {response duration high, memory usage high, latency high} is 2 and the total number of transactions is 5, the support of the rule is 2/5 = 0.4.
The confidence of the rule is the support count of the item set {response duration high, memory usage high, latency high} divided by the support count of the item set {response duration high, memory usage high}. Since 3 transactions contain both response duration high and memory usage high, the confidence of the rule is 2/3 ≈ 0.67.
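The definitions of σ(X), s and c translate directly into code. Table 2 is not reproduced in this text, so the five transactions below are hypothetical, chosen only so that the counts quoted above come out the same (σ = 2, N = 5, and 3 transactions containing both response duration high and memory usage high).

```python
from fractions import Fraction

# Hypothetical transactions (one per time window); NOT the real Table 2.
T = [
    {"CPU high", "response duration high"},
    {"memory usage high", "response duration high"},
    {"latency high", "memory usage high", "response duration high"},
    {"latency high", "memory usage high", "response duration high", "CPU high"},
    {"latency high", "CPU high"},
]


def sigma(itemset, transactions):
    """Support count: number of transactions containing every item."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(x, y, transactions):
    """s(X -> Y) = sigma(X u Y) / N."""
    return Fraction(sigma(set(x) | set(y), transactions), len(transactions))


def confidence(x, y, transactions):
    """c(X -> Y) = sigma(X u Y) / sigma(X)."""
    return Fraction(sigma(set(x) | set(y), transactions), sigma(x, transactions))


X = {"response duration high", "memory usage high"}
Y = {"latency high"}
print(sigma(X | Y, T), support(X, Y, T), confidence(X, Y, T))  # 2 2/5 2/3
```

Using `Fraction` keeps the ratios exact, so 2/3 is reported as such rather than as a rounded 0.67.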
Association rule discovery
Given a set of transactions T, the association rule discovery refers to finding all rules with a support degree greater than or equal to minsup (minimum support degree) and a confidence degree greater than or equal to minconf (minimum confidence degree), where minsup and minconf are corresponding support degree and confidence degree thresholds.
The mining of association rules is a two-step process:
(1) Frequent item set generation: the goal is to find all item sets that satisfy the minimum support threshold, i.e., whose support count is at least the predefined minimum support count; these are called frequent item sets.
(2) Rule generation: the goal is to extract all high-confidence rules, called strong rules, from the frequent item sets found in the previous step (strong rules must satisfy both the minimum support and the minimum confidence).
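A sketch of the rule-generation step, assuming the frequent item sets and their support counts are already available as a dict from `frozenset` to count; `generate_rules` and the example counts are hypothetical. Because every subset of a frequent item set is itself frequent, each antecedent's support count is guaranteed to be in the dict.

```python
from itertools import combinations


def generate_rules(support_counts, min_conf):
    """Return strong rules (X, Y, confidence) from frequent item sets.

    support_counts: {frozenset(itemset): support_count} for every
    frequent item set (all of them already meet the minimum support).
    Only rules with confidence >= min_conf are kept.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty X and a non-empty Y
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                y = itemset - x
                conf = count / support_counts[x]  # sigma(X u Y) / sigma(X)
                if conf >= min_conf:
                    rules.append((x, y, conf))
    return rules


counts = {
    frozenset({"memory high"}): 3,
    frozenset({"CPU high"}): 4,
    frozenset({"memory high", "CPU high"}): 3,
}
for x, y, conf in generate_rules(counts, min_conf=0.8):
    print(sorted(x), "->", sorted(y), round(conf, 2))
```

With `min_conf=0.8` only {memory high} → {CPU high} (confidence 1.0) survives; the reverse rule has confidence 0.75.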
The essence of the Apriori algorithm uses the candidate set to find a frequent item set. The Apriori algorithm is an algorithm for mining a frequent item set of boolean association rules, which has the most influence. The name of the algorithm is based on the fact that: the algorithm uses a priori knowledge of the nature of the frequent itemset, as we will see. Apriori uses an iterative approach called layer-by-layer search, where a set of k-terms is used to explore a set of (k +1) -terms. First, a set of frequent 1-item sets is found. This set is denoted L1. L1 is used to find the set of frequent 2-item sets, L2, and L2 is used to find L3, and so on until no frequent k-item sets can be found. One database scan is required to find each Lk.
Apriori properties: all non-empty subsets of the frequent item set must also be frequent. Apriori properties are based on the following observations: by definition, if the set of items I does not meet the minimum support threshold s, then I is not frequent, i.e., p (I) < s. If item A is added to I, the resulting set of items (i.e., I @ A) is unlikely to occur more frequently than I. Thus, also itou a is not frequent, i.e. P (itou a) < s. This property belongs to a special classification, called inverse monotonic, meaning that if a set fails the test, all its supersets also fail the same test. It is called inverse monotonic because the property is monotonic in the sense that it does not pass the test.
For the Apriori algorithm, if a set is a frequent itemset, then all of its subsets are frequent itemsets. For example, suppose {memory high, CPU high} is a frequent itemset, i.e., the number of records in which memory high and CPU high occur together is greater than or equal to the minimum support min_support. Then each of its subsets, {memory high} and {CPU high}, must also occur at least min_support times, i.e., its subsets are frequent itemsets. Conversely, if a set is not a frequent itemset, none of its supersets is a frequent itemset. For example, suppose {memory high} is not a frequent itemset, i.e., memory high occurs fewer than min_support times. Then any superset of it, such as {memory high, CPU usage high}, necessarily occurs fewer than min_support times, so none of its supersets is a frequent itemset.
The key to the Apriori algorithm is how Lk-1 is used to find Lk, which is done in the following two steps:
Join step: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself. This set of candidates is denoted Ck. Let l1 and l2 be itemsets in Lk-1. The notation li[j] denotes the jth item of li (e.g., l1[k-2] denotes the second-to-last item of l1). For convenience, assume that the items within a transaction or itemset are sorted in lexicographic order. The join Lk-1 ⋈ Lk-1 is then performed, where two members of Lk-1 are joinable if their first (k-2) items are the same. That is, members l1 and l2 of Lk-1 are joined if $(l_1[1] = l_2[1]) \land (l_1[2] = l_2[2]) \land \dots \land (l_1[k-2] = l_2[k-2]) \land (l_1[k-1] < l_2[k-1])$. The condition $l_1[k-1] < l_2[k-1]$ simply ensures that no duplicates are generated. The itemset formed by joining l1 and l2 is $\{l_1[1], l_1[2], \dots, l_1[k-2], l_1[k-1], l_2[k-1]\}$.
Prune step: Ck is a superset of Lk; that is, its members may or may not be frequent, but all frequent k-itemsets are contained in Ck. Scanning the database to determine the count of each candidate in Ck would determine Lk (by definition, every candidate whose count is not less than the minimum support count is frequent and therefore belongs to Lk). However, Ck can be very large, so this could involve a heavy computation. To reduce the size of Ck, the Apriori property is used as follows: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset. Hence, if any (k-1)-subset of a candidate k-itemset is not in Lk-1, the candidate cannot be frequent either and can be removed from Ck. This subset test can be performed quickly by maintaining a hash tree of all frequent itemsets.
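The join and prune steps can be sketched in a few lines of Python. This is a minimal illustration, assuming itemsets are stored as lexicographically sorted tuples; a Python set (a hash table) stands in for the hash tree mentioned above, and the name `apriori_gen` is illustrative. The text indexes items from 1, so its l[k-1] corresponds to index k-2 here. The input L2 is the one obtained in the nine-window example worked through later in this description.

```python
from itertools import combinations
from typing import Set, Tuple

Itemset = Tuple[str, ...]  # items stored in lexicographic order, as the text assumes


def apriori_gen(prev_frequent: Set[Itemset], k: int) -> Set[Itemset]:
    """Generate candidate k-itemsets Ck from the frequent (k-1)-itemsets L(k-1)."""
    ordered = sorted(prev_frequent)
    candidates: Set[Itemset] = set()
    for i, l1 in enumerate(ordered):
        for l2 in ordered[i + 1:]:
            # Join step: the first k-2 items agree and l1's last item < l2's last item.
            if l1[:k - 2] == l2[:k - 2] and l1[k - 2] < l2[k - 2]:
                candidate = l1 + (l2[k - 2],)
                # Prune step: drop the candidate if any (k-1)-subset is not in L(k-1).
                if all(sub in prev_frequent for sub in combinations(candidate, k - 1)):
                    candidates.add(candidate)
    return candidates


L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
C3 = apriori_gen(L2, 3)
print(sorted(C3))          # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
L3 = C3                    # in the example both candidates turn out to be frequent
print(apriori_gen(L3, 4))  # set()
```

Joining inside the loop and pruning immediately means a candidate with an infrequent subset is never stored, so only the surviving candidates need to be counted in the next database scan.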
Generating association rules from frequent itemsets
Once the frequent itemsets have been found from the transactions in database D, it is straightforward to generate strong association rules from them (strong association rules satisfy both minimum support and minimum confidence). Confidence can be computed with the following equation, where the conditional probability is expressed in terms of itemset support counts: $\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \mathrm{support\_count}(A \cup B) / \mathrm{support\_count}(A)$, where $\mathrm{support\_count}(A \cup B)$ is the number of transactions containing the itemset $A \cup B$, and $\mathrm{support\_count}(A)$ is the number of transactions containing the itemset A. Based on this equation, association rules can be generated as follows:
1. For each frequent itemset l, generate all non-empty subsets of l.
2. For each non-empty subset s of l, output the rule $s \Rightarrow (l - s)$ if $\mathrm{support\_count}(l) / \mathrm{support\_count}(s) \ge min\_conf$.
Here min_conf is the minimum confidence threshold. Because the rules are generated from frequent itemsets, each rule automatically satisfies minimum support. The frequent itemsets and their support counts can be stored in advance in a hash table so that they can be accessed quickly.
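The rule-generation procedure can be sketched as follows. This minimal Python version assumes the support counts of every frequent itemset (and hence, by the Apriori property, of all their subsets) are held in a dictionary keyed by frozenset, playing the role of the hash table above. It skips s = l, which would produce a rule with an empty consequent. The function name is illustrative, and the support counts are those of {I1, I2, I5} and its subsets in the nine-window example worked through later in this description.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def generate_rules(support: Dict[FrozenSet[str], int], min_conf: float) -> List[Rule]:
    """Emit s => (l - s) for every frequent itemset l and non-empty proper subset s
    whose confidence support(l) / support(s) reaches min_conf."""
    rules: List[Rule] = []
    for itemset, sup_l in support.items():  # itemset plays the role of l in the text
        for r in range(1, len(itemset)):    # s = l is skipped: its consequent would be empty
            for subset in combinations(sorted(itemset), r):
                s = frozenset(subset)
                conf = sup_l / support[s]    # every subset of a frequent itemset is stored
                if conf >= min_conf:
                    rules.append((s, itemset - s, conf))
    return rules


support = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2, frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}
for antecedent, consequent, conf in generate_rules(support, min_conf=0.7):
    print(sorted(antecedent), "=>", sorted(consequent), f"{conf:.0%}")
```

With min_conf = 0.7 this yields five rules, each with 100% confidence: I1 ∧ I5 ⇒ I2, I2 ∧ I5 ⇒ I1 and I5 ⇒ I1 ∧ I2 from {I1, I2, I5}, plus I5 ⇒ I1 and I5 ⇒ I2 from its 2-item subsets.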
The Apriori algorithm is illustrated below with an example in which the data set is divided into 9 time windows, i.e., 9 sub-data sets, so |D| = 9. The sub-data sets contain the following data:
T1: I1, I2, I5
T2: I2, I4
T3: I2, I3
T4: I1, I2, I4
T5: I1, I3
T6: I2, I3
T7: I1, I3
T8: I1, I2, I3, I5
T9: I1, I2, I3
(1) Mining frequent itemsets
1. In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all transactions and counts the number of occurrences of each item.
2. Assume that the minimum support count is 2 (i.e., min_sup = 2/9 ≈ 22%). The set of frequent 1-itemsets, L1, can then be determined; it consists of the candidate 1-itemsets that satisfy minimum support. In this example the counts are I1: 6, I2: 7, I3: 6, I4: 2, I5: 2, so every candidate 1-itemset is frequent.
3. To find the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to generate the set of candidate 2-itemsets, C2.
4. The transactions in D are scanned and the support count of each candidate in C2 is accumulated.
5. The set of frequent 2-itemsets, L2, is then determined; it consists of the candidate 2-itemsets in C2 that have minimum support. Here L2 = {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}}.
6. The generation of the set of candidate 3-itemsets, C3, is detailed in the figure. First, let C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}. Based on the Apriori property that all subsets of a frequent itemset must also be frequent, the last four candidates cannot be frequent. They are therefore removed from C3, so there is no need to count them when D is subsequently scanned to determine L3. Note that because Apriori uses level-wise search, for a candidate k-itemset it suffices to check whether its (k-1)-subsets are frequent.
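[Joining L2 with itself to generate C3]
1. Join: C3 = L2 ⋈ L2 = {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}} ⋈ {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}} = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.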
2. Prune using the Apriori property: all subsets of a frequent itemset must be frequent.
• The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}. All of them are members of L2, so {I1, I2, I3} is kept in C3.
• The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}. All of them are members of L2, so {I1, I2, I5} is kept in C3.
• The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}. {I3, I5} is not a member of L2 and is therefore not frequent, so {I1, I3, I5} is removed from C3.
• The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}. {I3, I4} is not a member of L2 and is therefore not frequent, so {I2, I3, I4} is removed from C3.
• The 2-item subsets of {I2, I3, I5} are {I2, I3}, {I2, I5}, and {I3, I5}. {I3, I5} is not a member of L2 and is therefore not frequent, so {I2, I3, I5} is removed from C3.
• The 2-item subsets of {I2, I4, I5} are {I2, I4}, {I2, I5}, and {I4, I5}. {I4, I5} is not a member of L2 and is therefore not frequent, so {I2, I4, I5} is removed from C3.
3. Therefore, after pruning, C3 = {{I1, I2, I3}, {I1, I2, I5}}.
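7. The transactions in D are scanned to determine L3, which consists of the candidate 3-itemsets in C3 that have minimum support. Both candidates occur twice (in T8 and T9, and in T1 and T8, respectively), so L3 = {{I1, I2, I3}, {I1, I2, I5}}.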
8. The algorithm uses L3 ⋈ L3 to generate the set of candidate 4-itemsets, C4. Although the join yields {{I1, I2, I3, I5}}, this itemset is pruned because its subset {I1, I3, I5} is not frequent. Thus C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets.
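The whole walkthrough can be reproduced with a compact level-wise implementation. This is a minimal Python sketch of textbook Apriori, not the patent's production code; it assumes transactions are sets of item labels and that the minimum support is given as an absolute count (2 here), and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]


def apriori(transactions: List[Set[str]], min_count: int) -> Dict[Itemset, int]:
    """Level-wise Apriori: return every frequent itemset with its support count."""
    # First scan: count single items to obtain L1.
    item_counts = Counter(item for t in transactions for item in t)
    current = {(item,): n for item, n in item_counts.items() if n >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        ordered = sorted(prev)
        candidates = set()
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if a[:-1] == b[:-1]:  # join: same first k-2 items
                    cand = a + (b[-1],)
                    if all(s in prev for s in combinations(cand, k - 1)):  # prune
                        candidates.add(cand)
        # One database scan per level to count the surviving candidates.
        counts = Counter(c for t in transactions for c in candidates if set(c) <= t)
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"}, {"I1", "I3"},
     {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]
result = apriori(D, min_count=2)
for size in (1, 2, 3):
    print(f"L{size}:", sorted((c, n) for c, n in result.items() if len(c) == size))
```

Run on the nine windows, it finds the five frequent 1-itemsets, the six frequent 2-itemsets listed for L2, and L3 = {I1, I2, I3} and {I1, I2, I5} with a count of 2 each; the fourth level produces no candidates, so the loop stops.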
On the basis of the above embodiments, the IT system comprises one or more of the following domains: services, networks, applications, databases, external interfaces, containers, virtual machines, and physical storage.
On the basis of the above embodiment, step 103 further includes: displaying the association rules and the root cause. Displaying the association rules and the root cause makes it convenient to provide decision support to operation and maintenance personnel.
Fig. 2 shows a functional block diagram of a system for analyzing the root cause of a fault according to an embodiment of the present invention. As shown in the figure, the system includes:
a segmentation module 201, configured to sort the data in the data set by their time attributes and to segment the data set according to a preset time window to obtain a plurality of sub-data sets, where the data set includes the alarm data of each domain in the IT system within a preset time range, the error data in the logs, and the abnormal performance data in the performance data set.
It should be noted that the segmentation module first collects fault data within a certain time range. These include the alarm data of each domain (in the IT field, an error is a fault that has occurred, for example a disconnected network produces a network error; an alarm is an error that has not yet been handled, i.e., an alarm positively indicates that an error has occurred but the IT system has not yet dealt with it), the error data in the logs, and the abnormal performance data in the performance data set. After data cleaning, these fault data are stored in a database to form the data set used for subsequent problem tracing and root cause localization.
As is well known to those skilled in the art, log data have levels such as info, debug, and error; in the embodiment of the present invention, the log category is determined by the level. Debug-level data are the lowest-level log data and are generally not output while the system is actually running. Info-level log data report the current state of the system to the end user, so the information output at this level should be meaningful to the end user, i.e., the end user should be able to understand it; in a sense, Info output can be viewed as part of the software product (like the text in its interactive interfaces). Error-level data, i.e., the error data used here, record problems that may still be recoverable but after which normal operation of the system can no longer be guaranteed: at a later stage the current problem may cause an unrecoverable error (e.g., a system crash), or the system may keep running without serious problems until it is stopped.
In a computer system, each piece of data has time attributes indicating, for example, its start time and end time. Sorting the data in the data set by start time gives the order in which the data (i.e., the faults) were generated; the data set is then segmented by time window, so that each piece of data is assigned to the sub-data set of its window. For example, to analyze the 120 minutes of data from 10:00 to 12:00 on a given day with a window granularity of 10 minutes, the data are divided into 12 groups, each containing several pieces of abnormal performance data and alarm/error data. The idea behind the embodiment is that when a performance problem occurs, many problems may surface at the same time, but most of them stem from a few core problems; these core problems are found by probability analysis over the decomposed sets. The data mining approach of the embodiment thus helps find, within a large amount of data, the data that occur with higher probability.
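The segmentation step can be sketched as follows. This minimal Python version assumes each fault record carries a label and a start time (the `FaultRecord` type, its fields, the date, and the labels are all hypothetical), uses half-open windows [start, start + window), and ignores data outside the analysed range. Each returned set is one sub-data set, i.e., one transaction for Apriori.

```python
import math
from datetime import datetime, timedelta
from typing import List, NamedTuple, Set


class FaultRecord(NamedTuple):
    label: str       # hypothetical fault label, e.g. an alarm or error code
    start: datetime  # start-time attribute used for ordering


def split_into_windows(records: List[FaultRecord], range_start: datetime,
                       range_end: datetime, window: timedelta) -> List[Set[str]]:
    """Sort records by start time and assign each to its fixed-size time window."""
    n_windows = math.ceil((range_end - range_start) / window)
    windows: List[Set[str]] = [set() for _ in range(n_windows)]
    for rec in sorted(records, key=lambda r: r.start):
        if range_start <= rec.start < range_end:  # ignore data outside the analysed range
            windows[(rec.start - range_start) // window].add(rec.label)
    return windows


start = datetime(2024, 1, 1, 10, 0)
records = [
    FaultRecord("net_disconnect_alarm", start + timedelta(minutes=3)),
    FaultRecord("cpu_high", start + timedelta(minutes=5)),
    FaultRecord("db_error_log", start + timedelta(minutes=58)),
    FaultRecord("cpu_high", start + timedelta(minutes=119)),
]
windows = split_into_windows(records, start, start + timedelta(minutes=120),
                             timedelta(minutes=10))
print(len(windows))         # 12
print(sorted(windows[0]))   # ['cpu_high', 'net_disconnect_alarm']
```

Storing each window as a set means a fault repeated within one window counts once, which matches the transaction model Apriori expects.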
The association module 202 is configured to obtain a frequent item set and an association rule in a data set according to an Apriori algorithm, where the frequent item set includes a certain amount of data with a strong association relationship.
It should be noted that the Apriori algorithm is a representative algorithm for association rule mining and is used to mine frequent itemsets for Boolean association rules. As the name suggests, a frequent itemset is a set of items that appears frequently in the data set. The design idea of the embodiment is that data with a strong association are more likely to stand in the relationship of a root alarm and its accompanying alarms.
a root cause search module 203, configured to sort the data in a frequent itemset by their time attributes and match them in order, earliest first, against the accompanying-alarm cause data pre-stored in an alarm cause database. If a piece of data matches, it is removed and matching continues with the next one; the earliest piece of data that fails to match is finally taken as the root cause of the latest piece of data in the frequent itemset.
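The matching procedure of the root cause search module can be sketched as follows. This minimal Python version rests on several assumptions not fixed by the text: the alarm cause database is modelled as a set of labels, "matching" is exact label membership, the latest item is treated as the symptom and is not itself a root-cause candidate, and `None` is returned when every earlier item matches. All names and labels are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional, Set, Tuple

Event = Tuple[str, datetime]  # (data label, start time): a hypothetical representation


def find_root_cause(itemset: Iterable[Event],
                    accompanying_causes: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (root_cause, latest_item) for one frequent itemset, or None if undecided."""
    ordered = sorted(itemset, key=lambda e: e[1])  # earliest first
    if len(ordered) < 2:
        return None
    latest = ordered[-1][0]  # the data with the last time sequence
    for label, _ in ordered[:-1]:
        if label in accompanying_causes:
            continue  # matched a known accompanying cause: remove it, try the next item
        return label, latest  # the first item that fails to match is the root cause
    return None  # every earlier item matched an accompanying cause


def t(minute: int) -> datetime:
    return datetime(2024, 1, 1, 10, minute)


itemset = [("db_conn_timeout", t(9)), ("cpu_high", t(7)),
           ("gc_pause", t(1)), ("net_disconnect", t(2))]
print(find_root_cause(itemset, accompanying_causes={"gc_pause"}))
# ('net_disconnect', 'db_conn_timeout')
```

In the usage example, gc_pause occurs first but is a known accompanying cause, so it is skipped, and net_disconnect becomes the root cause of the latest item, db_conn_timeout.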
On the basis of the above embodiment, the system of the embodiment of the present invention further includes a data set acquisition module, which specifically includes:
a collection unit, configured to acquire, through an APM probe, the alarm data and performance data of each domain in the IT system within the preset time range, and to acquire the error data from the log data of the IT system through a log collector;
a screening unit, configured to screen out abnormal performance data from the performance data using the mean value and a multiple of the variance;
and an aggregation unit, configured to assemble the alarm data of each domain in the IT system, the error data in the logs, and the abnormal performance data within the preset time range into the data set.
On the basis of the above embodiments, the screening unit is specifically configured to:
sort the performance data in the performance data set according to a preset rule and take the median as the mean value;
and calculate the variance of the performance data, filter out the performance data whose values lie within the mean value plus or minus three times the variance, and take the remaining performance data as the abnormal performance data.
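The screening rule can be sketched as follows. This minimal Python version follows the text literally: the median serves as the centre and the band half-width is three times the variance (not the standard deviation, which is the more common choice; note that variance carries squared units). Using the population variance and a symmetric band are assumptions, and the samples are hypothetical.

```python
import statistics
from typing import List


def screen_abnormal(values: List[float], multiple: float = 3.0) -> List[float]:
    """Return the values lying outside median +/- multiple * variance."""
    centre = statistics.median(values)     # the median is used in place of the mean
    spread = statistics.pvariance(values)  # population variance (an assumption)
    return [v for v in values if abs(v - centre) > multiple * spread]


# Hypothetical response-time samples (seconds) for one metric.
samples = [1.0] * 19 + [5.0]
print(screen_abnormal(samples))  # [5.0]
```

Using the median as the centre keeps a few extreme samples from dragging the reference point toward themselves, which is presumably why the text substitutes it for the arithmetic mean.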
On the basis of the foregoing embodiments, the association module is specifically configured to:
count the support of all data in the data set and sort the data from high to low to obtain the candidate 1-itemset, then remove the data whose support is below the minimum support to obtain the frequent 1-itemset;
apply level-wise search according to the Apriori algorithm until a frequent m-itemset is obtained that satisfies the following conditions: the frequent m-itemset is non-empty and its (m-1)-subsets are frequent, m is no greater than the number of data in the sub-data set containing the most data, and the (m+1)-itemset is empty;
list all items of the frequent m-itemset and generate association rules according to the Apriori algorithm.
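The association module's steps, including the descending sort of 1-item supports and the bound on m, can be sketched as follows. Rather than the join step described earlier, this variant (an assumption, not the procedure stated in the patent) enumerates the k-item combinations inside each window and counts only those whose (k-1)-subsets are all frequent. It still relies on the Apriori property, and the bound m ≤ size of the largest sub-data set becomes the loop limit. The function name is illustrative; the data are the nine windows of the earlier example.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Level = Dict[Tuple[str, ...], int]


def mine_levels(windows: List[Set[str]], min_count: int) -> List[Level]:
    """Return [L1, ..., Lm], stopping when L(m+1) is empty or m reaches the largest window."""
    # Count every item's support, sort high to low, and drop items below min_count.
    ranked = sorted(Counter(i for w in windows for i in w).items(),
                    key=lambda kv: (-kv[1], kv[0]))
    l1 = {(item,): n for item, n in ranked if n >= min_count}
    if not l1:
        return []
    levels = [l1]
    max_len = max(len(w) for w in windows)  # m can never exceed the largest sub-data set
    for k in range(2, max_len + 1):
        prev = levels[-1]
        counts: Counter = Counter()
        for w in windows:
            for cand in combinations(sorted(w), k):
                # Count a k-itemset only if all of its (k-1)-subsets are frequent.
                if all(sub in prev for sub in combinations(cand, k - 1)):
                    counts[cand] += 1
        lk = {c: n for c, n in counts.items() if n >= min_count}
        if not lk:
            break  # L(m+1) is empty, so the previous level is the frequent m-itemset
        levels.append(lk)
    return levels


D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"}, {"I1", "I3"},
     {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]
levels = mine_levels(D, min_count=2)
print(len(levels))          # 3
print(list(levels[0]))      # [('I2',), ('I1',), ('I3',), ('I4',), ('I5',)]
print(sorted(levels[-1]))   # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

Enumerating combinations per window avoids a separate join but can be costly for very wide windows; the join-based candidate generation keeps the candidate count smaller in that case.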
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of analyzing a root cause of a fault, comprising:
S1: sorting the data in a data set by their time attributes, and segmenting the data set according to a preset time window to obtain a plurality of sub-data sets;
S2: obtaining frequent itemsets and association rules in the data set according to the Apriori algorithm, wherein a frequent itemset comprises a number of data with a strong association;
S3: sorting the data in a frequent itemset by their time attributes, matching the data in order, earliest first, against accompanying-alarm cause data pre-stored in an alarm cause database, removing a piece of data if the match succeeds and continuing with the next one, and finally taking the earliest piece of data that fails to match as the root cause of the latest piece of data in the frequent itemset;
wherein the data set comprises the alarm data of each domain in an IT system within a preset time range, the error data in the logs, and the abnormal performance data in the performance data set.
2. The method of claim 1, wherein the step S1 is preceded by:
acquiring, through an APM probe, the alarm data and performance data of each domain in the IT system within the preset time range, and acquiring the error data from the log data of the IT system through a log collector;
screening out abnormal performance data from the performance data using a mean value and a multiple of the variance;
and assembling the alarm data of each domain in the IT system, the error data in the logs, and the abnormal performance data within the preset time range into the data set.
3. The method of claim 2, wherein screening out abnormal performance data from the performance data using the mean value and the multiple of the variance comprises:
sorting the performance data in the performance data set according to a preset rule, and taking the median as the mean value;
and calculating the variance of the performance data, filtering out the performance data whose values lie within the mean value plus or minus three times the variance, and taking the remaining performance data as the abnormal performance data.
4. The method according to claim 1, wherein the step S2 specifically includes:
counting the support of all data in the data set and sorting the data from high to low to obtain a candidate 1-itemset, and removing the data whose support is below the minimum support from the candidate 1-itemset to obtain a frequent 1-itemset;
applying level-wise search according to the Apriori algorithm until a frequent m-itemset is obtained that satisfies the following conditions: the frequent m-itemset is non-empty and its (m-1)-subsets are frequent, m is no greater than the number of data in the sub-data set containing the most data, and the (m+1)-itemset is empty;
listing all items of the frequent m-itemset, and generating association rules according to the Apriori algorithm.
5. The method of claim 1, wherein step S3 is further followed by: displaying the association rules and the root cause.
6. The method of claim 1, wherein the IT system comprises one or more of the following domains: services, networks, applications, databases, external interfaces, containers, virtual machines, and physical storage.
7. A system for analyzing a root cause of a fault, comprising:
a segmentation module, configured to sort the data in a data set by their time attributes and to segment the data set according to a preset time window to obtain a plurality of sub-data sets;
an association module, configured to obtain frequent itemsets and association rules in the data set according to the Apriori algorithm, wherein a frequent itemset comprises a number of data with a strong association;
a root cause search module, configured to sort the data in a frequent itemset by their time attributes, to match the data in order, earliest first, against accompanying-alarm cause data pre-stored in an alarm cause database, to remove a piece of data if the match succeeds and continue with the next one, and finally to take the earliest piece of data that fails to match as the root cause of the latest piece of data in the frequent itemset;
wherein the data set comprises the alarm data of each domain in an IT system within a preset time range, the error data in the logs, and the abnormal performance data in the performance data set.
8. The system of claim 7, further comprising a dataset acquisition module, the dataset acquisition module specifically comprising:
a collection unit, configured to acquire, through an APM probe, the alarm data and performance data of each domain in the IT system within the preset time range, and to acquire the error data from the log data of the IT system through a log collector;
a screening unit, configured to screen out abnormal performance data from the performance data using a mean value and a multiple of the variance;
and an aggregation unit, configured to assemble the alarm data of each domain in the IT system, the error data in the logs, and the abnormal performance data within the preset time range into the data set.
9. The system of claim 8, wherein the screening unit is specifically configured to:
sort the performance data in the performance data set according to a preset rule, and take the median as the mean value;
and calculate the variance of the performance data, filter out the performance data whose values lie within the mean value plus or minus three times the variance, and take the remaining performance data as the abnormal performance data.
10. The system of claim 7, wherein the association module is specifically configured to:
count the support of all data in the data set and sort the data from high to low to obtain a candidate 1-itemset, and remove the data whose support is below the minimum support from the candidate 1-itemset to obtain a frequent 1-itemset;
apply level-wise search according to the Apriori algorithm until a frequent m-itemset is obtained that satisfies the following conditions: the frequent m-itemset is non-empty and its (m-1)-subsets are frequent, m is no greater than the number of data in the sub-data set containing the most data, and the (m+1)-itemset is empty;
list all items of the frequent m-itemset, and generate association rules according to the Apriori algorithm.
CN201810155161.6A 2018-02-23 2018-02-23 Method and system for analyzing fault root cause Active CN108446184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810155161.6A CN108446184B (en) 2018-02-23 2018-02-23 Method and system for analyzing fault root cause

Publications (2)

Publication Number Publication Date
CN108446184A (en) 2018-08-24
CN108446184B (en) 2021-09-07

Family

ID=63192836

Country Status (1)

Country Link
CN (1) CN108446184B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109358602A (en) * 2018-10-23 2019-02-19 山东中创软件商用中间件股份有限公司 A kind of failure analysis methods, device and relevant device
CN109657547B (en) * 2018-11-13 2020-07-07 成都四方伟业软件股份有限公司 Accompanying model-based abnormal trajectory analysis method
CN109656969B (en) * 2018-11-16 2024-08-23 北京奇虎科技有限公司 Data transaction analysis method and device
CN111241145A (en) * 2018-11-28 2020-06-05 中国移动通信集团浙江有限公司 A method and device for self-healing rule mining based on big data
CN109753526A (en) * 2018-12-28 2019-05-14 四川新网银行股份有限公司 A kind of device and method that warning information analysis is inquired based on timing similarity
CN109815042B (en) * 2019-01-21 2022-05-27 南方科技大学 Locating method, device, server and storage medium for abnormal factors
CN110633195B (en) * 2019-09-29 2023-01-03 北京博睿宏远数据科技股份有限公司 Performance data display method and device, electronic equipment and storage medium
CN110597889A (en) * 2019-10-08 2019-12-20 四川长虹电器股份有限公司 Machine tool fault prediction method based on improved Apriori algorithm
CN110795414B (en) * 2019-11-01 2023-04-14 北京北方华创微电子装备有限公司 Alarm analysis method and device for semiconductor equipment
CN110932899B (en) * 2019-11-28 2022-07-26 杭州东方通信软件技术有限公司 Intelligent fault compression research method and system applying AI
CN110991668A (en) * 2019-11-29 2020-04-10 合肥国轩高科动力能源有限公司 An analysis method of electric vehicle power battery monitoring data based on association rules
CN111898090A (en) * 2020-06-19 2020-11-06 中国电力科学研究院有限公司 A method and system for analyzing the probability distribution of failure causes of primary power equipment
CN111811567B (en) * 2020-07-21 2022-03-01 北京中科五极数据科技有限公司 Equipment detection method based on curve inflection point comparison and related device
CN113660223B (en) * 2021-07-28 2023-06-09 上海纽盾科技股份有限公司 Network security data processing method, device and system based on alarm information
CN113590370B (en) * 2021-08-06 2022-06-21 北京百度网讯科技有限公司 Fault processing method, device, equipment and storage medium
CN113505044B (en) * 2021-09-09 2022-02-08 格创东智(深圳)科技有限公司 Database warning method, device, equipment and storage medium
CN114513802B (en) * 2022-01-04 2023-06-09 武汉烽火技术服务有限公司 Method and device for analyzing bearing network faults based on event stream
CN114064741B (en) * 2022-01-17 2022-05-24 天津所托瑞安汽车科技有限公司 Method, device and equipment for acquiring prepositive data and storage medium
CN115237721B (en) * 2022-07-29 2025-08-29 苏州浪潮智能科技有限公司 A method, device and storage medium for predicting faults based on window frequent sequences
CN118094169B (en) * 2024-04-28 2024-07-16 武汉理工大学 Component correlation analysis method for intelligent operation and maintenance alarm system of complex equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8156377B2 (en) * 2010-07-02 2012-04-10 Oracle International Corporation Method and apparatus for determining ranked causal paths for faults in a complex multi-host system with probabilistic inference in a time series
CN102809965A (en) * 2012-07-30 2012-12-05 燕山大学 A hydraulic equipment fault early warning method based on frequent fault mode
CN105224616A (en) * 2015-09-18 2016-01-06 浪潮软件股份有限公司 APRIORI algorithm improvement method based on time sequence
CN105681312A (en) * 2016-01-28 2016-06-15 李青山 Mobile internet exceptional user detection method based on frequent itemset mining
CN106502815A (en) * 2016-10-20 2017-03-15 北京蓝海讯通科技股份有限公司 A kind of abnormal cause localization method, device and computing device
CN107301119A (en) * 2017-06-28 2017-10-27 北京优特捷信息技术有限公司 The method and device of IT failure root cause analysis is carried out using timing dependence

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8060782B2 (en) * 2010-03-01 2011-11-15 Microsoft Corporation Root cause problem identification through event correlation
WO2016099558A1 (en) * 2014-12-19 2016-06-23 Hewlett Packard Enterprise Development Lp Automative system management

Also Published As

Publication number Publication date
CN108446184A (en) 2018-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant