WO2021224658A1

WO2021224658A1 - Automatic and adaptive labelling of time series data for cloud system management

Info

Publication number: WO2021224658A1
Application number: PCT/IB2020/054263
Authority: WO
Inventors: Chunyan Fu; Fetahi WUHIB; Mbarka SOUALHIA
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2020-05-05
Filing date: 2020-05-05
Publication date: 2021-11-11
Anticipated expiration: 2022-11-05
Also published as: EP4147418A1

Abstract

Methods and apparatuses for automatic data labeling are described. In one embodiment, a method for a label manager includes receiving a dataset identified as an anomaly and a corresponding list of features associated with the anomaly; selecting a number, n, of features out of the list of features associated with the anomaly; obtaining a label to associate to the dataset based on a comparison between the selected features and features in at least one label in a label pool, the obtaining comprising, based on the comparison, one of generating a new label to associate to the dataset and using the at least one label for the dataset; initiating a label evaluation of at least one label in the label pool; and based on a result of the label evaluation, determining whether to adjust at least one parameter associated with the at least one label.

Description

AUTOMATIC AND ADAPTIVE LABELLING OL TIME SERIES DATA FOR

CLOUD SYSTEM MANAGEMENT

TECHNICAL FIELD

The present disclosure relates to wireless communication and in particular, to automatic and adaptive labelling of time series data for cloud system management.

BACKGROUND

Cloud systems including Edge Clouds are components of the Third Generation Partnership Project (3 GPP) Fifth Generation (5G, also called New Radio or NR) communication system. These systems enable various services such as Internet-of- Things (IoT), content delivery network, vehicle networks, and industry automations. The reliability of such system is a key for a successful 5G operation.

Faults and performance degradations often occur in cloud systems due to unreliable hardware, software errors, network issues or unbalanced traffic, etc. Security attacks such as denial of service and data breach may also occur and bring a risk to the cloud operators.

Once the fault, performance degradation and/or security attack events occur, they can propagate and accumulate. Due to the lack of recovery mechanisms in the systems such as Edge Clouds, early detection or prediction of such events are beneficial.

With the trend of data-centric services in telecom systems, Machine Learning (ML) techniques are more and more used for fault, performance and security management. ML techniques use the data collected from the system, train the detection or prediction models and then use the models for online detection or prediction. In this way, an operator can use ML models to find the correlations between metrics and the events (i.e., faults or anomalies) so that they can detect the event early and apply appropriate recovery or remedy solutions. Or, if such events are recurrent, cumulative or with some seasonal patterns, the ML models can predict them, thus allowing one to prevent the events from happening.

Labelling is a beneficial function for supervised machine learning. The quality of labels used to train the models has a direct impact on the performance of supervised learning. The reliability of telecom cloud systems is expected to be high. That is, in most of the cases, the cloud systems are expected to be in ‘normal’ state (up, running and fulfilling the customer service level agreements (SLAs)). SLA violations and faulty states generated by, e.g., security attack or malfunctioning software and hardware, are seen as anomalies and are often the corner cases.

However, anomalies may be caused by various reasons and may show a high diversity. Proper labelling of such anomalies may assist a manager of a cloud system in 1) quickly identifying the type of an anomaly, 2) promptly finding a root cause and recovering it and/or 3) learning the behavior that leads to an anomaly and preventing the behavior. Therefore, it is beneficial to provide an efficient and accurate labelling arrangement to ensure the correctness of trained models.

Data collected from telecom cloud systems may be time series data. Labelling time series data is usually a manual and costly process and typically requires prior knowledge about the monitored system. Such manual solutions can hardly be generic, i.e., able to identify all labels in large information technology (IT) systems, and it may also be difficult to guarantee a full coverage of the labels especially for large datasets. In addition, it is difficult to identify rare events or unknown faults when they occur.

In many real-world applications, manually labelling massive data collections is expensive and impractical. Many well-established algorithms have been proposed to solve multi-label learning problems in various domains. The following presents some examples of attempts to address the problem of labelling datasets.

One proposed solution used binary relevance classification framework to learn label-specific data representation for each class label. Binary relevance considers each class label as a binary classification problem in which a class label is composed of label-specific features (since each class label might be determined by some specific characteristics of its own). In addition, the approach was designed to learn class- dependent labels in a sparse stacking way. However, the proposed solution is associated with a high cost. It is expensive and even computationally unaffordable for data sets with many labels. Moreover, the classification process for the multi-label may suffer from class-imbalance issues.

Traditional active learning methods require that an expert provides a label for each data query. Contrary to such methods, one active learning approach was proposed that can be used by nonexpert labelers. It is based on pairwise homogeneity using active learning in which the human is asked only to judge whether a pair of instances fit to the same class. The proposed solution allows nonexpert labelers to carry out the labelling task without explicitly knowing the class label of each queried instance. Although such solution may reduce labelling cost, it still relies on input from the human (expert or nonexpert) and requires an initial dataset. It also may not generalize well for large data sets especially for large systems.

Another proposed solution presented learns label correlations for multi-label learning purposes. For instance, a new multi-label approach was proposed to exploit global and local label correlation simultaneously to handle full/missing label cases. This is due to the fact that some work assumes that either local or global correlations can be used by all instances, which is not always the case. In fact, some global label correlations may not be applicable for certain scenarios. Moreover, some partial labels cannot be easily generalizable.

Another proposed solution provided an approach to learn specific features for multi-label classification in the presence of missing labels. The design included a method to learn first label correlation that would be used to add supplementary data for the label matrix. Another method included learning a label-specific data representation for each class label and building a multi-label classifier. Each class is composed of label-specific features. Experiments conducted showed the effectiveness of label matrix completion by exploiting label correlations and learning label- specific data representation for multi-label classification with missing labels.

The experimental results show the effectiveness of some proposed solutions but, they are only applicable when label correlation is symmetric. However, label correlations can be asymmetric for many real world-applications.

In yet another approach, a labelling method is presented that is based on a plurality of samples to calculate an estimation of probability that a label can be assigned to a query sample. The method uses an input a plurality of labels to determine the candidate label among the given plurality based on the estimated probabilities of the samples. Next, the method calculates the dispersion of the estimated probability of the plurality of samples for the obtained candidate label. This is to select the target label for each sample in the plurality of samples. This method relies on input from the human to identify the plurality of labels from which it selects the candidate label and it also requires prior knowledge about the monitored system.

In yet another approach, a computing method was proposed to automatically calculate label probability for a collected observation vector. Each observation vector is associated with a maximum label probability according to a converged classification matrix. This proposed method calculates the distance between the observation vectors belonging to the same cluster from which it calculates the average distance value. Next, the method selects a predefined number of vectors that have minimum values for the average distance and removes them from the unlabeled dataset. The rest of observations are labelled using the value of the target variable for the observation vector. However, removing samples from the dataset can cause a bias when training an ML model and may hide important events that could be captured by the monitored system. In addition, this solution cannot be applied to a large dataset when several labels exist and can lead to imbalanced classes.

In yet another approach, a fully or semi-automated labelling classification system was proposed to classify and convert data to an assembly of graphic and text data forming compound datasets. The approach calculates the means of feature vectors that will be used to improve the classification task for a trained machine learning classifier. Although the proposed approach was applied for structured and unstructured documents, it still relies on a classification method that can affect the labelling results. In addition, the proposed approach cannot be applicable for time series datasets since it uses means of the collected data that can change over time, especially when data drifts.

SUMMARY

Some embodiments advantageously provide a method and system for automatic and adaptive labelling of time series data for cloud system management.

According to one aspect of the present disclosure, a method for a label manager for labeling data of a monitored system is provided. The method includes receiving a dataset identified as an anomaly and a corresponding list of features associated with the anomaly; selecting a number, n, of features out of the list of features associated with the anomaly; obtaining a label to associate to the dataset based on a comparison between the selected features and features in at least one label in a label pool, the obtaining comprising, based on the comparison, one of generating a new label to associate to the dataset and using the at least one label for the dataset; initiating a label evaluation of at least one label in the label pool; and based on a result of the label evaluation, determining whether to adjust at least one parameter associated with the at least one label.

In some embodiments of this aspect, obtaining the label to associate to the dataset further includes for each label in the label pool: comparing features names of the selected features to feature names of features in the label; comparing distribution types of the selected features to distribution types of the features in the label; calculating a deviation between at least one of the selected features and at least one of the features in the label; and based on the comparisons, the calculating and at least one predetermined parameter, one of generating the new label to associate to the anomalous dataset and using the label for the anomalous dataset.

In some embodiments of this aspect, obtaining the label to associate to the dataset further includes: for each label in the label pool: matching features names of the selected features to feature names of features in the label; matching distribution types of the selected features to distribution types of the features in the label; determining a mean deviation between each matching feature; and based on the matchings, the determined mean deviation and at least one predetermined parameter, one of generating the new label to associate to the anomalous dataset and using the label for the anomalous dataset.

In some embodiments of this aspect, the at least one predetermined parameter includes at least one of: a threshold parameter for feature name matching; a threshold parameter for distribution type matching; and a threshold parameter for matching feature mean deviation. In some embodiments of this aspect, receiving the data and the corresponding list of features includes receiving the data as an output of an anomaly detector. In some embodiments of this aspect, selecting further includes determining the number, n, of features out of the list of features having a highest impact on the anomaly. In some embodiments of this aspect, at least one of n, the threshold parameter for feature name matching, the threshold parameter for distribution type matching and the threshold parameter for matching feature mean deviation is a configurable parameter.

In some embodiments of this aspect, the dataset is comprised of time series data. In some embodiments of this aspect, the at least one label includes: a label identifier, a window size, a sample rate, a features list, at least one label key, at least one key performance indicator and at least one adjustable parameter; and the features list includes, for each feature in the features list, a feature name, a distribution type and data for the feature.

According to another aspect of the present disclosure, a method for label testing and label evaluation for a monitored system is provided. The method includes receiving, from a label manager, a request to initiate a label evaluation of at least one label in a label pool; obtaining a test case from a test database, the test case being associated with an expected label; providing, to the label manager, the expected label for the test case; and executing the test case on the monitored system to allow the label manager to determine whether the expected label is produced by the label manager using the test case.

According to yet another aspect of the present disclosure, a computing device implemented as a label manager for labeling data is provided. The computing device includes processing circuitry. The processing circuitry includes a processor and a memory and the processing circuitry is configured to cause the computing device to receive a dataset identified as an anomaly and a corresponding list of features associated with the anomaly; select a number, n, of features out of the list of features associated with the anomaly; obtain a label to associate to the dataset based on a comparison between the selected features and features in at least one label in a label pool, the obtaining comprising, based on the comparison, one of generating a new label to associate to the dataset and using the at least one label for the dataset; initiate a label evaluation of at least one label in the label pool; and based on a result of the label evaluation, determine whether to adjust at least one parameter associated with the at least one label.

In some embodiments of this aspect, the processing circuitry is configured to obtain the label to associate to the dataset by being configured to cause the computing device to: for each label in the label pool: compare features names of the selected features to feature names of features in the label; compare distribution types of the selected features to distribution types of the features in the label; calculate a deviation between at least one of the selected features and at least one of the features in the label; and based on the comparisons, the calculating and at least one predetermined parameter, one of generate the new label to associate to the anomalous dataset and use the label for the anomalous dataset.

In some embodiments of this aspect, the processing circuitry is configured to obtain the label to associate to the dataset by being configured to cause the computing device to: for each label in the label pool: match features names of the selected features to feature names of features in the label; match distribution types of the selected features to distribution types of the features in the label; determine a mean deviation between each matching feature; and based on the matchings, the determined mean deviation and at least one predetermined parameter, one of generate the new label to associate to the anomalous dataset and use the label for the anomalous dataset.

In some embodiments of this aspect, the at least one predetermined parameter includes at least one of: a threshold parameter for feature name matching; a threshold parameter for distribution type matching; and a threshold parameter for matching feature mean deviation. In some embodiments of this aspect, the processing circuitry is configured to cause the computing device to receive the data and the corresponding list of features by being configured to cause the computing device to receive the data as an output of an anomaly detector. In some embodiments of this aspect, the processing circuitry is configured to select by being further configured to cause the computing device to: determine the number, n, of features out of the list of features having a highest impact on the anomaly. In some embodiments of this aspect, at least one of n, the threshold parameter for feature name matching, the threshold parameter for distribution type matching and the threshold parameter for matching feature mean deviation is a configurable parameter.

According to another aspect of the present disclosure, a computing device implemented as a label tester for a monitored system is provided. The computing device includes processing circuitry. The processing circuitry includes a processor and a memory and the processing circuitry is configured to cause the computing device to receive, from a label manager, a request to initiate a label evaluation of at least one label in a label pool; obtain a test case from a test database, the test case being associated with an expected label; provide, to the label manager, the expected label for the test case; and execute the test case on the monitored system to allow the label manager to determine whether the expected label is produced by the label manager using the test case.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the present embodiments, and the attendant advantages and features thereof, will be more readily understood by reference to the following detailed description when considered in conjunction with the accompanying drawings wherein:

FIG. 1 is a schematic diagram of an example system illustrating a system according to the principles in the present disclosure;

FIG. 2 is a block diagram of a computing device implemented as a label manager in communication with a computing device implemented as a label tester over a connection according to some embodiments of the present disclosure;

FIG. 3 is a flowchart of an example method for a label manager according to one embodiment of the present disclosure;

FIG. 4 is a flowchart of an example method for a label tester according to one embodiment of the present disclosure;

FIG. 5 shows an example of a label structure according to one embodiment of the present disclosure;

FIG. 6 shows an example of a label according to one embodiment of the present disclosure; FIG. 7 shows an example of a label manager according to one embodiment of the present disclosure;

FIG. 8 shows an example of label creation and matching logic according to one embodiment of the present disclosure;

FIG. 9 shows a sequence diagram of an example of parameter initialization according to one embodiment of the present disclosure;

FIG. 10 shows an example of a label parameter initialization procedure (threshold ‘th’=2) according to one embodiment of the present disclosure;

FIG. 11 shows an example of a label evaluation according to one embodiment of the present disclosure;

FIG. 12 shows an example label tester and evaluator according to one embodiment of the present disclosure; and

FIG. 13 shows an example of a label evaluator state machine according to one embodiment of the present disclosure.

DETAILED DESCRIPTION

As discussed above, traditional label learning methods typically require the labeler to provide a class label for each input system. This assumes that the labelers are experts and have prior knowledge about the monitored system to guarantee the correctness of the output labels. This results in expensive labelling costs and requires a large amount of manual configuration and knowledge. In addition, such traditional label learning methods cannot be generalizable for large data sets, especially for large systems.

A classification method is one of the most common approaches in several domains for labelling data. In fact, classification methods are used to identify to which of a set of categories, a new observation or sample query belongs. This approach could reduce the labelling cost associated with the manual labelling methods. However, it still requires some input from the human and the classification process or the multi-label may suffer from class-imbalance issues.

Other labelling approaches are based on different metrics to assign labels to the input dataset. For example, such approaches use correlation between the labels to conduct the labelling task and derive new labels. However, the correlation between labels would provide better results only when label correlation is symmetric. Also, label correlations can be asymmetric for many real world-applications. Other approaches include calculating an estimation of probability that a label can be assigned to a query sample. But these solutions can be expensive and even computationally unaffordable for data sets with many labels.

Overall, existing labelling approaches are not fully-automated and still rely on input from a human to assign tags (labels) to the input queries. In addition, existing labelling approaches use computationally expensive solutions that cannot be generalized for large systems and may lead to imbalanced labels, which may affect the training results for an ML model. In addition, the known proposed approaches are not adaptive and do not adjust their procedures according to changes that may be experienced by the monitored system (e.g., drift in time series datasets).

Considering the above problems, automatic and adaptive labelling poses an open challenge for ML users to design efficient and accurate models to deal with fault management in any cloud/software platform system. Thus, it is of interest to design an automatic and adaptive labelling system for time series datasets in, e.g., cloud systems.

Some embodiments of the present disclosure provide a system and a method that can label the time series data collected from a cloud system into a number of states/classes. Some embodiments also continuously improve the label performance via an automatic label evaluation and parameter adjustment procedure. Some embodiments analyze the outlier/anomalous part of the data collected from the cloud system and calculate the data pattern leading to the anomaly and classify the anomalies based on the pattern.

An anomaly detector may be used to define the anomalies. A label manager may refine the anomalies into multiple classes. The label manager may also match a data sample onto a label type, label the data samples and store the labelled data. A label tester/evaluator may be used for assisting the label manager in generating proper labels and evaluating the labels. In an evaluation process, the label manager may request the label tester/evaluator to execute test cases that will generate some types of system anomalies. By comparing the expected labels to the generated labels, the label manager may be trained and adjusted to produce the expected labels. Some embodiments may provide a system that automatically labels time series data samples.

Some embodiments include generating and matching labels, which may enable a fully automatic labelling procedure. Meanwhile, some embodiments generate label keys so that the labels can be human understandable.

Some embodiments provide a system that continues to improve its label accuracy via an automatic label evaluation and parameter adjustment procedure.

Some embodiments provide a labeling system that only analyzes the data samples containing anomaly (outlier) cases, which can be lightweight. Such system may also reduce the possibility of classifying an unbalanced data set.

In some embodiments, the granularity of the labels is adjustable (e.g., based on a user’s requirement), which may allow a portability of the method to different types of cloud systems.

Before describing in detail exemplary embodiments, it is noted that the embodiments reside primarily in combinations of apparatus components and processing steps related to automatic and adaptive labelling of time series data for cloud system management. Accordingly, components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.

As used herein, relational terms, such as “first” and “second,” “top” and “bottom,” and the like, may be used solely to distinguish one entity or element from another entity or element without necessarily requiring or implying any physical or logical relationship or order between such entities or elements.

As used herein, relational terms, such as “first” and “second,” “top” and “bottom,” and the like, may be used solely to distinguish one entity or element from another entity or element without necessarily requiring or implying any physical or logical relationship or order between such entities or elements. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the concepts described herein. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

In embodiments described herein, the joining term, “in communication with” and the like, may be used to indicate electrical or data communication, which may be accomplished by physical contact, induction, electromagnetic radiation, radio signaling, infrared signaling or optical signaling, for example. One having ordinary skill in the art will appreciate that multiple components may interoperate, and modifications and variations are possible of achieving the electrical and data communication.

In some embodiments described herein, the term “coupled,” “connected,” and the like, may be used herein to indicate a connection, although not necessarily directly, and may include wired and/or wireless connections.

The term “computing device” used herein can be any kind of device, such as, for example, one or more processors (e.g., single or multi-core processor), processing circuitry, a network node, a server, etc.

Note that functions described herein as being performed by a computing device may be distributed over a plurality of devices. In other words, it is contemplated that the functions of a computing device, such as label manager, label tester, or any other nodes described herein are not limited to performance by a single physical device and, in fact, can be distributed among several physical devices.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Some embodiments of the present disclosure provide automation to the labelling process and provide for a system and a method that may reduce the costs associated with labelling. Some embodiments may focus only on the classification of anomalies. In some embodiments, the differentiation between the normal case and anomalies may be implemented via any anomaly detection techniques. For example, some embodiments may include training a model with only normal data, defining a threshold of anomaly, and using the trained model to determine whether an online data sample is a normal case or an anomaly.

Some embodiments provide for a system and a method that automatically generates anomaly label types, labels the time series data and stores the labelled data. Some embodiments may also evaluate the generated labels and adjusts label parameters to reflect the system changes over time. One example of the overall architecture in which some embodiments of the present disclosure may be implemented in is described below.

Referring to the drawing figures, in which like elements are referred to by like reference numerals, there is shown in FIG. 1 a schematic diagram of an example network 10, according to an embodiment, constructed in accordance with the principles of the present disclosure. The network 10 in FIG. 1 is a non-limiting example and other embodiments of the present disclosure may be implemented by one or more other systems and/or networks. The network 10 includes a monitored system 12 (e.g., a cloud system under monitoring), an anomaly detector 14, a label manager 16, a label tester/evaluator 18 (referred to herein for the sake of brevity as “label tester 18”), a label pool 20 and a labelled data store 22.

The anomaly detector 14 is trained to label the outliers out of the normal state of the monitored system 12. An anomaly can be a faulty state, a performance spike, an SLA violation or any system state that falls out of a normal range. The output of the anomaly detector 14 is the data sample(s) together with a raw label(s) of ‘normal’ or ‘abnormal’. In the case of ‘abnormal’, the raw label may also include a list of features that caused the anomaly.

In some embodiments, the anomaly detector 14 may include a model that is trained from the same historical state data that is used to detect anomalous states via unsupervised learning. The choice of model for anomaly detection may depend on the type and characteristics of the collected data. One example is to use multi-variate gaussian mixture model. In that case, the data can be collected from a fault- free system. The model trained with the data may identify clusters to which fault free states of the system belong. The state of a live, monitored system 12 may then be evaluated against these models to classify the state as normal or anomalous.

In some embodiments, the label manager 16 is responsible for refining the raw ‘abnormal’ label into multiple classes of labels. The inputs of the label manager 16 are the online data sample with the created raw label (‘normal’ or ‘abnormal’). Using the historical data from the monitored system 12 [showing a normal state], the label manager 16 may create a correlation matrix to measure the dependencies between the features list. Next, in some embodiments, the label manager 16 identifies the list of deviated features in the case of a detected anomaly. Using current data sample’s time stamp, the label manager 16 then retrieves a chunk of historical data, with a specific size (e.g.,10 minutes (min), 15 minutes, 20 minutes, etc.), from the monitored system 12, to determine the pattern of change caused by the anomaly within a specific time interval. This may be performed to measure the similarity between different anomalies and group them under the same label group in the future. Particularly, the data chunk may be used for either (1) for creating a new label for the online data sample in the case of a different anomaly/change in the monitored system 12; and/or (2) for matching to an existing label if there is such a match in the label pool 20.

In some embodiments, an ‘abnormal’ label is stored in the label pool 20 once it is created. The label does not only contain the data sample values, but also the data patterns that lead the monitored system 12 to the anomalous state, and the reference Key Performance Indicator (KPI) values (e.g., an application response time and a system delay) when the anomaly happens.

The label manager 16 may also be responsible for labelling the data sample and storing the data sample in the labelled data store 22. The labelled data can be used by the monitored system 12, e.g., for training a fault detection/prediction model or training an SLA prediction model.

In some embodiments, the label tester 18 may be a computing device responsible for testing the monitored system 12 and evaluating the quality of the labels. Test cases (such as fault injection, performance load or some test suites) are executed by the label tester 18. The test cases will generate anomalous events under a ‘normal’ system. With the events injected, the label tester 18 stores the expected ‘actual’ labels and the label manager 16 will use the labels as an evaluation base. Note that testing is generally an expensive process; thus, in some embodiments, the testing and evaluation process may only be executed to train a new label manager 16, or when a label manager 16 specifically requests it.

In some embodiments, from time to time (e.g., once per day or once per week or some other time period), the label manager 16 may request a label evaluation process. Based on the type of the labels to evaluate, the label tester 18 will execute proper test cases and obtain the expected labels. Once the test is complete, the label tester 18 informs the label manager 16 and the label manager 16 compares the expected labels to the generated labels and makes an evaluation. The evaluation results may trigger parameter adjustment procedures.

Referring now to FIG. 2, certain elements of an embodiment of example network 10 in accordance with the present disclosure are shown. The network 10 of FIG. 2 includes a label manager 16 and a label tester 18. Note that although only a single label manager 16 and a single label tester 18 are shown for convenience, the network 10 may include many more label managers 16 and label testers 18.

In accordance with an embodiment, example implementations of the label manager 16 and label tester 18 discussed in the preceding paragraphs will now be described with reference to the example network 10 depicted in FIG. 2.

The label manager 16 includes (and/or uses) a communication interface 24, processing circuitry 26, and memory 28. In some embodiments, the communication interface 24 may be formed as or may include, for example, one or more radio frequency (RF) transmitters, one or more RF receivers, and/or one or more RF transceivers, and/or may be considered a radio interface. In some embodiments, the communication interface 24 may include a wired interface, such as one or more network interface cards.

The processing circuitry 26 may include one or more processors 30 and memory, such as, the memory 28. In particular, in addition to a traditional processor and memory, the processing circuitry 26 may comprise integrated circuitry for processing and/or control, e.g., one or more processors and/or processor cores and/or FPGAs (Field Programmable Gate Array) and/or ASICs (Application Specific Integrated Circuitry) adapted to execute instructions. The processor 30 may be configured to access (e.g., write to and/or read from) the memory 28, which may comprise any kind of volatile and/or nonvolatile memory, e.g., cache and/or buffer memory and/or RAM (Random Access Memory) and/or ROM (Read-Only Memory) and/or optical memory and/or EPROM (Erasable Programmable Read-Only Memory).

Thus, the label manager 16 may further include software stored internally in, for example, memory 28, or stored in external memory (e.g., storage resource in the cloud) accessible by the label manager 16 via an external connection. The software may be executable by the processing circuitry 26. The processing circuitry 26 may be configured to control any of the methods and/or processes described herein and/or to cause such methods, and/or processes to be performed, e.g., by the label manager 16. The memory 28 is configured to store data, programmatic software code and/or other information described herein. In some embodiments, the software may include instructions stored in memory 28 that, when executed by the processor 30 and/or labelling unit 32, causes the processing circuitry 26 and/or configures the label manager 16 to perform the processes described herein with respect to the label manager 16 (e.g., processes described with reference to FIG. 3 and/or any of the other figures).

The label tester 18 includes (and/or uses) a communication interface 34, processing circuitry 36, and memory 38. In some embodiments, the communication interface 34 may be formed as or may include, for example, one or more radio frequency (RF) transmitters, one or more RF receivers, and/or one or more RF transceivers, and/or may be considered a radio interface. In some embodiments, the communication interface 34 may include a wired interface, such as one or more network interface cards.

The processing circuitry 36 may include one or more processors 40 and memory, such as, the memory 38. In particular, in addition to a traditional processor and memory, the processing circuitry 36 may comprise integrated circuitry for processing and/or control, e.g., one or more processors and/or processor cores and/or FPGAs (Field Programmable Gate Array) and/or ASICs (Application Specific Integrated Circuitry) adapted to execute instructions. The processor 40 may be configured to access (e.g., write to and/or read from) the memory 38, which may comprise any kind of volatile and/or nonvolatile memory, e.g., cache and/or buffer memory and/or RAM (Random Access Memory) and/or ROM (Read-Only Memory) and/or optical memory and/or EPROM (Erasable Programmable Read-Only Memory).

Thus, the label tester 18 may further include software stored internally in, for example, memory 38, or stored in external memory (e.g., storage resource in the cloud) accessible by the label tester 18 via an external connection. The software may be executable by the processing circuitry 36. The processing circuitry 36 may be configured to control any of the methods and/or processes described herein and/or to cause such methods, and/or processes to be performed, e.g., by the label tester 18. The memory 38 is configured to store data, programmatic software code and/or other information described herein. In some embodiments, the software may include instructions stored in memory 38 that, when executed by the processor 40 and/or testing unit 42, causes the processing circuitry 36 and/or configures the label tester 18 to perform the processes described herein with respect to the label tester 18 (e.g., processes described with reference to FIG. 4 and/or any of the other figures).

In FIG. 2, the connection between the label manager 16 and label tester 18 is shown without explicit reference to any intermediary devices or connections. However, it should be understood that intermediary devices and/or connections may exist between these devices, although not explicitly shown.

Although FIG. 2 shows labelling unit 32 and testing unit 42, as being within a respective processor, it is contemplated that these elements may be implemented such that a portion of the elements is stored in a corresponding memory within the processing circuitry. In other words, the elements may be implemented in hardware or in a combination of hardware and software within the processing circuitry. In one embodiment, one or more of the label manager 16 and label tester 18 may be implemented as, or may include an application, a program, software or other set of instructions executable by the respective processor(s) according to the techniques disclosed herein.

FIG. 3 is a flowchart of an example process for a computing device implemented as a label manager 16. One or more Blocks and/or functions and/or methods performed by the label manager 16 may be performed by one or more elements of label manager 16 such as by labeling unit 32 in processing circuitry 26, processor 30, memory 28, communication interface 24, etc. according to the example method. The example method includes receiving (Block S100), such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, a dataset identified as an anomaly and a corresponding list of features associated with the anomaly. The method includes selecting (Block S102), such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, a number, n, of features out of the list of features associated with the anomaly. The method includes obtaining (Block S104), such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, a label to associate to the dataset based on a comparison between the selected features and features in at least one label in a label pool, the obtaining comprising, based on the comparison, one of generating a new label to associate to the dataset and using the at least one label for the dataset. The method includes initiating (Block S106), such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, a label evaluation of at least one label in the label pool. The method includes based on a result of the label evaluation, determining (Block S108), such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, whether to adjust at least one parameter associated with the at least one label.

In some embodiments, obtaining the label to associate to the dataset further includes for each label in the label pool: comparing, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, features names of the selected features to feature names of features in the label; comparing, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, distribution types of the selected features to distribution types of the features in the label; calculating, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, a deviation between at least one of the selected features and at least one of the features in the label; and based on the comparisons, the calculating and at least one predetermined parameter, one of generating the new label to associate to the anomalous dataset and using the label for the anomalous dataset. In some embodiments, obtaining the label to associate to the dataset further comprises for each label in the label pool: matching, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, features names of the selected features to feature names of features in the label; matching, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, distribution types of the selected features to distribution types of the features in the label; determining, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, a mean deviation between each matching feature; and based on the matchings, the determined mean deviation and at least one predetermined parameter, one of generating, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, the new label to associate to the anomalous dataset and using, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, the label for the anomalous dataset.

In some embodiments, the at least one predetermined parameter comprises at least one of: a threshold parameter for feature name matching; a threshold parameter for distribution type matching; and a threshold parameter for matching feature mean deviation. In some embodiments, receiving, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, the data and the corresponding list of features includes receiving the data as an output of an anomaly detector. In some embodiments, selecting further includes determining, such as via labeling unit 32, processing circuitry 26, processor 30, memory 28 and/or communication interface 24, the number, n, of features out of the list of features having a highest impact on the anomaly. In some embodiments, at least one of n, the threshold parameter for feature name matching, the threshold parameter for distribution type matching and the threshold parameter for matching feature mean deviation is a configurable parameter.

In some embodiments, the dataset is comprised of time series data. In some embodiments, the at least one label comprises: a label identifier, a window size, a sample rate, a features list, at least one label key, at least one key performance indicator and at least one adjustable parameter; and the features list comprises, for each feature in the features list, a feature name, a distribution type and data for the feature.

FIG. 4 is a flowchart of an example process for a computing device implemented as a label tester 18. One or more Blocks and/or functions and/or methods performed by the label tester 18 may be performed by one or more elements of label tester 18 such as by testing unit 42 in processing circuitry 36, processor 40, memory 38, communication interface 34, etc. according to the example method. The example method includes receiving (Block SI 10), such as via testing unit 42, processing circuitry 36, processor 40, memory 38 and/or communication interface 34, from a label manager 16, a request to initiate a label evaluation of at least one label in a label pool. The method includes obtaining (Block S 112), such as via testing unit 42, processing circuitry 36, processor 40, memory 38 and/or communication interface 34, a test case from a test database, the test case being associated with an expected label. The method includes providing (Block SI 14), such as via testing unit 42, processing circuitry 36, processor 40, memory 38 and/or communication interface 34, to the label manager, the expected label for the test case. The method includes executing (Block SI 16), such as via testing unit 42, processing circuitry 36, processor 40, memory 38 and/or communication interface 34, the test case on the monitored system to allow the label manager 16 to determine whether the expected label is produced by the label manager 16 using the test case.

Having described some embodiments for automatic and adaptive labelling of time series data for cloud system management, a more detailed description of some of the embodiments are described below, which may be implemented by label manager 16, label tester 18 and/or any other elements/devices in the present disclosure.

Label Structure

FIG. 5 shows an example structure of a label, which may be stored in the label pool 20 (FIG. 1). The label may include one or more of the following fields:

-Label Identifier: a unique number used for identifying the label. This number is used to label the data sample.

-Window Size: a number in time units, e.g., seconds. This defines a time window and the data chunk collected in the window is expected to show the pattern from normal state to the anomaly. The window size controls the granularity of the labelling. If the window is too large, it may cause too few labels created than expected. Also, it may lower the speed and performance of the labelling. If the size is too small, the data chunk is not large enough to represent a correct distribution of the data pattern to the anomaly and it may cause creating more labels than expected.

-Sample Rate: the number of data samples to be collected per second.

-Feature List: a list of features having high impacts on the anomaly. For each feature, the feature list includes a feature name, a values list (the list length = Window Size / Sample Rate), and a distribution type calculated from the value list.

-Label Keys: human understandable key words. The keys are retrieved from feature names. The keys may be generated simply based on the frequency of appearing in the feature names, or based on some complex machine reasoning techniques.

-Reference KPI List: a list of key/value pair showing the system KPI when the anomaly label is created. This may assist a human to understand the impact of a type of anomaly.

-Parameters: a set of parameters used by the label manager 16 for label creation and matching.

FIG. 6 shows an example label, where label ‘123’ includes a 15 minute data chunk, leading to a ‘CPU Memory load’ anomaly, affecting the response_time to 200 milliseconds (ms) and monitored system 12 delay to 100 milliseconds. The meaning of the parameters ‘a’ ~ ‘d’ shown in FIG. 6, are described with reference to FIG. 8, and the meaning of parameter ‘n’ is described in more detail in the sections below.

Label Management

In some embodiments, the label manager 16 is the computing device that refines the anomaly labels, labels the data samples, and/or stores the labelled samples in the labelled data store 22. FIG. 7 shows an example of the functional entities that may be included in a label manager 16, processing circuitry 26 and/or the labelling unit 32 depicted in FIG. 2, such as, a correlation analyzer 44, label generator and matcher 46, label adjuster 48 and data labeler 50.

With the input of a list of features deviated in the anomalous state, the correlation analyzer 44 is responsible for outputting the top ‘n’ features that have the highest impacts on the anomaly, where ‘n’ may be a configurable parameter. In order to achieve this, the correlation analyzer 44 may maintain a correlation matrix with all the features. The matrix may be initially calculated using the same ‘normal’ data as what is used by the anomaly detector 14. With the anomalies happening in the monitored system 12, a new matrix may be calculated from time to time (e.g., per day) with the ‘normal’ and ‘abnormal’ data. An example logic of the feature selection function is described in more detail in section a below.

In some embodiments, with the current data sample and the ‘n’ selected features, the label generator and matcher 46 may be configured to create new labels, modify a label and/or save the labels in the label pool 20. The label generator and matcher 46 may also match an anomaly label to a data sample and send the data and the label to the data labeler 50. An example logic of the label generator and matcher 46 is described in section b below.

The data labeler 50 may be an entity that labels the data sample. The data labeler 50 may add the label to the data sample and store the sample in the labelled data store 22. Note that the data received directly from the anomaly detector 14 may be ‘normal’ data and there may be a special label identifier (ID) (e.g., 0) reserved for the ‘normal’ data sample.

The label adjuster 48 may be configured to initialize and modify label parameters. In some embodiments, periodically (e.g., once per day), the label adjuster 48 requests the label tester 18 to evaluate the labels and based on the evaluation results, the label adjuster 48 may request the label generator and matcher 46 to adjust the label parameters and to modify the labels in the label pool 20. An example label parameter initialization and adjustment procedure is described in section c below. a) Example of Feature Selection Logic

The correlation analyzer 44 may be configured to select the number, ‘n’, of features. An example logic (e.g., in pseudo code) is as follows:

Receive a new ‘abnormal’ data sample with a number, m, of deviated features;

IF m == n:

Return the m features;

ELIF m < n:

Find n-m features from the correlation matrix, each with a highest correlation value with one of the m features. Combine the m features and the new found features;

Return the combined features;

ELSE:

Select and return n (larger deviated) features out of m features;

ENDIF. b) An Example Logic of Label Generation and Matching

The label generator and matcher 46 may create labels based on the name, the data range and the distribution of the deviated features. FIG. 8 shows an example logic of the label generation and matching, which may be performed by the label generator and matcher 46 in the label manager 16. In FIG. 8, parameters ‘a’, ‘b’, ‘c’, and ‘d’ control the classification granularity of the anomaly labels. For example, in step S 118, the label generator and matcher 46 in the label manager 16 may obtain a data chunk from the monitored system 12 based on the current data’s time stamp, the window size and the sample rate. In step S120, the label generator and matcher 46 in the label manager 16 may determine whether there is a next label in the label pool 20. In step S122, if the answer is ‘yes’, the label generator and matcher 46 in the label manager 16 may read the next label from the label pool 20.

In step S124, the label generator and matcher 46 in the label manager 16 may determine whether there is at least ‘a’ percentage of feature name matching between the feature names in the next label and the obtained data. In step S126, if the answer is ‘yes’, the label generator and matcher 46 in the label manager 16 may identify distribution types of the matching features. In step S128, the label generator and matcher 46 in the label manager 16 may determine whether there is at least ‘b’ percentage of feature distribution types matching between the next label and the obtained data. In step S130, if the answer is ‘yes’, the label generator and matcher 46 in the label manager 16 may calculate a mean deviation for each matching feature. In step S134, the label generator and matcher 46 in the label manager 16 may determine if there is a ‘c’ percentage of features’ mean deviations less than a predetermined threshold ‘d’. If the answer is ‘yes’, the label generator and matcher 46 in the label manager 16 may proceed to step S136, where a label identifier and data is sent to data labeler 50. If any of steps S124, S128, S130 and S134 results in a ‘no’ answer, the process may return to step S120, where the label generator and matcher 46 in the label manager 16 may attempt to obtain yet another next label in the label pool 20 and the process may repeat. If the answer to step S120 is ‘no’ (e.g., no more labels left in the label pool 200 to compare to the obtained data), the process may proceed to step S138, where a new label is created. For the new label, the label generator and matcher 46 in the label manager 16 may assign a new label identifier, identify distribution types of each feature, retrieve KPI values from the monitored system 12, generate label keys based on feature names, and insert the new label to the label pool 20.

The initial values for the different parameters ws, n, a, b, c and d may be trained via a parameter initialization process, described below in section c. c) Parameter Initialization

Parameters: ‘window size’ (‘ws’), ‘n’, ‘a’, ‘b’, ‘c’, ‘d’ (see definitions in previous sections) are to be trained so that the label generator and matcher 46 in the label manager 16 can generate proper labels. The label adjuster 48 may start the process when a new label manager 16 is created, or the label types are out of date.

FIG. 9 shows an example sequence of the parameter initialization process.

In the example, in step S140, the label adjuster 48 sends an ‘init’ command to the label tester 18, that will generate sets of test cases. Each test case set will generate an anomalous system state that can be mapped onto a label. The test set can be an anomaly injection, a load test or some test suits. Labels are trained one by one, until the parameters are properly selected so that the label manager 16 can label the new test data correctly without changing the parameters for a threshold ‘th’ of times. For example, in step S142, test cases are created. In step S144, label tester 18 indicates that the test cases creation is complete. In step S146, label adjuster 48 sets parameters to initial values, wsO, aO, bO, cO and dO. In step S148, label adjuster 48 sets the initial value for the n parameter to nO. In step S 150, a variable unchanged is set to 0. While unchanged < threshold th (e.g., th=2), the following looped steps may be performed as shown in FIG. 9. In step S152, label adjuster 48 may initiate starting a test for a next label. In step S154, label adjuster 48 waits until the test is complete, and enough samples are collected. In step S156, label tester 18 runs the test and collects data samples. In step S158, a test done indication is sent to the label adjuster 48 when the test is complete along with the data samples and expected labels. In step S160, a variable changed is set to 0.

The following steps may be executed in an inner loop, as shown in FIG. 9, where, for next parameter set j in the parameter range, the inner loop process includes, in step S162, label adjuster 48 sends test data samples (nj) to the correlation analyzer 44. In step S164, label adjuster 48 waits until the test is complete (e.g., a time out).

In step S166, the label data samples including data with the top nj features are sent to the label generator and matcher 46. In step S 168, the label generator and matcher 46 uses the label pool 20 to create and/or match labels. In steps S170 and S172, the label data samples are sent to the data labeler 50 and then to the labelled data store 22. In step S174, label adjuster 48 obtains the generated labels and then, in step S176, compares the generated labels to the expected labels to determine whether the labels are matching. For example, if the variable changed is equivalent to 0, then the variable unchanged is incremented and the inner loop breaks and returns to the outerloop; else, changed is incremented, the parameters are adjusted incrementally, and the inner loop continues. For example, in step S178, the parameters are incremented by e.g., set_parameters (wsj+1, aj+1, bj+1 and dj+1). In some embodiments, the process may end when the threshold is reached, e.g., when threshold reaches e.g., 2, which may mean that for 2 label creation cycles, the parameters were not adjusted by the process and may therefore be considered appropriately initialized.

FIG. 10 illustrates an example of a rough procedure of the parameter initialization process described in FIG. 9. In FIG. 10, each diagram a, b, c, d, e and f shows a result of the multi-step parameter change process and the correspondent labels (LI, L2, L3, L4, L5, L6, respectively) being created. In the last two diagrams e and f, when the label manager 16 creates a new label, there are no more parameter changes. In such a case, if the threshold ‘th’ is set to 2, the parameter initialization procedure will terminate. Note that the diagrams in FIG. 10 are two-dimensional for illustrative purpose only. In practice, the diagrams may be multi-dimensional.

Parameter Adjustment and Label Modification From time to time (e.g., per week or per day or some other time period), the label adjuster 48 may request the label tester 18 to evaluate all or part of the labels. FIG. 11 shows an example sequence diagram of the label adjuster 48 requesting, in step S180, for an evaluation of one label (i.e., label_i). In response, the label tester 18 will generate/create one test set for label_i, such as in step S182. In step SI 84, an indication of the creation of the test case along with the expected labels’ properties may be sent to label adjuster 48. In step S186, label adjuster 48 may initiate to start the test case for label_i. In step S188, label tester 18 may execute the test. In step S190, label tester 18 may indicate to label adjuster 48 that the test is complete, including the start time and the actual expected labels. In step S192, label adjuster 48 may obtain the generated labels including the start time and number of samples (e.g., the labels generated by label manager 16).

After testing, in step S194, the label adjuster 48 compares the generated labels with the expected labels. Based on the comparation result, in step S196, the label adjuster 48 may determine to request the label generator and matcher 46 to 1) flush the label_i if there is nothing matching, or 2) update some parameters in the label type to adjust the label or 3) do nothing if the label_i reflects the system state under testing. In the first case, a new label initialization process may be triggered for label_i. In step S198, the label may be sent to the label pool 20 for storage.

Label Comparison

An example logic of the label comparison process in the label adjuster 48 is provided below, as follows:

Receive the actual label list LI and generated label list L2;

LI includes a number, x, types of labels, L2 includes a number, y, types of labels;

IF x == y;

IF %p of label matching < threshold 1;

Labels are deviated, send update_param([reference value list, reference KPI list, label_i]) to label generator;

END IF;

ELIF |x - y| > threshold2;

Labels are out of date, send flush([label_i]) to label generator; ELIF x > y;

Labels are too coarse. Increase a, b, c, or d, decrease ws, or any combination thereof;

Send update_param([params, label_i]);

ELSE

Labels are too fine. Decrease a, b, c, or d, increase ws, or any combination thereof;

Send update_param([params, label_i]);

ENDIF.

Label Tester/Evaluator

The label tester 18 may be capable of putting the monitored system 12 in several known anomalous states. This may then be used to initialize the parameters of the label creation component (e.g., label generator and matcher 46 in label manager 16) by putting the monitored system 12 in different anomalous states and computing the parameters such that all samples from each anomalous state are assigned the same label while samples from different states are assigned different labels. This process may be repeated the threshold ‘th’ times, until the label manager 16 determines that the parameters are correct, i.e., any samples from subsequent anomalous states will be assigned a unique label if the anomaly is sufficiently different from others. The label tester 18 may also be responsible for injecting anomalous states in the monitored system 12 for label evaluations, where one or multiple labels can be evaluated.

The functional entities of the label tester 18, which may be included in the processing circuitry 36 and/or testing unit 42 depicted in FIG. 2, are shown in FIG.

12, which includes tester 52, label evaluator 54 and test base 56. The label evaluator 54 is responsible for receiving the requests from and sending responses to the label manager 16. Upon receiving the ‘init’ or ‘evaT requests, the label evaluator 54 retrieves the test cases from the test base 56 where the test cases are stored. Upon receiving the ‘start test’ requests, the label evaluator 54 may request the tester 52 to execute a specific test case(s) upon the monitored system 12. FIG. 13 shows an example logic (state machine) of the label evaluator 54, which states may include, for example, waiting; upon receiving the start test indicator, testing; when the testing is done, generating results; when results are generated, responding to the label manager 16; and upon receiving an initialization/evaluation request, creating test cases.

In FIG. 12, the test base 56 stores the test cases, each of which may generate a unique monitored system 12 anomalous state. For example, ‘Case G in the example table in FIG. 12 shows a central processing unit (CPU) stress test case that may be executed in a duration of 3+3 minutes. The whole test may last 1 hour and during the hour, 5 of such CPU stress test cases may be executed.

In some embodiments, it may be assumed that the test cases are created by the monitored system 12, which also maintains and updates the test cases.

The various components/entities described herein in the network 10, e.g., within or connected to label manager 16 and label tester 18 (e.g., correlation analyzer 44, label generator and matcher 46, label adjuster 48, data labeler 50, label pool 20, labelled data store 22, tester 52, label evaluator 54, test base 56, etc.) may be deployed as, for example, microservices or virtual machines (VMs) running in a cloud system.

The arrangements discussed herein may be deployed in distributed cloud systems and edge cloud systems, preferably close to the data collection and storage points. Some embodiments are implemented in a cloud computing environment, and the functionality described herein with respect to each device, e.g., label manager 16 and label tester 18, may be implemented by physical devices and/or resources (e.g., compute, network, storage, etc.) distributed within the cloud computing environment.

A system and a method that 1) automatically labels time series data collected from cloud systems and 2) automatically evaluates and adjust the labels to reflect the expectation is provided. In some embodiments, the system includes:

-a label manager 16 that creates an anomaly label type, matches a data sample into a label type, and labels the data samples (the label type may include a label ID and the current data sample’s values, as well as, the data pattern/distribution of the anomaly, the monitored system’s 12 KPI values, the label keys and the adjustable label parameters that control the granularity of the labels);

-the label manager 16 may request a label tester 18 to assist in label parameter initialization, adjustment, label creation and evaluation; -a label tester 18 may evaluate labels and create test cases that can generate anomalies upon the monitored system 12 (the label tester 18 may also send expected labels to the label manager 16); and

-the label manger 18, as a result of receiving the expected labels, compares the expected labels with the generated labels in order to adjust label parameters to improve the labelling accuracy.

Abbreviations that may be used in the preceding description include:

Abbreviation Explanation

ML Machine Learning

KPI Key Performance Indicator

SLA Service Level Agreement ws Window Size

As will be appreciated by one of skill in the art, the concepts described herein may be embodied as a method, data processing system, and/or computer program product. Accordingly, the concepts described herein may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects all generally referred to herein as a “circuit” or “module.” Lurthermore, the disclosure may take the form of a computer program product on a tangible computer usable storage medium having computer program code embodied in the medium that can be executed by a computer. Any suitable tangible computer readable medium may be utilized including hard disks, CD-ROMs, electronic storage devices, optical storage devices, or magnetic storage devices.

Some embodiments are described herein with reference to flowchart illustrations and/or block diagrams of methods, systems and computer program products. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable memory or storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

It is to be understood that the functions/acts noted in the blocks may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.

Computer program code for carrying out operations of the concepts described herein may be written in an object oriented programming language such as Java® or C++. However, the computer program code for carrying out operations of the disclosure may also be written in conventional procedural programming languages, such as the "C" programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer. In the latter scenario, the remote computer may be connected to the user's computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). Many different embodiments have been disclosed herein, in connection with the above description and the drawings. It will be understood that it would be unduly repetitious and obfuscating to literally describe and illustrate every combination and subcombination of these embodiments. Accordingly, all embodiments can be combined in any way and/or combination, and the present specification, including the drawings, shall be construed to constitute a complete written description of all combinations and subcombinations of the embodiments described herein, and of the manner and process of making and using them, and shall support claims to any such combination or subcombination. It will be appreciated by persons skilled in the art that the embodiments described herein are not limited to what has been particularly shown and described herein above. In addition, unless mention was made above to the contrary, it should be noted that all of the accompanying drawings are not to scale. A variety of modifications and variations are possible in light of the above teachings without departing from the scope of the following claims.

Claims

What is claimed is:

1. A method for a label manager (16) for labeling data of a monitored system, the method comprising: receiving (S100) a dataset identified as an anomaly and a corresponding list of features associated with the anomaly; selecting (S102) a number, n, of features out of the list of features associated with the anomaly; obtaining (S104) a label to associate to the dataset based on a comparison between the selected features and features in at least one label in a label pool, the obtaining comprising, based on the comparison, one of generating a new label to associate to the dataset and using the at least one label for the dataset; initiating (S106) a label evaluation of at least one label in the label pool; and based on a result of the label evaluation, determining (S108) whether to adjust at least one parameter associated with the at least one label.

2. The method of Claim 1, wherein obtaining the label to associate to the dataset further comprises: for each label in the label pool: comparing features names of the selected features to feature names of features in the label; comparing distribution types of the selected features to distribution types of the features in the label; and calculating a deviation between at least one of the selected features and at least one of the features in the label; and based on the comparisons, the calculating and at least one predetermined parameter, one of generating the new label to associate to the anomalous dataset and using the label for the anomalous dataset.

3. The method of Claim 1, wherein obtaining the label to associate to the dataset further comprises: for each label in the label pool: matching features names of the selected features to feature names of features in the label; matching distribution types of the selected features to distribution types of the features in the label; determining a mean deviation between each matching feature; and based on the matchings, the determined mean deviation and at least one predetermined parameter, one of generating the new label to associate to the anomalous dataset and using the label for the anomalous dataset.

4. The method of any one of Claim 2 and 3, wherein the at least one predetermined parameter comprises at least one of: a threshold parameter for feature name matching; a threshold parameter for distribution type matching; and a threshold parameter for matching feature mean deviation.

5. The method of any one of Claims 1-4, wherein receiving the data and the corresponding list of features includes receiving the data as an output of an anomaly detector.

6. The method of any one of Claims 1-5, wherein selecting further comprises determining the number, n, of features out of the list of features having a highest impact on the anomaly.

7. The method of Claim 4, wherein at least one of n, the threshold parameter for feature name matching, the threshold parameter for distribution type matching and the threshold parameter for matching feature mean deviation is a configurable parameter.

8. The method of any one of Claims 1-7, wherein the dataset is comprised of time series data.

9. The method of any one of Claims 1-8, wherein the at least one label comprises: a label identifier, a window size, a sample rate, a features list, at least one label key, at least one key performance indicator and at least one adjustable parameter; and the features list comprises, for each feature in the features list, a feature name, a distribution type and data for the feature.

10. A method for label testing and evaluation for a monitored system, the method comprising: receiving (SI 10), from a label manager (16), a request to initiate a label evaluation of at least one label in a label pool; obtaining (SI 12) a test case from a test database, the test case being associated with an expected label; providing (S 114), to the label manager (16), the expected label for the test case; and executing (SI 16) the test case on the monitored system to allow the label manager (16) to determine whether the expected label is produced by the label manager (16) using the test case.

11. A computing device (16) implemented as a label manager for labeling data, the computing device (16) comprising processing circuitry (26), the processing circuitry (26) comprising a processor and a memory (28) and the processing circuitry (26) configured to cause the computing device (16) to: receive a dataset identified as an anomaly and a corresponding list of features associated with the anomaly; select a number, n, of features out of the list of features associated with the anomaly; obtain a label to associate to the dataset based on a comparison between the selected features and features in at least one label in a label pool, the obtaining comprising, based on the comparison, one of generating a new label to associate to the dataset and using the at least one label for the dataset; initiate a label evaluation of at least one label in the label pool; and based on a result of the label evaluation, determine whether to adjust at least one parameter associated with the at least one label.

12. The computing device (16) of Claim 11, wherein the processing circuitry (26) is configured to obtain the label to associate to the dataset by being configured to cause the computing device (16) to: for each label in the label pool: compare features names of the selected features to feature names of features in the label; compare distribution types of the selected features to distribution types of the features in the label; and calculate a deviation between at least one of the selected features and at least one of the features in the label; and based on the comparisons, the calculating and at least one predetermined parameter, one of generate the new label to associate to the anomalous dataset and use the label for the anomalous dataset.

13. The computing device ( 16) of Claim 11 , wherein the processing circuitry (26) is configured to obtain the label to associate to the dataset by being configured to cause the computing device (16) to: for each label in the label pool: match features names of the selected features to feature names of features in the label; match distribution types of the selected features to distribution types of the features in the label; determine a mean deviation between each matching feature; and based on the matchings, the determined mean deviation and at least one predetermined parameter, one of generate the new label to associate to the anomalous dataset and use the label for the anomalous dataset.

14. The computing device (16) of any one of Claim 12 and 13, wherein the at least one predetermined parameter comprises at least one of: a threshold parameter for feature name matching; a threshold parameter for distribution type matching; and a threshold parameter for matching feature mean deviation.

15. The computing device (16) of any one of Claims 11-14, wherein the processing circuitry (26) is configured to cause the computing device (16) to receive the data and the corresponding list of features by being configured to cause the computing device (16) to receive the data as an output of an anomaly detector.

16. The computing device (16) of any one of Claims 11-15, wherein the processing circuitry (26) is configured to select by being further configured to cause the computing device (16) to: determine the number, n, of features out of the list of features having a highest impact on the anomaly.

17. The computing device (16) of Claim 14, wherein at least one of n, the threshold parameter for feature name matching, the threshold parameter for distribution type matching and the threshold parameter for matching feature mean deviation is a configurable parameter.

18. The computing device (16) of any one of Claims 11-17, wherein the dataset is comprised of time series data.

19. The computing device (16) of any one of Claims 11-18, wherein the at least one label comprises: a label identifier, a window size, a sample rate, a features list, at least one label key, at least one key performance indicator and at least one adjustable parameter; and the features list comprises, for each feature in the features list, a feature name, a distribution type and data for the feature.

20. A computing device (18) implemented as a label tester for a monitored system, the computing device (18) comprising processing circuitry (36), the processing circuitry (36) comprising a processor and a memory (38) and the processing circuitry (36) configured to cause the computing device (18) to: receive, from a label manager (16), a request to initiate a label evaluation of at least one label in a label pool; obtain a test case from a test database, the test case being associated with an expected label; provide, to the label manager (16), the expected label for the test case; and execute the test case on the monitored system to allow the label manager (16) to determine whether the expected label is produced by the label manager (16) using the test case.