Detailed Description
To help those skilled in the art better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Optionally, according to one aspect of the embodiments of the present application, an anomaly detection method is provided. As an optional implementation, the method may be applied, without limitation, in the application environment shown in fig. 1. As shown in fig. 1, human-machine interaction may take place between a user 102 and an electronic device 104. The electronic device 104 includes a memory 106 and a processor 108. The memory 106 stores the flow characteristics of the devices before and after an instance change, and the flow correspondence between the devices before and after the instance change. The processor 108 is configured to: obtain a first flow characteristic corresponding to each first device in a first device set to obtain a first flow characteristic set, and obtain a second flow characteristic corresponding to each second device in a second device set to obtain a second flow characteristic set, wherein the first device set is the device set corresponding to a target instance before the change and the second device set is the device set corresponding to the target instance after the change; obtain a change relation between each first device in the first device set and each second device in the second device set, wherein the change relation characterizes the flow correspondence between each first device and each second device; determine a reference flow range of each second device in the second device set based on the first flow characteristic set and the change relation; and determine, for any second device in the second device set, whether that second device is abnormal according to its reference flow range and its second flow characteristic.
Optionally, the electronic device 104 includes, but is not limited to, a mobile phone, a notebook computer, a tablet computer, a palmtop computer, a mobile internet device (MID), a desktop computer, a smart TV, and the like. In other examples, the electronic device 104 may be a single server, a server cluster including a plurality of servers, or a cloud server, where the cloud server includes, but is not limited to, a private cloud server or a public cloud server. The above is merely an example and does not limit the present embodiment in any way.
As an alternative implementation manner, as shown in fig. 2, an embodiment of the present application provides an anomaly detection method, which includes the following steps:
S202, obtaining a first flow characteristic corresponding to each first device in a first device set to obtain a first flow characteristic set, and obtaining a second flow characteristic corresponding to each second device in a second device set to obtain a second flow characteristic set, wherein the first device set is the device set corresponding to a target instance before a change, and the second device set is the device set corresponding to the target instance after the change.
Specifically, in the embodiment of the present application, the target instance may be any of various types of cloud services. Each cloud service corresponds to one or more devices, which may be, for example, network devices such as virtual network switches or elastic network interface cards. The first flow characteristic and the second flow characteristic may be time series data of the corresponding traffic, that is, a set of data points collected over a period of time, each associated with a specific timestamp. The first and second flow characteristics may be inbound flow data and/or outbound flow data. Changes to the target instance generally fall into three types: upgrade, capacity expansion, and capacity reduction. In one example, the first devices in the first device set are device A, device B, and device C. When the change type of the target instance is the upgrade type, the second devices in the second device set are, for example, device A, device B, and device C. When the change type is the capacity expansion type, the second devices may be device A, device B, device C, and device D. When the change type is the capacity reduction type, the second devices may be device A and device C.
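The data described for step S202 can be pictured with a minimal Python sketch. The names TrafficSample and classify_change, the field units, and the rule of classifying a change purely by device count are illustrative assumptions rather than part of the described method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficSample:
    """One point of a flow characteristic time series (hypothetical layout)."""
    timestamp: int    # seconds since epoch (assumed unit)
    inbound: float    # inbound flow, e.g. bytes per second (assumed unit)
    outbound: float   # outbound flow, same assumed unit


def classify_change(first_devices, second_devices):
    """Classify a change by comparing device counts before and after (assumption)."""
    if len(second_devices) > len(first_devices):
        return "expansion"
    if len(second_devices) < len(first_devices):
        return "reduction"
    return "upgrade"


before = {"A", "B", "C"}
print(classify_change(before, {"A", "B", "C"}))        # upgrade
print(classify_change(before, {"A", "B", "C", "D"}))   # expansion
print(classify_change(before, {"A", "C"}))             # reduction
```

Under this sketch, a flow characteristic for one device would simply be a list of TrafficSample values ordered by timestamp.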
S204, obtaining a change relation between each first device in the first device set and each second device in the second device set, wherein the change relation characterizes a flow corresponding relation between each first device and each second device.
Specifically, in the embodiment of the present application, the flow of the devices of the target instance is usually in a balanced state before and after the change, that is, the flow data corresponding to the target instance before the change and the flow data corresponding to the target instance after the change usually do not fluctuate greatly. Therefore, a mapping or correspondence exists between the flow of the devices in the first device set and the flow of the devices in the second device set. The change relation is this flow correspondence (or flow mapping) between the devices in the first device set and the devices in the second device set after the target instance is changed (for example, by capacity reduction or capacity expansion). The allocation and change of flow before and after the instance change can be identified based on the change relation.
In one example, the first devices in the first device set are device A, device B, and device C. When the change type of the target instance is the upgrade type, the change relation between the first devices and the second devices is as follows: device A in the first device set corresponds to the flow of device A in the second device set, device B in the first device set corresponds to the flow of device B in the second device set, and device C in the first device set corresponds to the flow of device C in the second device set. When the change type of the target instance is the capacity expansion type and the second devices in the second device set are device A, device B, device C, and device D, the change relation may be as follows: device A in the first device set corresponds to the flow of device B in the second device set, device B in the first device set corresponds to the flow of device C in the second device set, and device C in the first device set corresponds to the flow of device A and device D in the second device set.
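The capacity expansion example above can be written down as a plain mapping. This is only an illustrative sketch: the dictionary layout and the helper name sources_for are assumptions, not a representation prescribed by the application.

```python
# Change relation for the capacity expansion example:
# first device (before the change) -> second devices (after the change).
change_relation = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
}


def sources_for(second_device, relation):
    """Return the first devices whose flow corresponds to the given second device."""
    return sorted(first for first, targets in relation.items()
                  if second_device in targets)


print(sources_for("D", change_relation))  # ['C']
print(sources_for("B", change_relation))  # ['A']
```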
S206, determining a reference flow range of each second device in the second device set based on the first flow characteristic set and the change relation.
Specifically, in the embodiment of the present application, the reference flow range of a second device may be the upper envelope and the lower envelope of the flow predicted for that second device when no abnormality occurs. In one example, the flow of device A in the first device set corresponds to the flow of device B in the second device set, and the reference flow range corresponding to device B can be predicted from the change relation and the first flow characteristic corresponding to device A.
S208, determining, for any second device in the second device set, whether the second device is abnormal according to the reference flow range of the second device and the second flow characteristic of the second device.
Specifically, in the embodiment of the present application, suppose for example that the flow of device A in the first device set corresponds to the flow of device B in the second device set. After the reference flow range corresponding to device B is determined from the change relation and the first flow characteristic corresponding to device A, the second flow characteristic corresponding to device B is obtained. If the second flow characteristic data of device B falls outside the reference flow range, device B may be considered abnormal.
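The check in step S208 can be sketched as follows, assuming for simplicity that the reference flow range is a single pair of scalar bounds rather than a per-timestamp envelope. The function names and the sample values are hypothetical.

```python
def out_of_range_points(second_feature, lower, upper):
    """Return (index, value) pairs of samples that fall outside [lower, upper]."""
    return [(i, v) for i, v in enumerate(second_feature)
            if v < lower or v > upper]


def is_abnormal(second_feature, lower, upper):
    """A device is treated as abnormal if any sample leaves the reference range."""
    return bool(out_of_range_points(second_feature, lower, upper))


# Hypothetical post-change flow samples for device B.
device_b_after = [42.0, 47.5, 44.1, 91.3, 45.0]
print(is_abnormal(device_b_after, lower=30.0, upper=60.0))          # True
print(out_of_range_points(device_b_after, lower=30.0, upper=60.0))  # [(3, 91.3)]
```

A production check would likely tolerate isolated spikes, for example by requiring several consecutive out-of-range samples; the source text does not specify such a rule.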
Based on the flow characteristics of the device sets corresponding to the target instance before and after the change, and on the flow correspondence between those device sets, the present application can accurately obtain the reference flow range of each device after the change and determine from that reference flow range whether an abnormality occurs after the target instance is changed. Compared with manual sampling inspection, the present application can detect abnormal devices after an instance change accurately and quickly, and can also improve the efficiency of anomaly detection for instance changes.
As an alternative implementation manner, as shown in fig. 3, an embodiment of the present application provides an anomaly detection method, which includes the following steps:
S302, obtaining a first flow characteristic corresponding to each first device in a first device set to obtain a first flow characteristic set, and obtaining a second flow characteristic corresponding to each second device in a second device set to obtain a second flow characteristic set, wherein the first device set is the device set corresponding to a target instance before a change, and the second device set is the device set corresponding to the target instance after the change.
S304, obtaining the device identifiers of the first devices and the second devices, wherein the number of devices in the first device set is the same as the number of devices in the second device set;
S306, determining a first device and a second device having the same device identifier as associated devices having the change relation.
Specifically, in the embodiment of the present application, the device identifier may be a device serial number, a media access control (MAC) address, a universally unique identifier (UUID), or the like. Taking the device serial number as an example, as shown in fig. 4, when the change type of the target instance S is the upgrade type, the upgraded target instance is denoted S'. Assume that the first devices of the target instance S before the change are device A, device B, and device C, with device identifiers 1001, 1002, and 1003, respectively. Instances S' and S have the same number of devices, and in instance S' the device identifier of device A is 1001, that of device B is 1002, and that of device C is 1003. In this case, device A of instance S' and device A of instance S can be determined as associated devices having the change relation, and the same holds for device B of S' and device B of S, and for device C of S' and device C of S. With this technique, when an instance is upgraded, the associated devices before and after the change can be identified quickly through the device identifiers, which simplifies the change management process and allows accurate detection of whether a device is abnormal after the instance change.
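Steps S304 to S306 amount to a join on the device identifier. The sketch below reuses the fig. 4 identifiers; the function name match_by_identifier and the "A'" style labels for devices of instance S' are illustrative assumptions.

```python
def match_by_identifier(first_ids, second_ids):
    """Pair first and second devices that share a device identifier.

    Both arguments map a device name to its identifier
    (serial number, MAC address, UUID, ...).
    """
    second_by_id = {ident: name for name, ident in second_ids.items()}
    return [(name, second_by_id[ident])
            for name, ident in first_ids.items()
            if ident in second_by_id]


instance_s = {"A": "1001", "B": "1002", "C": "1003"}
instance_s_prime = {"A'": "1001", "B'": "1002", "C'": "1003"}
print(match_by_identifier(instance_s, instance_s_prime))
# [('A', "A'"), ('B', "B'"), ('C', "C'")]
```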
S308, determining a reference flow range of each second device in the second device set based on the first flow characteristic set and the change relation.
S310, determining, for any second device in the second device set, whether the second device is abnormal according to the reference flow range of the second device and the second flow characteristic of the second device.
The above steps S302, S308 to S310 are already described above, and are not described here again.
As an alternative implementation manner, as shown in fig. 5, an embodiment of the present application provides an anomaly detection method, which includes the following steps:
S502, obtaining a first flow characteristic corresponding to each first device in a first device set to obtain a first flow characteristic set, and obtaining a second flow characteristic corresponding to each second device in a second device set to obtain a second flow characteristic set, wherein the first device set is the device set corresponding to a target instance before a change, and the second device set is the device set corresponding to the target instance after the change. The number of devices in the first device set differs from the number of devices in the second device set.
S504, for any first flow characteristic in the first flow characteristic set, calculating the similarity between the first flow characteristic and each second flow characteristic in the second flow characteristic set;
S506, determining a second flow characteristic whose similarity to the first flow characteristic is greater than or equal to a preset threshold as a target flow characteristic;
S508, determining the first device corresponding to the first flow characteristic and the second device corresponding to the target flow characteristic as associated devices having the change relation;
S510, determining a first remaining device set and a second remaining device set as an associated device set, wherein the first remaining device set is the set of first devices in the first device set that do not form associated devices, and the second remaining device set is the set of second devices in the second device set that do not form associated devices;
Specifically, in the embodiment of the present application, after the target instance undergoes capacity expansion or capacity reduction, the number of devices corresponding to the target instance changes, that is, the number of devices in the first device set differs from the number of devices in the second device set. Calculating the similarity between the first flow characteristics and the second flow characteristics quantifies how the flow characteristics of each device change before and after the change of the target instance and identifies which devices keep the same or similar flow behavior. Associated devices (device pairs) with similar flow characteristics are determined from the similarity results, the remaining devices that do not form associated devices are determined as an associated device set, and a mapping is established between the first remaining device set and the second remaining device set.
As shown in fig. 6, after a capacity expansion change is made to the target instance S, the number of corresponding devices changes from three to four. A first flow characteristic corresponding to each first device in the first device set (device A, device B, and device C) is obtained to form the first flow characteristic set, and a second flow characteristic corresponding to each second device in the second device set (device A, device B, device C, and device D) is obtained to form the second flow characteristic set. For any first flow characteristic in the first flow characteristic set, the similarity between that first flow characteristic and each second flow characteristic in the second flow characteristic set is calculated. Suppose it is determined that the similarity between the flow characteristic of device A in the first device set and that of device B in the second device set is greater than or equal to the preset threshold, and that the similarity between the flow characteristic of device C in the first device set and that of device A in the second device set is greater than or equal to the preset threshold. In that case, device A in the first device set and device B in the second device set can be determined as associated devices having the change relation, and device C in the first device set and device A in the second device set can likewise be determined as associated devices.
The only first device in the first device set that does not form associated devices is device B, that is, the first remaining device set contains one first device, device B. The second devices in the second device set that do not form associated devices are device C and device D, that is, the second remaining device set contains two second devices, device C and device D. The first remaining device set {device B} and the second remaining device set {device C, device D} form an associated device set.
As shown in fig. 7, after a capacity reduction change is made to the target instance S, the number of corresponding devices changes from four to three. A first flow characteristic corresponding to each first device in the first device set (device A, device B, device C, and device D) is obtained to form the first flow characteristic set, and a second flow characteristic corresponding to each second device in the second device set (device A, device B, and device C) is obtained to form the second flow characteristic set. For any first flow characteristic in the first flow characteristic set, the similarity between that first flow characteristic and each second flow characteristic in the second flow characteristic set is calculated. Suppose it is determined that the similarity between the flow characteristic of device A in the first device set and that of device B in the second device set is greater than or equal to the preset threshold, and that the similarity between the flow characteristic of device C in the first device set and that of device A in the second device set is greater than or equal to the preset threshold. In that case, device A in the first device set and device B in the second device set can be determined as associated devices, and device C in the first device set and device A in the second device set can be determined as associated devices.
The first devices in the first device set that do not form associated devices are device B and device D, that is, the first remaining device set contains device B and device D. The only second device in the second device set that does not form associated devices is device C, that is, the second remaining device set contains device C. The first remaining device set {device B, device D} and the second remaining device set {device C} form the associated device set.
According to the embodiment of the application, the similarity of the flow characteristics between the devices before and after the change of the target instance is calculated, so that the associated devices before and after the change can be more accurately determined, the devices can be accurately tracked and managed, and whether the devices after the change of the instance are abnormal can be accurately detected.
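Steps S504 to S510 can be sketched as below. The similarity function is passed in as a parameter because the source obtains it from a prediction model; here it is replaced by a hypothetical lookup table of scores. Picking the best-scoring unused second device when several exceed the threshold is an added assumption, since the source only states the threshold condition.

```python
def associate(first, second, similarity, threshold=0.5):
    """Pair devices by flow similarity and collect the unpaired remainder."""
    pairs, used = [], set()
    for f in first:
        # Candidates: unused second devices whose similarity reaches the threshold.
        candidates = [(similarity(f, s), s) for s in second if s not in used]
        candidates = [c for c in candidates if c[0] >= threshold]
        if candidates:
            _, best = max(candidates)  # assumption: keep the best-scoring match
            pairs.append((f, best))
            used.add(best)
    paired_first = {f for f, _ in pairs}
    first_remaining = [f for f in first if f not in paired_first]
    second_remaining = [s for s in second if s not in used]
    return pairs, first_remaining, second_remaining


# Hypothetical similarity scores reproducing the fig. 6 expansion example.
scores = {("A", "B"): 0.9, ("C", "A"): 0.8}
pairs, rest1, rest2 = associate(
    ["A", "B", "C"], ["A", "B", "C", "D"],
    similarity=lambda f, s: scores.get((f, s), 0.1),
)
print(pairs)  # [('A', 'B'), ('C', 'A')]
print(rest1)  # ['B']
print(rest2)  # ['C', 'D']
```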
S512, determining a reference flow range of each second device in the second device set based on the first flow characteristic set and a change relation, wherein the change relation represents flow corresponding relation between each first device and each second device.
S514, determining, for any second device in the second device set, whether the second device is abnormal according to the reference flow range of the second device and the second flow characteristic of the second device.
The above steps S502, S512-S514 are already described above and are not repeated here.
In one or more embodiments, calculating the similarity between the first flow characteristic and each second flow characteristic in the second flow characteristic set includes:
obtaining statistical index data, wherein the statistical index data includes at least one of: a statistical index corresponding to the first flow characteristic, a statistical index corresponding to each second flow characteristic, and a statistical index of the correlation between the first flow characteristic and each second flow characteristic;
and invoking a preset similarity prediction model to process the statistical index data to obtain the similarity between the first flow characteristic and each second flow characteristic in the second flow characteristic set.
Specifically, in the embodiment of the present application, the similarity prediction model may be a machine-learning-based binary classification model, and the preset threshold may be, for example, 0.5. In actual use, if the probability output by the binary classification model is not lower than the threshold, the prediction is the positive class, that is, the first flow characteristic is similar to the second flow characteristic; if the output is lower than the threshold, the prediction is the negative class, that is, the first flow characteristic is dissimilar to the second flow characteristic. The training process of the similarity prediction model is as follows. First, a training sample set and a test sample set are obtained, both containing statistical index data of flow characteristics. Next, the vectors corresponding to the statistical index data whose similarity is to be compared are obtained from the training sample set and input into a pre-training model for similarity prediction, which outputs the corresponding similarity prediction values. The similarity verification values corresponding to the same statistical index data are then obtained from the test sample set, and a training loss value is calculated from the similarity verification values and the similarity prediction values. When the training loss value meets the convergence condition, the similarity prediction model is considered obtained; when it does not, training continues based on the training loss value until the similarity prediction model is obtained.
In one example, the structure of the binary classification model may be (GRU, Dropout, LSTM, LSTM, Normalization, Sigmoid). The GRU (Gated Recurrent Unit) is a widely used recurrent neural network (RNN) variant that controls the flow of information through an update gate and a reset gate, thereby mitigating the vanishing gradient problem. Dropout is a regularization method for neural network models: some neurons are randomly ignored during training, so that each neuron must work well together with randomly selected other neurons, which weakens co-adaptation between neurons and improves generalization. The long short-term memory (LSTM) network is a recurrent neural network capable of learning long-term dependencies. Normalization is a normalization layer that rescales the data toward a standard normal distribution. Sigmoid is the activation function used by the activation layer; it maps the linear output to the interval (0, 1), converting the output into a probability in a binary classification problem.
Based on the binary classification model, the similarity between the first flow characteristic and each second flow characteristic in the second flow characteristic set can be identified accurately, which improves the reliability of device anomaly detection.
In one or more embodiments, the statistical indicator data includes at least one of:
the Euclidean distance between the first flow characteristic and each second flow characteristic;
the correlation coefficient between the first flow characteristic and each second flow characteristic;
the respective peak values of the first flow characteristic and each second flow characteristic;
the respective variances of the first flow characteristic and each second flow characteristic.
Specifically, in the embodiment of the present application, the above-mentioned correlation coefficient may be Pearson (Pearson) correlation coefficient. The application combines a plurality of statistical indexes, so that the model can analyze the flow characteristics from different angles, the analysis comprehensiveness is improved, and the prediction capability of the model on the similarity of the flow characteristics is improved.
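The four statistical indexes listed above can be computed for one pair of flow characteristics as in the following sketch. It assumes the two series are sampled at the same timestamps and have equal length, and it uses the population variance; the function name statistical_indicators and the returned dictionary keys are hypothetical.

```python
import math
import statistics


def statistical_indicators(first, second):
    """Compute distance, Pearson correlation, peaks, and variances for two series."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("series must have equal length of at least 2")
    euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)))
    mean_f, mean_s = statistics.fmean(first), statistics.fmean(second)
    cov = sum((a - mean_f) * (b - mean_s) for a, b in zip(first, second))
    norm_f = math.sqrt(sum((a - mean_f) ** 2 for a in first))
    norm_s = math.sqrt(sum((b - mean_s) ** 2 for b in second))
    # Pearson coefficient; defined as 0.0 here when either series is constant.
    pearson = cov / (norm_f * norm_s) if norm_f and norm_s else 0.0
    return {
        "euclidean": euclidean,
        "pearson": pearson,
        "peak_first": max(first),
        "peak_second": max(second),
        "var_first": statistics.pvariance(first),
        "var_second": statistics.pvariance(second),
    }


indicators = statistical_indicators([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(indicators)
```

In the described method, a vector built from such values would be the input to the similarity prediction model.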
In one or more embodiments, the reference traffic range includes an upper envelope and a lower envelope, and the determining, based on the first traffic feature set and the change relation, the reference traffic range of each second device in the second device set includes:
Acquiring each associated device conforming to the change relation;
for any one of the associated devices, acquiring a first flow characteristic corresponding to a first device in the associated devices;
Invoking a preset time sequence prediction algorithm to determine a first flow value conforming to a first preset proportion and a second flow value conforming to a second preset proportion in the first flow characteristic, wherein the first preset proportion is larger than the second preset proportion;
The upper flow limit parameter of the reference flow range is the first flow value, and the lower flow limit parameter of the reference flow range is the second flow value.
Specifically, in the embodiment of the present application, the time series prediction algorithm may be the STL (Seasonal and Trend decomposition using Loess) time series decomposition algorithm. The first preset proportion may be an upper quantile and the second preset proportion may be a lower quantile; in one example, the upper quantile may be the upper quartile and the lower quantile may be the lower quartile. The upper flow limit parameter may be the upper envelope of the flow, and the lower flow limit parameter may be the lower envelope of the flow. For any one of the associated devices, after the first flow characteristic corresponding to the first device in the associated devices is obtained, the preset STL algorithm is invoked to obtain the upper quantile and the lower quantile corresponding to that first flow characteristic. The upper quantile is taken as the upper envelope (upper flow limit parameter) of the reference flow range and the lower quantile as the lower envelope (lower flow limit parameter), so that the reference flow range corresponding to the second device in each of the associated devices can be accurately determined from the upper and lower envelopes.
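The quantile step can be illustrated with the simplified sketch below. It deliberately omits the STL decomposition and takes plain empirical quantiles of the raw first flow characteristic, so it is a stand-in for the described algorithm, not an implementation of it. The function names and the linear-interpolation quantile rule are assumptions.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    low = int(position)                      # floor, since position >= 0
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def reference_range(first_feature, upper_q=0.75, lower_q=0.25):
    """Return (lower envelope, upper envelope) from two preset proportions."""
    if upper_q <= lower_q:
        raise ValueError("the first preset proportion must exceed the second")
    return quantile(first_feature, lower_q), quantile(first_feature, upper_q)


print(reference_range([10.0, 50.0, 30.0, 20.0, 40.0]))  # (20.0, 40.0)
```

In practice the quantiles would more plausibly be taken per seasonal position or on the residual after removing trend and seasonality, which is what a decomposition such as STL makes possible.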
In one or more embodiments, the determining, based on the first traffic feature set and the change relation, a reference traffic range for each second device in the second device set includes:
coding the first flow characteristics corresponding to each device in the first residual device set to obtain a flow coding vector;
Invoking a time sequence prediction pre-training model to process each flow coding vector, and obtaining a predicted flow range corresponding to each device in the first residual device set;
And acquiring a reference flow range corresponding to each device in the second residual device set based on each predicted flow range and a change coefficient, wherein the change coefficient is obtained according to the number of devices in the first residual device set and the number of devices in the second residual device set.
Specifically, in the embodiment of the present application, the time series prediction pre-training model may be a model obtained by training a Transformer neural network model on flow data from before and after changes of sample instances. First, the first flow characteristic corresponding to each device in the first remaining device set is normalized, and the normalized flow characteristic is encoded to obtain a flow coding vector. The time series prediction pre-training model is then invoked to process each flow coding vector, yielding the predicted flow range corresponding to each device in the first remaining device set. The change coefficient may be obtained from the ratio of the number of devices in the first remaining device set to the number of devices in the second remaining device set, or from the number of devices in the first remaining device set together with a first weight coefficient corresponding to the first remaining device set and the number of devices in the second remaining device set together with a second weight coefficient corresponding to the second remaining device set.
In one example, the number of devices in the first remaining device set is calculated and denoted N1, the number of devices in the second remaining device set is calculated and denoted N2, and the change coefficient is C = N1/N2. The reference flow range corresponding to the first remaining device set is a weighted sum of the predicted flow ranges corresponding to the devices in the first remaining device set. Assuming that the reference flow range corresponding to the first remaining device set is [L, U], where L is the lower limit and U is the upper limit, the reference flow range corresponding to each device in the second remaining device set may be [L×C, U×C].
According to the embodiment of the present application, the Transformer model can effectively capture long-range dependencies in the sequence data corresponding to the flow data, which can improve the accuracy of flow prediction. Based on the change coefficient, the system can dynamically adjust the flow prediction and the reference range as the number of devices in the network changes, which enhances the flexibility and accuracy of change anomaly detection.
In one or more embodiments, the obtaining, based on each of the predicted flow ranges and the change coefficient, a reference flow range corresponding to each device in the second remaining device set includes:
acquiring a first number of devices contained in the first remaining device set and a second number of devices contained in the second remaining device set;
taking the ratio of the first quantity and the second quantity as the change coefficient;
taking the weighted value of the upper flow limit parameters corresponding to the predicted flow ranges as a flow peak value, and taking the weighted value of the lower flow limit parameters corresponding to the predicted flow ranges as a flow valley value;
determining the product of the flow peak value and the change coefficient as the upper flow limit parameter of the reference flow range corresponding to each device in the second remaining device set, and determining the product of the flow valley value and the change coefficient as the lower flow limit parameter of the reference flow range corresponding to each device in the second remaining device set.
Specifically, in the embodiment of the present application, as shown in fig. 6, the first number of devices included in the first remaining device set and the second number of devices included in the second remaining device set are obtained. Here the first number is 1 and the second number is 2, so the change coefficient is determined to be 0.5. As shown in fig. 7, the first number of devices included in the first remaining device set and the second number of devices included in the second remaining device set are obtained. Here the first number is 2 and the second number is 1, so the change coefficient is 2.
As shown in fig. 7, taking capacity reduction as the change type of the target instance, assume that the predicted flow range corresponding to device B in the first remaining device set is [L_B, U_B], where L_B is the lower flow limit parameter and U_B is the upper flow limit parameter, and that the predicted flow range corresponding to device D is [L_D, U_D], where L_D is the lower flow limit parameter and U_D is the upper flow limit parameter. Assuming that the weight coefficients of device B and device D are both 0.5, the flow peak is (U_B + U_D)/2 and the flow valley is (L_B + L_D)/2. With a change coefficient of 2, the upper flow limit parameter corresponding to device C in the second remaining device set is U_B + U_D, and the lower flow limit parameter corresponding to device C is L_B + L_D.
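The calculation in the two examples above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `reference_range_for_remaining`, the equal-weight default, and the numeric ranges for devices B and D are assumptions made only for the example.

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]  # (lower flow limit, upper flow limit)


def reference_range_for_remaining(
    predicted_ranges: List[Range],
    second_count: int,
    weights: Optional[List[float]] = None,
) -> Range:
    """Scale the weighted predicted range by the change coefficient C = N1 / N2."""
    first_count = len(predicted_ranges)
    if first_count == 0 or second_count <= 0:
        raise ValueError("both remaining device sets must be non-empty")
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / first_count] * first_count
    if len(weights) != first_count:
        raise ValueError("one weight per predicted range is required")

    change_coefficient = first_count / second_count
    # Weighted lower limits give the flow valley, weighted upper limits the flow peak.
    flow_valley = sum(w * low for w, (low, _) in zip(weights, predicted_ranges))
    flow_peak = sum(w * high for w, (_, high) in zip(weights, predicted_ranges))
    return flow_valley * change_coefficient, flow_peak * change_coefficient


# Capacity reduction as in fig. 7: devices B and D are replaced by device C.
range_b = (10.0, 30.0)  # hypothetical [L_B, U_B]
range_d = (20.0, 50.0)  # hypothetical [L_D, U_D]
lower_c, upper_c = reference_range_for_remaining([range_b, range_d], second_count=1)
print(lower_c, upper_c)  # 30.0 80.0, i.e. L_B + L_D and U_B + U_D
```

With one device before the change and two after it (the fig. 6 case), the same function halves the range, because C = 0.5.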
Based on the technical means, the embodiment of the application can dynamically adjust the flow prediction and reference range according to the change of the number of the devices in the network, enhances the flexibility and the accuracy of the change abnormality detection, can rapidly locate the change abnormality device, and reduces the false alarm and missing report of the abnormality detection.
In one or more embodiments, the acquiring a first traffic characteristic corresponding to each first device in the first device set and a second traffic characteristic corresponding to the second device set includes:
Acquiring a first flow characteristic corresponding to a first time length of the first equipment set and a second flow characteristic corresponding to a second time length of the second equipment set, wherein the first time length is longer than the second time length.
Specifically, the first duration may be in the range of [5, 7] days, and the second duration may be in the range of [5, 20] minutes. The first flow characteristic over the first duration helps identify the long-term trend and periodic pattern of the flow, and provides more accurate reference data for predicting the flow characteristics of the changed devices. The second flow characteristic over the second duration allows abrupt changes in flow to be discovered quickly. Combining the two improves the accuracy of abnormality detection.
In one or more embodiments, the determining whether the second device is abnormal according to the reference traffic range of any second device in the second device set and the second traffic characteristic of the second device includes:
determining that the second device is abnormal when the second flow characteristic corresponding to the second device contains data that is not within the reference flow range corresponding to the second device.
In addition, after determining that any one of the second devices in the second device set is abnormal, it may be determined that the target instance is abnormal after the change.
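The determination described above can be sketched as a simple range check. The function name `find_abnormal_devices` and the sample values are hypothetical, and a real system might tolerate isolated out-of-range points instead of flagging a device on the first one.

```python
from typing import Dict, List, Tuple


def find_abnormal_devices(
    second_flows: Dict[str, List[float]],
    reference_ranges: Dict[str, Tuple[float, float]],
) -> Dict[str, List[int]]:
    """Return, per abnormal device, the indices of samples outside its reference range."""
    abnormal: Dict[str, List[int]] = {}
    for device, flow in second_flows.items():
        lower, upper = reference_ranges[device]
        outside = [i for i, value in enumerate(flow) if value < lower or value > upper]
        if outside:
            abnormal[device] = outside
    return abnormal


# Hypothetical post-change flow samples and reference ranges.
flows = {"C": [35.0, 90.0, 40.0], "A": [12.0, 15.0, 11.0]}
ranges = {"C": (30.0, 80.0), "A": (10.0, 20.0)}
abnormal = find_abnormal_devices(flows, ranges)
# The target instance is treated as abnormal if any of its devices is abnormal.
instance_abnormal = bool(abnormal)
print(abnormal, instance_abnormal)
```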
Based on the above example, as shown in fig. 8, in an application embodiment, there is provided an abnormality detection method including the steps of:
S1, acquiring indicators of the change objects according to the change scheme, where the indicators include the flow data of each changed instance and of each device (gateway) in the instance. A single change generally involves the flow data (flow characteristics) of multiple instances. The flow data of the 7 days before the change starts is acquired as pre-change data, and the data of the 10 minutes after the change completes is acquired for subsequent flow balance abnormality detection.
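As a rough sketch of the window selection in S1, the following splits a list of timestamped samples into the pre-change and post-change windows. The function name `split_flow_windows`, the sample layout, and the choice of which window boundaries are inclusive are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, float]  # (timestamp, flow value)


def split_flow_windows(
    samples: List[Sample],
    change_start: datetime,
    change_end: datetime,
    before: timedelta = timedelta(days=7),
    after: timedelta = timedelta(minutes=10),
) -> Tuple[List[Sample], List[Sample]]:
    """Return (pre-change samples, post-change samples) for one device."""
    pre_change = [s for s in samples if change_start - before <= s[0] < change_start]
    post_change = [s for s in samples if change_end <= s[0] <= change_end + after]
    return pre_change, post_change


start = datetime(2024, 1, 8, 0, 0)
end = datetime(2024, 1, 8, 0, 30)
samples = [
    (datetime(2023, 12, 31, 23, 0), 1.0),  # older than 7 days: dropped
    (datetime(2024, 1, 1, 0, 0), 2.0),     # pre-change window
    (datetime(2024, 1, 7, 23, 59), 3.0),   # pre-change window
    (datetime(2024, 1, 8, 0, 10), 4.0),    # during the change: dropped
    (datetime(2024, 1, 8, 0, 35), 5.0),    # post-change window
    (datetime(2024, 1, 8, 0, 41), 6.0),    # later than 10 minutes: dropped
]
pre, post = split_flow_windows(samples, start, end)
print(len(pre), len(post))
```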
S2, for each changed instance, performing change object mapping according to the pre-change and post-change device flow data acquired in the previous step. Because the flow time series before and after the change differ greatly in length, computing flow similarity directly from the traditional Euclidean distance and Pearson correlation coefficient introduces considerable noise and yields low accuracy. To improve the accuracy of the flow similarity calculation, as shown in fig. 9, the application uses a deep learning classification model that takes statistical indicators such as the Euclidean distance between the two flow time series, the Pearson correlation coefficient, and the peak value and variance of each time series as input features, and judges whether the two flow time series are similar according to the classification result. The classification model is a GRU-based model with the layer structure (GRU, dropout, LSTM, LSTM, normalization, sigmoid). Its training data are derived from expert labels, so the model can better identify similar flow time series and its predictions are more accurate. The similarity of the flow time series of the corresponding devices before and after the instance change is calculated, devices with similar flow time series are determined to be associated devices, and the devices other than the associated devices are placed in a separate group.
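The statistical indicators named in S2 could be assembled into a feature vector along the following lines. This sketch covers only the feature computation, not the classification model. It assumes the two series have already been aligned or resampled to the same length, which the text does not specify, and the function and key names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def variance(x: List[float]) -> float:
    """Population variance of a series."""
    mean = sum(x) / len(x)
    return sum((a - mean) ** 2 for a in x) / len(x)


def similarity_features(before: List[float], after: List[float]) -> Dict[str, float]:
    """Statistical indicators used as input features of the similarity classifier."""
    if not before or len(before) != len(after):
        raise ValueError("series must be non-empty and of equal length")
    return {
        "euclidean": math.sqrt(sum((a - b) ** 2 for a, b in zip(before, after))),
        "pearson": pearson(before, after),
        "peak_before": max(before),
        "peak_after": max(after),
        "variance_before": variance(before),
        "variance_after": variance(after),
    }


features = similarity_features([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(features)
```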
Specifically, there are three cases: (1) if the number of devices of the instance is unchanged before and after the change, the change object mapping is determined to be complete, that is, the same devices of the instance before and after the change are associated with each other; (2) if the number of devices increases after the change, the devices remaining before the change, other than the associated devices obtained by similarity matching, are mapped to the devices remaining after the change; (3) if the number of devices decreases after the change, the devices remaining before the change, other than the associated devices obtained by similarity matching, are likewise mapped to the devices remaining after the change.
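The mapping for cases (2) and (3) can be sketched as below. The greedy one-to-one matching, the threshold value, the `similarity` callback (standing in for the classification model), and the device names are all assumptions for illustration. The text only requires that pairs whose similarity reaches a preset threshold become associated devices and that the rest form the two remaining sets.

```python
from typing import Callable, List, Set, Tuple


def map_change_objects(
    first_devices: List[str],
    second_devices: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Return (associated pairs, first remaining set, second remaining set)."""
    associated: List[Tuple[str, str]] = []
    matched_second: Set[str] = set()
    for first in first_devices:
        best, best_score = None, threshold
        for second in second_devices:
            if second in matched_second:
                continue
            score = similarity(first, second)
            if score >= best_score:
                best, best_score = second, score
        if best is not None:
            associated.append((first, best))
            matched_second.add(best)
    matched_first = {first for first, _ in associated}
    first_remaining = [d for d in first_devices if d not in matched_first]
    second_remaining = [d for d in second_devices if d not in matched_second]
    return associated, first_remaining, second_remaining


# Hypothetical similarity scores for a capacity reduction from three devices to two.
scores = {("A", "A2"): 0.9}
pairs, first_rest, second_rest = map_change_objects(
    ["A", "B", "D"], ["A2", "C"], lambda f, s: scores.get((f, s), 0.1)
)
print(pairs, first_rest, second_rest)
```

Here devices B and D form the first remaining device set and device C forms the second remaining device set, so the group as a whole is mapped rather than any single pair.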
S3, after the change object mapping is completed, for devices whose flow is similar before and after the change (associated devices), the flow carried by the device does not change across the change, so time-series prediction is performed directly on the pre-change flow data of the device. The time-series prediction algorithm used by the application is STL (Seasonal and Trend decomposition using Loess). Based on this algorithm, the upper and lower envelopes, that is, the upper and lower boundary ranges, of the device flow data after the instance change can be obtained. By comparing the device flow data after the instance change with the corresponding upper and lower envelopes, whether the device is abnormal after the instance change can be judged.
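The envelope idea can be illustrated with a deliberately simplified stand-in. The sketch below does not implement STL. It takes two fixed quantiles of the pre-change flow as the lower and upper envelope, on the assumption that the "preset proportions" mentioned elsewhere in the description can be read as quantiles. The function names and default proportions are hypothetical.

```python
import math
from typing import List, Tuple


def quantile(values: List[float], proportion: float) -> float:
    """Nearest-rank quantile: smallest value with at least `proportion` of data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(proportion * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def flow_envelope(
    pre_change_flow: List[float],
    upper_proportion: float = 0.95,
    lower_proportion: float = 0.05,
) -> Tuple[float, float]:
    """Return (lower envelope, upper envelope) from pre-change flow data."""
    if upper_proportion <= lower_proportion:
        raise ValueError("upper proportion must exceed lower proportion")
    return (
        quantile(pre_change_flow, lower_proportion),
        quantile(pre_change_flow, upper_proportion),
    )


history = [5.0, 1.0, 8.0, 3.0, 2.0, 7.0, 4.0, 6.0]
lower, upper = flow_envelope(history, upper_proportion=0.75, lower_proportion=0.25)
print(lower, upper)
```

A seasonal decomposition such as STL would additionally let the envelope follow daily or weekly cycles, which a single pair of quantiles cannot do.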
For the remaining devices (non-associated devices), changes such as capacity expansion and capacity reduction alter the flow carried by the devices, so the devices before and after the instance change have no one-to-one mapping relation. To calculate an accurate interval for the post-change device flow, the application uses a dynamic flow time-series prediction model built on a Transformer model, and scales the predicted upper and lower envelopes of the pre-change device flow according to the change coefficient of the number of devices before and after the change. For example, in one instance, the number of devices remaining before the instance change (the first remaining device set) is obtained and denoted N1, the number of devices remaining after the instance change (the second remaining device set) is obtained and denoted N2, and the change coefficient is C = N1/N2. The reference flow range corresponding to the first remaining device set is a weighted sum of the predicted flow ranges corresponding to the devices in the first remaining device set. Assuming that this reference flow range is [L, U], where L is the lower limit and U is the upper limit, the reference flow range corresponding to each device in the second remaining device set may be [L×C, U×C]. A changed device is determined to be abnormal when its flow characteristic contains data that is not within the corresponding reference flow range.
First, the time series covering the 7 days before the change is input into the dynamic flow time-series prediction model and normalized at the model input layer, so that time-series data of different scales are adjusted to the same order of magnitude. Specifically, min-max scaling is used.
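A minimal sketch of min-max scaling follows. The function names are hypothetical, and returning the minimum and maximum so that predictions can be mapped back to the original scale is an assumption about how the model output would be used.

```python
from typing import List, Tuple


def min_max_scale(series: List[float]) -> Tuple[List[float], float, float]:
    """Scale a series to [0, 1]; also return the min and max for later inversion."""
    low, high = min(series), max(series)
    if high == low:
        # A constant series carries no scale information; map it to zeros.
        return [0.0] * len(series), low, high
    span = high - low
    return [(value - low) / span for value in series], low, high


def inverse_scale(scaled: List[float], low: float, high: float) -> List[float]:
    """Map scaled values (for example predicted envelopes) back to the original units."""
    return [value * (high - low) + low for value in scaled]


scaled, low, high = min_max_scale([10.0, 20.0, 30.0])
print(scaled)  # [0.0, 0.5, 1.0]
print(inverse_scale(scaled, low, high))  # [10.0, 20.0, 30.0]
```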
The encoder part of the model is a stack of several identical layers, each comprising a multi-head self-attention mechanism and a feedforward neural network. The multi-head self-attention mechanism allows the model to attend to different parts of the input sequence from different perspectives, which strengthens its ability to capture complex patterns. The feedforward neural network applies a nonlinear transformation to the information at each position to extract further features. Combining the two allows the encoder to mine the internal relations of the data while preserving the structure of the input sequence, yielding the time-series encoding result.
The decoder is also composed of multiple layers, similar to the encoder, but has an additional function to meet the needs of time-series prediction: each decoder layer processes the target sequence through a masked multi-head self-attention mechanism, which prevents future information from leaking during training and safeguards the accuracy of prediction. The decoder takes the encoding result as input and outputs the time-series prediction result. Finally, device change abnormalities are identified based on the time-series prediction result and the post-change flow balance detection result is determined, which helps operation and maintenance personnel improve their working efficiency.
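The masking described for the decoder can be illustrated with a single-head, scaled dot-product self-attention in which each position only attends to itself and earlier positions. This is a teaching sketch with no learned projections, multiple heads, or feedforward layers, and the function names and toy input are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (outputs, attention weights) with future positions masked out."""
    scale = math.sqrt(len(queries[0]))
    outputs: Matrix = []
    attention: Matrix = []
    for i, query in enumerate(queries):
        # Position i only sees positions 0..i, so no future information leaks.
        scores = [sum(q * k for q, k in zip(query, keys[j])) / scale for j in range(i + 1)]
        weights = softmax(scores)
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ])
        # Masked (future) positions get zero weight.
        attention.append(weights + [0.0] * (len(keys) - i - 1))
    return outputs, attention


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attention = causal_self_attention(sequence, sequence, sequence)
for row in attention:
    print([round(w, 3) for w in row])
```

The printed weight matrix is lower triangular: every entry above the diagonal is zero, which is the effect of the mask.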
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
According to another aspect of the embodiment of the present application, there is also provided an abnormality detection apparatus, as shown in fig. 10, including:
A first obtaining unit 1002, configured to obtain first flow characteristics corresponding to each first device in a first device set, to obtain a first flow characteristic set, and second flow characteristics corresponding to a second device set, to obtain a second flow characteristic set;
a second obtaining unit 1004, configured to obtain a change relationship between each first device in the first device set and each second device in the second device set, where the change relationship characterizes a flow corresponding relationship between each first device and each second device;
A first determining unit 1006, configured to determine a reference flow range of each second device in the second device set based on the first flow feature set and the change relationship;
a second determining unit 1008, configured to determine whether an abnormality occurs in any second device in the second device set according to a reference traffic range of the second device and a second traffic characteristic of the second device.
In the embodiment of the application, first flow characteristics corresponding to each first device in a first device set are acquired to obtain a first flow characteristic set, and second flow characteristics corresponding to a second device set are acquired to obtain a second flow characteristic set, where the first device set is the device set corresponding to a target instance before the change and the second device set is the device set corresponding to the target instance after the change. A change relation between each first device in the first device set and each second device in the second device set is acquired, the change relation characterizing the flow correspondence between each first device and each second device. A reference flow range of each second device in the second device set is determined based on the first flow characteristic set and the change relation, and whether any second device in the second device set is abnormal is determined according to the reference flow range of that second device and its second flow characteristic. In this way, whether a device is abnormal after the change can be judged from the change relation between the devices of the target instance before and after the change and the reference flow range derived from it.
Compared with manual sampling detection, the method and the apparatus can accurately and rapidly detect abnormal devices after an instance change, and can improve the efficiency of instance change abnormality detection.
In one or more embodiments, in a case where the number of devices of the first device set and the second device set is the same, the second obtaining unit 1004 includes:
The first acquisition module is used for acquiring the equipment identifiers of the first equipment and the second equipment;
and the first determining module is used for determining the first equipment and the second equipment with the same equipment identification as the associated equipment with the change relation.
In one or more embodiments, in a case where the number of devices of the first device set and the second device set are different, the second obtaining unit 1004 includes:
The calculating module is used for respectively calculating the similarity between the first flow characteristic and each second flow characteristic in the second flow characteristic set aiming at any first flow characteristic in the first flow characteristic set;
a second determining module, configured to determine a second flow characteristic, where a similarity with the first flow characteristic is greater than or equal to a preset threshold, as a target flow characteristic;
a third determining module, configured to determine, as an associated device having the change relationship, a first device corresponding to the first flow characteristic and a second device corresponding to the target flow characteristic;
a fourth determining module, configured to determine a first remaining device set and a second remaining device set as an associated device set, where the first remaining device set refers to a set of first devices that do not form an associated device in the first device set, and the second remaining device set refers to a set of second devices that do not form an associated device in the second device set.
In one or more embodiments, the computing module includes:
The first acquisition subunit is used for acquiring statistical index data, wherein the statistical index data comprises at least one of a statistical index corresponding to the first flow characteristic and a statistical index corresponding to each second flow characteristic respectively, and a statistical index of relevance between the first flow characteristic and each second flow characteristic respectively;
And the prediction subunit is used for calling a preset similarity prediction model to predict the statistical index data so as to obtain the similarity between the first flow characteristic and each second flow characteristic in the second flow characteristic set.
In one or more embodiments, the statistical indicator data includes at least one of:
the Euclidean distance between the first flow characteristic and each of the second flow characteristics;
a correlation coefficient between the first flow characteristic and each of the second flow characteristics;
the respective peaks of the first flow characteristic and each of the second flow characteristics;
the respective variances of the first flow characteristic and each of the second flow characteristics.
In one or more embodiments, the first determining unit 1006 includes:
the second acquisition module is used for acquiring each associated device conforming to the change relation;
A third obtaining module, configured to obtain, for any one of the associated devices, a first flow characteristic corresponding to a first device in the associated devices;
And a fifth determining module, configured to invoke a preset time sequence prediction algorithm to determine a first flow value conforming to a first preset proportion and a second flow value conforming to a second preset proportion in the first flow characteristic, where the first preset proportion is greater than the second preset proportion, an upper flow limit parameter of the reference flow range is the first flow value, and a lower flow limit parameter of the reference flow range is the second flow value.
In one or more embodiments, the first determining unit 1006 includes:
the coding module is used for coding the first flow characteristics corresponding to each device in the first remaining device set to obtain flow coding vectors;
The sixth acquisition module is used for calling a time-series prediction pre-training model to process each flow coding vector so as to acquire a predicted flow range corresponding to each device in the first remaining device set;
The seventh obtaining module is configured to obtain a reference flow range corresponding to each device in the second remaining device set based on each predicted flow range and a change coefficient, where the change coefficient is obtained according to the number of devices in the first remaining device set and the number of devices in the second remaining device set.
In one or more embodiments, the seventh acquisition module includes:
A second obtaining subunit, configured to obtain a first number of devices included in the first remaining device set, and a second number of devices included in the second remaining device set;
a first determining subunit, configured to take the ratio of the first number to the second number as the change coefficient;
A second determining subunit, configured to take the weighted value of the upper flow limit parameters corresponding to the predicted flow ranges as a flow peak value, and take the weighted value of the lower flow limit parameters corresponding to the predicted flow ranges as a flow valley value;
And the third determining subunit is used for determining the product of the flow peak value and the change coefficient as the upper flow limit parameter of the reference flow range corresponding to each device in the second remaining device set, and determining the product of the flow valley value and the change coefficient as the lower flow limit parameter of the reference flow range corresponding to each device in the second remaining device set.
In one or more embodiments, the first obtaining unit 1002 includes:
an eighth obtaining module, configured to obtain a first flow characteristic corresponding to a first time length of the first device set, and a second flow characteristic corresponding to a second time length of the second device set, where the first time length is greater than the second time length.
In one or more embodiments, the second determining unit 1008 includes:
And a sixth determining module, configured to determine that an abnormality occurs in the second device when the second flow characteristic corresponding to the second device contains data that is not within the reference flow range corresponding to the second device.
According to still another aspect of the embodiment of the present application, there is further provided an anomaly detection system for implementing the anomaly detection method described above, where the anomaly detection system includes a first cloud gateway set corresponding to a target instance before modification, a second cloud gateway set corresponding to the target instance after modification, and a detection server;
The detection server is configured to obtain a first flow feature set corresponding to each first cloud gateway in the first cloud gateway set, and obtain a second flow feature set corresponding to the second cloud gateway set;
acquiring a change relation between each first cloud gateway in the first cloud gateway set and each second cloud gateway in the second cloud gateway set, wherein the change relation characterizes a flow corresponding relation between each first cloud gateway and each second cloud gateway;
determining a reference flow range of each second cloud gateway in the second cloud gateway set based on the first flow characteristic set and the change relation;
and determining whether the second cloud gateway is abnormal or not according to the reference flow range of any second cloud gateway in the second cloud gateway set and the second flow characteristics of the second cloud gateway.
Specifically, in the embodiment of the present application, the cloud gateway may be a server or a group of servers, which serve as an entry point of network traffic, process a request from a client and route the request to a backend service. In other embodiments, the cloud gateway may be a virtual network switch, an elastic network card, or the like, or a load balancer, a firewall, or the like.
According to still another aspect of the embodiment of the present application, there is further provided an electronic device for implementing the foregoing abnormality detection method, where the electronic device may be a terminal or a server. The present embodiment is described taking an electronic device as a server as an example. As shown in fig. 11, the electronic device comprises a memory 1102 and a processor 1104, the memory 1102 having stored therein a computer program, the processor 1104 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic device may be at least one network device of a plurality of network devices of a computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, acquiring first flow characteristics corresponding to each first device in a first device set to obtain a first flow characteristic set and second flow characteristics corresponding to a second device set to obtain a second flow characteristic set, wherein the first device set is a device set corresponding to a target instance before changing, and the second device set is a device set corresponding to the target instance after changing;
S2, acquiring a change relation between each first device in the first device set and each second device in the second device set, wherein the change relation characterizes a flow corresponding relation between each first device and each second device;
S3, determining a reference flow range of each second device in the second device set based on the first flow characteristic set and the change relation;
S4, determining whether the second equipment is abnormal or not according to the reference flow range of any second equipment in the second equipment set and the second flow characteristics of the second equipment.
Alternatively, it will be understood by those skilled in the art that the structure shown in fig. 11 is only illustrative, and the electronic device may be a cloud server or a physical server, or a cloud server cluster or a physical server cluster, or the like. Fig. 11 is not limited to the structure of the electronic device described above. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in FIG. 11, or have a different configuration than shown in FIG. 11.
The memory 1102 may be used to store software programs and modules, such as program instructions/modules corresponding to the abnormality detection methods and apparatuses in the embodiments of the present application, and the processor 1104 executes the software programs and modules stored in the memory 1102 to perform various functional applications and data processing, that is, implement the abnormality detection methods described above. Memory 1102 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 1102 may further include memory located remotely from processor 1104, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1102 may be, but is not limited to being, specifically configured to store the flow characteristics of devices before and after the change, and the flow correspondence between devices before and after the instance change.
As an example, as shown in fig. 11, the memory 1102 may include, but is not limited to, the first obtaining unit 1002, the second obtaining unit 1004, the first determining unit 1006, and the second determining unit 1008 of the abnormality detection apparatus. It may also include, but is not limited to, other module units of the abnormality detection apparatus, which are not described in detail in this example.
Optionally, the transmission device 1106 is used to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 1106 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1106 is a Radio Frequency (RF) module for communicating wirelessly with the internet.
The electronic device further includes a display 1108 for displaying the operational status of the device and a connection bus 1110 for connecting the various modular components of the electronic device.
In other embodiments, the electronic device may be a node in a distributed system, where the distributed system may be a blockchain system formed by a plurality of nodes connected through network communication. The nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server or a terminal, may become a node in the blockchain system by joining the Peer-To-Peer network.
In one or more embodiments, the present application also provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the abnormality detection method described above. Wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring first flow characteristics corresponding to each first device in a first device set to obtain a first flow characteristic set and second flow characteristics corresponding to a second device set to obtain a second flow characteristic set, wherein the first device set is a device set corresponding to a target instance before changing, and the second device set is a device set corresponding to the target instance after changing;
S2, acquiring a change relation between each first device in the first device set and each second device in the second device set, wherein the change relation characterizes a flow corresponding relation between each first device and each second device;
S3, determining a reference flow range of each second device in the second device set based on the first flow characteristic set and the change relation;
S4, determining whether the second equipment is abnormal or not according to the reference flow range of any second equipment in the second equipment set and the second flow characteristics of the second equipment.
Alternatively, in this embodiment, all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing hardware related to the terminal device, and the program may be stored in a computer-readable storage medium, where the storage medium may include a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments, if implemented in the form of software functional units and sold or used as separate products, may be stored in the above-described computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods of the various embodiments of the present invention.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division of units is merely a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are intended to fall within the scope of protection of the present invention.
The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to grant or refuse authorization.