Disclosure of Invention
The invention aims to provide an anomaly detection method and device based on adaptive multi-scale feature modeling, which address the shortcomings of existing methods: insufficient consideration of the dynamic characteristics of the model, insufficient ability to capture latent correlations and contextual dependencies across time scales, and difficulty coping with the dynamic changes of the industrial environment.
In order to solve the technical problems, the invention is realized by the following technical scheme:
an anomaly detection method based on adaptive multi-scale feature modeling, comprising:
S1, acquiring historical monitoring time series data of an industrial control system, and preprocessing the data to obtain training data input into an anomaly detection model;
S2, combining Fourier transformation and sliding windows with different sizes, decomposing time components of the input training data, and fusing the time components with the training data to obtain a decomposition result;
S3, calculating a gating weight matrix by combining a gating network according to the decomposition result, and adaptively weighting and combining a plurality of expert networks with different scales to process the output of the training data so as to obtain a multi-scale fusion characteristic;
S4, calculating an adjacent matrix of the multi-scale fusion feature at each time step, and aggregating node features of the adjacent matrix by adopting graph convolution operation to capture spatial correlation features among nodes to obtain spatial features;
S5, carrying out weighted summation on the spatial features and the multi-scale fusion features to obtain final feature representation combined with space-time information;
S6, sending the final characteristic representation to a variation self-encoder to learn potential space representation, and carrying out data decoding reconstruction to obtain reconstruction data;
S7, combining the error loss of the reconstruction data and the KL divergence as a loss function training model to obtain a trained anomaly detection model;
S8, inputting the preprocessed data to be detected into the anomaly detection model, and combining a set threshold value to obtain an anomaly detection result.
Preferably, the preprocessing includes processing missing and duplicate values of data.
Preferably, the input training data is subjected to time component decomposition, specifically:
analyzing the frequency characteristic of input data by adopting a Fourier transform technology, capturing periodic variation to obtain seasonal components, wherein the seasonal components are regular fluctuation data which changes along with a fixed period in the data;
Calculating moving averages of the input data by adopting a plurality of sliding windows of different sizes, and capturing the overall rising or falling trend of the input data by smoothing short-term fluctuations to obtain trend components, wherein the trend components are long-term change trend data in the data, comprising continuously increasing data, continuously decreasing data or constant data;
subtracting seasonal components and trend components from the input data to obtain residual components, wherein the residual components comprise random fluctuation data or noise data.
Preferably, the S3 specifically is:
according to the decomposition result D, extracting the feature vector of each time step to obtain data decomposition result subsets, and generating an initial gating weight matrix according to each data decomposition result subset, wherein the formula is as follows:
G_i = Softmax(W_g · x_i + b_g);
Wherein, G_i represents the gating weight matrix of the i-th subset of the decomposition result D; Softmax(·) is an activation function; W_g and b_g are respectively a learned weight and a bias term; x_i represents the input features of the i-th subset of the decomposition result D;
extracting the feature vector of each time step from the training data to obtain data subsets, and distributing the data subsets to a plurality of expert networks of different scales according to the gating weight matrix, so as to select the most suitable expert networks to process the current data and obtain the output features, wherein the expressions are:
G(x) = Softmax(TopK(H(x), k));
H(x) = x · W_g + ε · Softplus(x · W_noise);
Wherein, G(x) represents the gating weight matrix corresponding to the current input data x; TopK(H(x), k) represents selecting, from H(x), the k experts with the largest weights; H(x) represents the noise-injected expert weights; k represents the number of selected experts; W_g represents the gating weight; ε represents noise sampled from a standard normal distribution with a mean of 0 and a variance of 1; Softplus represents a smoothed ReLU function used to enhance the nonlinear expression capability of the model; W_noise represents the noise weight;
based on the gating weight matrix, the features output by the plurality of expert networks of different scales are weighted and summed to obtain the multi-scale fusion feature y, wherein the expression is:
y = Σ_{i=1}^{N} G_i(x) · E_i(x);
Wherein, E_i(x) is the output feature of the i-th expert network; N represents the total number of expert networks.
Preferably, the expert networks of different scales define segments of different sizes, so as to extract features of different scales of the input data, and when each expert network processes the data allocated to the expert network, the processing steps are as follows:
dividing the distributed data into a plurality of time segments {X_1, X_2, ..., X_m};
defining the processing resolution of the time series data for each expert network by setting different time segment sizes s, wherein each time segment represents a local time window, and the local window is used to learn the time dependency of the data, then:
X_i = x[(i-1)·s + 1 : i·s];
Wherein, X_i represents the i-th time segment; x[(i-1)·s + 1 : i·s] represents the time series data in the i-th time segment;
each expert network performs local feature extraction on each time segment through a grouped depthwise convolution layer to obtain the short-term dependency feature F_local^i of consecutive time steps within each time segment of the time series, wherein the expression is:
F_local^i = Conv(X_i);
Wherein, Conv(·) represents a convolution feature extraction operation;
learning the relationships between the different time segments {X_i} using multi-head self-attention, constructing interactions between different time segments to generate the global feature F_global; when calculating multi-head self-attention, computing the matrix transformations of the query matrix, the key matrix and the value matrix, and aggregating the features of each time segment through the attention weights;
splicing the features learned by each expert network, combining the distributed data, the short-term dependency features F_local and the cross-segment global features F_global, to obtain the fused feature E_i(x) output by each expert network.
Preferably, the S4 specifically is:
Calculating the cosine similarity between all node features of the multi-scale fusion feature in each time step; for each pair of nodes i and j at each time step t, the cosine similarity is calculated as follows:
sim(h_i, h_j) = (h_i · h_j) / (||h_i|| · ||h_j||);
Wherein, sim(h_i, h_j) represents the cosine similarity between feature vectors h_i and h_j; h_i and h_j respectively represent the feature vectors of node i and node j; ||·|| represents the L2 norm;
if the similarity between two nodes is greater than a preset threshold τ, an edge connection is established to obtain the adjacency matrix, wherein the expression is:
A_t(i, j) = 1 if sim(h_i, h_j) > τ, and A_t(i, j) = 0 otherwise;
Wherein, A_t represents the adjacency matrix corresponding to time step t;
based on the adjacency matrix, GraphSAGE convolution layers are adopted to aggregate the features of the nodes so as to capture the spatial correlation features among the nodes, namely the spatial features; the formula of the GraphSAGE convolution layer is:
h_i^(l+1) = σ(W^(l) · CONCAT(h_i^(l), AGGREGATE({h_j^(l) : j ∈ N(i)})));
Wherein, h_i^(l) represents the feature representation of node i at the l-th layer; N(i) represents the neighbor nodes of node i; AGGREGATE is an aggregation operation; W^(l) is the weight matrix of the l-th layer; σ is an activation function; h_j^(l) represents the feature of the neighbor node j of node i.
Preferably, the S6 specifically is:
The final feature representation is input into the variational autoencoder, the latent-space mean μ and log-variance log σ² of the final feature representation are learned through forward propagation of the fully connected network of the variational autoencoder, and the latent variable z is sampled from the latent space using the re-parameterization technique, wherein the expression is:
z = μ + σ ⊙ ε;
Wherein, ε represents noise sampled from the standard normal distribution N(0, 1); σ represents the standard deviation of the latent space;
the latent variable z is mapped back to the input space through a fully connected layer, and the input data is reconstructed to obtain the reconstructed data X̂, so as to learn a compressed representation of the input data.
Preferably, the reconstructed-data error loss is measured by the mean squared error between the reconstructed data X̂ and the training data X input to the model, wherein the expression is:
L_rec = ||X - X̂||²;
Wherein, L_rec represents the reconstructed-data error loss;
The KL divergence L_KL measures the difference between the distribution of the latent space and the standard normal distribution, wherein the expression is:
L_KL = -(1/2) · Σ_{i=1}^{n} (1 + log σ_i² - μ_i² - σ_i²);
Wherein, L_KL represents the KL divergence; n represents the total number of data subsets of the training data; σ_i represents the standard deviation of the i-th data subset; μ_i represents the latent-space mean of the i-th data subset;
The loss function L is the weighted sum of the reconstructed-data error loss and the KL divergence, wherein the formula is:
L = L_rec + λ · L_KL;
Wherein, λ represents the weight of the KL divergence.
Preferably, the method further comprises performing a test evaluation on the anomaly detection model; when the test evaluation result meets the set requirement, the anomaly detection model is evaluated as qualified and used for subsequent anomaly detection; otherwise, the model is retrained; the test evaluation result of the anomaly detection model is calculated by the following formulas:
The accuracy Acc is calculated as: Acc = (TP + TN) / (TP + TN + FP + FN);
the precision P is calculated as: P = TP / (TP + FP);
the recall R is calculated as: R = TP / (TP + FN);
the F1 score is calculated as: F1 = 2 · P · R / (P + R);
Wherein, TP represents the number of true positives, namely samples that are normal in both the actual situation and the detection result; FP represents the number of false positives, namely samples that are actually abnormal but detected as normal; TN represents the number of true negatives, namely samples that are abnormal in both the actual situation and the detection result; FN represents the number of false negatives, namely samples that are actually normal but detected as abnormal.
The invention also provides an anomaly detection device based on adaptive multi-scale feature modeling, which comprises:
the data preprocessing unit is used for acquiring historical monitoring time series data of the industrial control system and preprocessing the data to obtain training data input into an anomaly detection model;
The component decomposition unit is used for combining Fourier transformation and sliding windows with different sizes, performing time component decomposition on the input training data, and fusing the time component decomposition with the training data to obtain a decomposition result;
The multi-scale fusion unit is used for calculating a gating weight matrix according to the decomposition result by combining a gating network, and adaptively weighting and combining a plurality of expert networks with different scales to process the output of the training data so as to obtain multi-scale fusion characteristics;
the graph convolution unit is used for calculating an adjacency matrix of the multi-scale fusion feature at each time step, and aggregating node features of the adjacency matrix by adopting graph convolution operation so as to capture spatial correlation features among nodes and obtain spatial features;
the space-time combining unit is used for carrying out weighted summation on the space features and the multi-scale fusion features to obtain final feature representation combining space-time information;
the data reconstruction unit is used for sending the final characteristic representation to a variation self-encoder to learn potential space representation, and carrying out data decoding reconstruction to obtain reconstruction data;
The model training unit is used for combining the reconstruction data error loss and the KL divergence as a loss function to train the model so as to obtain a trained anomaly detection model;
The abnormality detection unit is used for inputting the preprocessed data to be detected into the abnormality detection model and obtaining an abnormality detection result by combining a set threshold value.
The invention also provides an abnormality detection device based on the adaptive multi-scale feature modeling, which comprises a processor and a memory, wherein a computer program is stored in the memory, and the computer program can be executed by the processor to realize the abnormality detection method based on the adaptive multi-scale feature modeling.
The invention also provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor of the device where the computer-readable storage medium is located, implement the anomaly detection method based on adaptive multi-scale feature modeling.
In summary, compared with the prior art, the invention has the following beneficial effects:
According to the invention, by combining advanced time series analysis techniques such as time decomposition, a mixed multi-scale expert network and spatio-temporal graph convolution, attention is paid to the correlations among variables while expert networks of different scales are adaptively combined, so that data characteristics on different time scales are processed dynamically, overcoming the shortcoming of existing methods in handling the dynamic characteristics of time-scale variation.
Compared with traditional anomaly detection methods, the method focuses on the spatial correlation in the multi-dimensional time series data of the industrial control system while also accounting for the dynamic change of features over time. The invention improves the accuracy of anomaly detection in industrial control systems, provides a more flexible and accurate anomaly detection tool for practical applications, and adapts to the complex dynamic changes of time series data in industrial control systems without labeled data.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
Example 1
An embodiment of the present invention provides an anomaly detection method based on adaptive multi-scale feature modeling, which may be implemented by an anomaly detection device based on adaptive multi-scale feature modeling (hereinafter referred to as an anomaly detection device), and in particular, executed by one or more processors in the anomaly detection device.
In this embodiment, the anomaly detection device may be an electronic device equipped with a processor and storing an executable computer program implementing the anomaly detection method based on adaptive multi-scale feature modeling, for example a computer, a smartphone, a smart tablet, a workstation, or the like, without limitation.
As shown in fig. 1-2, an anomaly detection method based on adaptive multi-scale feature modeling includes steps S1 to S8.
S1, acquiring historical monitoring time series data of an industrial control system, and preprocessing the data to obtain training data input into an anomaly detection model.
Specifically, historical characteristic data of the industrial control equipment is obtained, and whether the data are complete or not is checked. The characteristic data is a collection of observations or data points associated with a time stamp, and the preprocessing operation includes processing missing and duplicate values of the data.
In this embodiment, the preprocessed data is input into an anomaly detection model based on adaptive multi-scale feature extraction and reconstruction to obtain a label indicating whether the industrial control equipment is in an abnormal state at the current moment, yielding the anomaly detection result.
Specifically, the preprocessed feature data X is divided into a training set and a test set, wherein the training set is used for model training and the test set is used for model test evaluation.
Definition: X = {x_1, x_2, ..., x_T}, Y = {y_1, y_2, ..., y_T};
Wherein, T represents the maximum timestamp; x_t represents the feature data at time t, x_t ∈ R^m, where m represents the number of features; the output is the anomaly detection result label Y, y_t ∈ {0, 1}, where y_t represents the value of the anomaly detection result label at time t.
S2, combining Fourier transformation and sliding windows with different sizes, decomposing time components of the input training data, and fusing the time components with the training data to obtain a decomposition result.
The specific steps include S21 to S24.
S21, firstly, analyzing the frequency characteristic of input data by adopting a Fourier transform technology, and capturing periodic variation to obtain seasonal components, wherein the seasonal components are regular fluctuation data which varies along with a fixed period in the data.
For example, seasonal ingredients are manifested as data peaks in summer and winter each year, and relatively low peaks in spring and autumn.
S22, calculating moving averages of the input data using a plurality of sliding windows of different sizes, and capturing the overall rising or falling trend of the input data by smoothing short-term fluctuations to obtain trend components, wherein the trend components are long-term change trend data, comprising continuously increasing data, continuously decreasing data or essentially constant data.
Such as a trend of data rising year by year.
S23, subtracting seasonal components and trend components from the input data to obtain residual components, wherein the residual components are parts of the data which cannot be interpreted by seasonality and trend and comprise random fluctuation data or noise data.
Such as fluctuations in data due to emergencies (e.g., extreme weather).
S24, the input data is thus decomposed into seasonal components, trend components and residual components, which are then combined with the input data to obtain the decomposition result. This time decomposition clarifies the structure and variation patterns of the data, serves the subsequent expert selection and feature extraction, and provides a scientific basis for decision making.
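As a non-limiting illustration of S21–S24, the decomposition can be sketched in NumPy; the period, the window sizes and the function name are assumptions made for demonstration only, not the claimed implementation:

```python
import numpy as np

def decompose(x, period, window_sizes=(5, 11)):
    """Sketch of the time decomposition: seasonal (Fourier), trend
    (moving averages), residual (what remains). Assumed parameters."""
    n = len(x)
    # Seasonal: keep only the Fourier coefficient at the fundamental
    # frequency of the assumed period (plus its conjugate mirror).
    spectrum = np.fft.fft(x)
    k = n // period
    mask = np.zeros(n, dtype=bool)
    mask[k] = mask[-k] = True
    seasonal = np.real(np.fft.ifft(np.where(mask, spectrum, 0)))
    # Trend: average of several centred moving averages, smoothing
    # short-term fluctuation at more than one window scale.
    trend = np.mean(
        [np.convolve(x, np.ones(w) / w, mode="same") for w in window_sizes],
        axis=0,
    )
    # Residual: the part not explained by seasonality or trend.
    residual = x - seasonal - trend
    return seasonal, trend, residual
```

By construction the three components sum back to the input series, matching S23.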
And S3, calculating a gating weight matrix by combining a gating network according to the decomposition result, and adaptively weighting and combining a plurality of expert networks with different scales to process the output of the training data so as to obtain a multi-scale fusion characteristic.
The specific steps include S31 to S34.
S31, firstly, the feature vector of each time step is extracted from the decomposition result D to obtain data decomposition result subsets, and an initial gating weight matrix is generated according to each data decomposition result subset, wherein the formula is as follows:
G_i = Softmax(W_g · x_i + b_g);
Wherein, G_i represents the gating weight matrix of the i-th subset of the decomposition result D; Softmax(·) is an activation function; W_g and b_g are respectively a learned weight and a bias term; x_i represents the input features of the i-th subset of the decomposition result D.
And S32, extracting the feature vector of each time step according to the training data to obtain a data subset, and distributing the data subset to a plurality of expert networks with different scales according to the gating weight matrix so as to select the most suitable expert network to process the current data.
In this embodiment, the "Expert Network" refers to a specific neural Network architecture or module, which is designed to be specially responsible for feature extraction of a subtask or a specific field in a complex task. Such a network may be considered an "expert" in that it is focused on handling specific types of data or features, thereby playing a greater role in overall tasks. In the context of time series analysis, the task of an expert network is typically to handle a specific time slice or feature pattern.
Specifically, when processing a particular data subset, each expert network is given a different weight according to its contribution to the task, and the gating weight matrix determines which experts are activated and to what degree, thereby determining how the data subset is assigned to each expert for processing. The data-to-expert selection process is adaptive and is dynamically adjusted based on the data characteristics of each time step, ensuring that the model can select the most appropriate expert networks according to the characteristics of the data in different time periods. The expressions are:
G(x) = Softmax(TopK(H(x), k));
H(x) = x · W_g + ε · Softplus(x · W_noise);
Wherein, G(x) represents the gating weight matrix corresponding to the current input data x; TopK(H(x), k) represents selecting, from H(x), the k experts with the largest weights; H(x) represents the noise-injected expert weights; k represents the number of selected experts; W_g represents the gating weight; ε represents noise sampled from a standard normal distribution with a mean of 0 and a variance of 1; Softplus represents a smoothed ReLU function used to enhance the nonlinear expression capability of the model; W_noise represents the noise weight;
S33, the expert networks of different scales define time segments of different sizes, thereby setting the temporal resolution of the input data and extracting features of the data at different scales. Each expert network extracts local features within the divided time segments through depthwise convolution, and uses an attention mechanism across time segments to model the relationships between different segments, improving long-range dependency modeling of the time series. The processing steps are as follows:
Dividing the distributed data into a plurality of time segments {X_1, X_2, ..., X_m};
defining the processing resolution of the time series data for each expert network by setting different time segment sizes s, wherein each time segment represents a local time window, and the local window is used to learn the time dependency of the data, then:
X_i = x[(i-1)·s + 1 : i·s];
Wherein, X_i represents the i-th time segment; x[(i-1)·s + 1 : i·s] represents the time series data in the i-th time segment;
each expert network performs local feature extraction on each time segment through a grouped depthwise convolution layer to obtain the short-term dependency feature F_local^i of consecutive time steps within each time segment of the time series, wherein the expression is:
F_local^i = Conv(X_i);
Wherein, Conv(·) represents a convolution feature extraction operation.
In this embodiment, grouped depthwise convolution is a special convolution operation that divides the input channels into several groups and convolves each group separately. This significantly reduces the computational cost while still extracting the features of each group effectively. The data of each time segment can be viewed as a multi-dimensional time series, and the grouped convolution lets the network process the groups in parallel, extracting the features within each time segment.
Then, the relationships between the different time segments {X_i} are learned using multi-head self-attention, constructing interactions between time segments to generate the global feature F_global; when calculating multi-head self-attention, the matrix transformations of the query matrix, the key matrix and the value matrix are computed, and the features of each time segment are aggregated through the attention weights. The expression of multi-head self-attention is:
Attention(Q, K, V) = Softmax(Q · K^T / √d_k) · V;
Wherein, Q is the query matrix, K is the key matrix, V is the value matrix, d_k is the dimension of the key matrix, and Softmax is the normalization function.
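The attention computation described above can be sketched for a single head as follows (NumPy is assumed; multi-head attention runs this computation in parallel over learned projections of Q, K and V and concatenates the results):

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """Single-head scaled dot-product attention over time segments.

    Each row of Q attends over the rows of K; the softmax weights
    then aggregate the rows of V, as in the formula above.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax yields the attention weights across segments.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

With all-zero queries and keys the weights are uniform, so each output row is simply the mean of V — a quick sanity check on the softmax normalization.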
Splicing the features learned by each expert network, combining the distributed data, the short-term dependency features F_local and the cross-segment global features F_global, the fused feature E_i(x) output by each expert network is obtained.
S34, based on the gating weight matrix, the output features E_i(x) of the plurality of expert networks of different scales are weighted and summed to obtain the multi-scale fusion feature y, wherein the expression is:
y = Σ_{i=1}^{N} G_i(x) · E_i(x);
Wherein, E_i(x) is the output feature of the i-th expert network; N represents the total number of expert networks.
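The gating and fusion of S32–S34 can be sketched as follows; the expert callables, shapes and parameter names here are illustrative assumptions for demonstration, not the claimed implementation:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def moe_fuse(x, experts, W_g, b_g, top_k=2, noise_std=0.0, rng=None):
    """Noisy top-k gating over expert networks, then weighted fusion.

    `x` is one time step's feature vector; `experts` is a list of
    callables standing in for the different-scale expert networks;
    `W_g`/`b_g` are the learned gating parameters.
    """
    rng = rng or np.random.default_rng(0)
    logits = W_g @ x + b_g
    if noise_std > 0:  # optional noise injection into the gate
        logits = logits + noise_std * rng.standard_normal(len(logits))
    # Keep only the top-k experts; suppress the rest before softmax.
    keep = np.argsort(logits)[-top_k:]
    masked = np.full_like(logits, -np.inf)
    masked[keep] = logits[keep]
    gates = softmax(masked)
    # Multi-scale fusion: gate-weighted sum of the expert outputs.
    return sum(g * expert(x) for g, expert in zip(gates, experts))
```

Non-selected experts receive exactly zero gate weight, so only the top-k experts contribute to the fused feature.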
S4, calculating an adjacent matrix of the multi-scale fusion feature at each time step, and aggregating node features of the adjacent matrix by adopting graph convolution operation to capture spatial correlation features among nodes to obtain spatial features.
The specific steps include S41 to S43.
S41, calculating the cosine similarity between all node features of the multi-scale fusion feature in each time step; for each pair of nodes i and j at each time step t, the cosine similarity is calculated as follows:
sim(h_i, h_j) = (h_i · h_j) / (||h_i|| · ||h_j||);
Wherein, sim(h_i, h_j) represents the cosine similarity between feature vectors h_i and h_j; h_i and h_j respectively represent the feature vectors of node i and node j at time step t; ||·|| represents the L2 norm;
S42, if the similarity between two nodes is greater than a preset threshold τ, an edge connection is established to obtain the adjacency matrix, wherein the expression is:
A_t(i, j) = 1 if sim(h_i, h_j) > τ, and A_t(i, j) = 0 otherwise;
Wherein, A_t represents the adjacency matrix corresponding to time step t.
S43, based on the adjacency matrix, GraphSAGE (Graph SAmple and aggreGatE) convolution layers are adopted to aggregate the features of the nodes so as to capture the spatial correlation features among the nodes, namely the spatial features; the formula of the GraphSAGE convolution layer is:
h_i^(l+1) = σ(W^(l) · CONCAT(h_i^(l), AGGREGATE({h_j^(l) : j ∈ N(i)})));
Wherein, h_i^(l) represents the feature representation of node i at the l-th layer; N(i) represents the neighbor nodes of node i; AGGREGATE is an aggregation operation; W^(l) is the weight matrix of the l-th layer; σ is an activation function; h_j^(l) represents the feature of the neighbor node j of node i.
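S41–S43 can be sketched as follows; the mean aggregator and the threshold value are illustrative assumptions (GraphSAGE admits several aggregation operations), and NumPy is assumed:

```python
import numpy as np

def cosine_adjacency(H, threshold=0.5):
    """Adjacency from pairwise cosine similarity of node features.

    `H` is (num_nodes, feat_dim) for one time step; an edge is kept
    when similarity exceeds the (assumed) threshold.
    """
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    S = (H @ H.T) / (norms @ norms.T + 1e-12)
    A = (S > threshold).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops in the edge set
    return A

def sage_layer(H, A, W):
    """One GraphSAGE-style layer with mean aggregation.

    Concatenates each node's feature with the mean of its neighbours'
    features, applies the layer weight W, then a ReLU activation.
    """
    deg = A.sum(axis=1, keepdims=True)
    neigh = (A @ H) / np.maximum(deg, 1.0)  # mean over neighbours
    cat = np.concatenate([H, neigh], axis=1)
    return np.maximum(cat @ W, 0.0)
```

Note the weight matrix acts on the concatenation, so its first dimension is twice the node feature dimension.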
And S5, carrying out weighted summation on the spatial features and the multi-scale fusion features to obtain final feature representation combined with the space-time information.
In this step, the spatial features extracted by the graph convolution and the multi-scale fusion features extracted by the expert networks are weighted and summed to obtain the final feature representation combining temporal and spatial information.
And S6, sending the final characteristic representation to a variation self-encoder to learn potential space representation, and carrying out data decoding reconstruction to obtain reconstruction data.
The specific steps include S61 to S62.
S61, the final feature representation is input into the variational autoencoder part, the latent-space mean μ and log-variance log σ² of the final feature representation are learned through forward propagation of the fully connected network of the variational autoencoder, and the latent variable z is sampled from the latent space using the re-parameterization technique, wherein the expression is:
z = μ + σ ⊙ ε;
Wherein, z is the latent variable; ε represents noise sampled from the standard normal distribution N(0, 1); σ represents the standard deviation of the latent space.
In this embodiment, the encoder (Encoder) of the variational self-encoder consists of a fully connected network (Fully Connected Network, FC) whose function is to map the input features to the distribution parameters (mean and variance) of the underlying space. The fully-connected network of encoders is typically composed of multiple layers of neural networks, each layer being feature-transformed by linear transformation and a nonlinear activation function (e.g., reLU).
Sampling directly from the potential space results in an inability to counter-propagate gradients, thus requiring the use of re-parameterization techniques. The core idea of the re-parameterization technique is to separate the randomness from the parameters of the network so that the gradient can be counter-propagated through the sampling process.
S62, in the decoder part, the latent variable z is mapped back to the input space through a fully connected layer to reconstruct the input data and obtain the reconstructed data X̂, so as to learn a compressed representation of the input data, wherein the expression is:
X̂ = Decoder(z);
Wherein, Decoder(·) represents the reconstruction (decoding) operation.
In this embodiment, the decoder is also a fully connected network, whose role is to map the latent variables back into the original feature space, generating reconstructed features.
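The re-parameterization of S61 can be sketched as follows (NumPy assumed; in a trainable model μ and log σ² would be produced by the encoder's fully connected layers):

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Randomness lives entirely in eps, so gradients can flow through
    mu and log_var during training -- the point of the trick.
    """
    rng = rng or np.random.default_rng(0)
    sigma = np.exp(0.5 * log_var)  # log-variance -> standard deviation
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps
```

When the log-variance is driven strongly negative the sample collapses to the mean, which is a handy deterministic check.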
S7, combining the reconstructed-data error loss and the KL divergence as the loss function to train the model and obtain a trained anomaly detection model.
In the present embodiment, the loss function L combines the reconstructed-data error loss and the KL divergence, wherein the formula is:
L = L_rec + λ · L_KL;
Wherein, λ represents the weight of the KL divergence, L_rec is the reconstructed-data error loss, and L_KL is the KL divergence.
The reconstructed-data error loss is measured by the mean squared error between the reconstructed data X̂ and the training data X input to the model, wherein the expression is:
L_rec = ||X - X̂||²;
Wherein, L_rec represents the reconstructed-data error loss.
The KL divergence L_KL measures the difference between the distribution of the latent space and the standard normal distribution, wherein the expression is:
L_KL = -(1/2) · Σ_{i=1}^{n} (1 + log σ_i² - μ_i² - σ_i²);
Wherein, n represents the total number of data subsets of the training data; σ_i represents the standard deviation of the i-th data subset; μ_i represents the latent-space mean of the i-th data subset.
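The combined loss of S7 can be sketched as follows, with the KL weight λ written as `beta`; the closed-form KL term assumes a diagonal Gaussian posterior measured against N(0, I):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """L = L_rec + beta * L_KL, mirroring the formulas above."""
    l_rec = np.mean((x - x_hat) ** 2)  # mean squared reconstruction error
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
    l_kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return l_rec + beta * l_kl
```

With μ = 0 and log σ² = 0 the KL term vanishes and the loss reduces to the reconstruction error alone.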
S8, inputting the preprocessed data to be detected into an abnormality detection model, and combining a set threshold value to obtain an abnormality detection result.
Specifically, after training, the trained anomaly detection model is tested and evaluated; when the test evaluation result meets the set requirement, the anomaly detection model is evaluated as qualified and used for subsequent anomaly detection; otherwise, the model is retrained. The test evaluation result of the anomaly detection model is calculated by the following formulas:
The accuracy Acc is calculated as: Acc = (TP + TN) / (TP + TN + FP + FN);
the precision P is calculated as: P = TP / (TP + FP);
the recall R is calculated as: R = TP / (TP + FN);
the F1 score is calculated as: F1 = 2 · P · R / (P + R);
wherein TP represents the number of true positives, namely samples that are normal in both the actual situation and the detection result; FP represents the number of false positives, namely samples that are actually abnormal but detected as normal; TN represents the number of true negatives, namely samples that are abnormal in both the actual situation and the detection result; and FN represents the number of false negatives, namely samples that are actually normal but detected as abnormal.
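The four evaluation formulas can be computed from the confusion-matrix counts in a few lines; the counts below are made-up example values, not experimental results from the text.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1

acc, p, r, f1 = detection_metrics(tp=80, fp=20, tn=90, fn=10)
print(acc, p, r, f1)  # ≈ 0.85 0.8 0.889 0.842
```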
Specifically, the training data are input into the anomaly detection model to obtain training-sample anomaly scores. In this embodiment, the threshold is set to the mean of the training-sample anomaly scores plus three times their standard deviation. The test data set is then input into the anomaly detection model, and the weighted combination of the reconstruction error loss and the KL divergence in the total loss function is taken as the anomaly score of each test sample in the test set. Comparing the anomaly score of each test sample with the threshold yields the anomaly class label corresponding to each test sample in the test set.
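The mean-plus-three-standard-deviations thresholding described in this embodiment can be sketched as follows; the scores are placeholder values, and the function names are illustrative.

```python
import numpy as np

def fit_threshold(train_scores):
    """Threshold = mean + 3 * std of the training anomaly scores."""
    train_scores = np.asarray(train_scores, dtype=float)
    return float(train_scores.mean() + 3.0 * train_scores.std())

def label_anomalies(test_scores, threshold):
    """1 = anomalous (score above threshold), 0 = normal."""
    return (np.asarray(test_scores, dtype=float) > threshold).astype(int)

train_scores = [1.0, 1.2, 0.8, 1.0, 1.0]   # placeholder training anomaly scores
thr = fit_threshold(train_scores)
labels = label_anomalies([0.9, 5.0, 1.1], thr)
print(thr, labels)                          # only the 5.0 score exceeds the threshold
```

With all-normal training data this rule flags roughly the upper 0.1% tail of the training-score distribution, assuming the scores are approximately Gaussian.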
When the test-evaluation result meets the set requirement, the abnormality detection model is deemed qualified and used for subsequent unsupervised anomaly detection. Monitoring time-series data of the industrial control equipment are acquired in real time and, after preprocessing, input into the abnormality detection model to obtain the anomaly detection result at the current moment.
In another preferred embodiment, to verify the effectiveness of the anomaly detection model and scheme proposed by the present invention, the SWaT dataset for security studies of industrial control systems is selected as the research object. The dataset is 51-dimensional in total and contains 11 days of operational data: the first 7 days are normal operational data, and the last 4 days contain data collected under 36 typical network-attack scenarios. The dataset covers data from 51 sensors and actuators reflecting various aspects of the water-treatment process, such as water level, flow, pressure, and pH. The data are recorded as a time series sampled once every second, each record containing a timestamp and the measurements or states of the individual sensors/actuators.
The data of the first 7 days in the dataset are taken as the training set and are all normal data; the data of the last 4 days are taken as the test set, which contains both normal and abnormal data. Model training is then carried out, with the relevant parameter settings shown in Table 1.
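The 7-day/4-day split at one sample per second can be expressed as simple array slicing. The stand-in array below (with a reduced feature count) only demonstrates the layout; in practice the full 51-dimensional SWaT recording would be loaded from its CSV files.

```python
import numpy as np

SAMPLES_PER_DAY = 24 * 60 * 60        # SWaT records one sample per second
N_STANDIN_FEATURES = 4                # stand-in for the real 51 features

# Placeholder for the 11-day recording (real data loaded from CSV in practice)
data = np.zeros((11 * SAMPLES_PER_DAY, N_STANDIN_FEATURES), dtype=np.float32)

train = data[:7 * SAMPLES_PER_DAY]    # first 7 days: normal operation only
test = data[7 * SAMPLES_PER_DAY:]     # last 4 days: normal + attack data
print(train.shape[0] // SAMPLES_PER_DAY, test.shape[0] // SAMPLES_PER_DAY)  # 7 4
```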
TABLE 1 Experimental parameter setting Table
The experimental results obtained are shown in table 2.
TABLE 2 experimental results
As can be seen from Table 2, the detection performance of the abnormality detection model is relatively high; the method shows clear advantages in accuracy and precision and can effectively improve the accuracy of anomaly detection in industrial control systems.
In summary, compared with the prior art, the invention has the following beneficial effects:
The invention combines several advanced time-series data analysis techniques, such as time decomposition, a mixture of multi-scale expert networks, and spatio-temporal graph convolution. While attending to the correlations among variables, it adaptively combines expert networks of different scales to dynamically process data characteristics on different time scales, thereby overcoming the shortcoming of existing methods in handling dynamic characteristics that vary with time scale. Because the method takes into account the dynamic variation of the data over time scales, it can accurately model the anomaly detection task in an industrial control system.
Specifically, the time-series characteristics of the data are comprehensively modeled on multiple scales by combining time decomposition with a gating network and assigning a suitable expert network model to the data on different time scales. Correlations within and between different time periods of the time series are mined in the expert layer using deep convolutional networks and an attention mechanism across time segments. The spatial interaction characteristics between the variables are then modeled with a graph structure. Finally, the latent representation of the data is learned in a reconstruction module to generate reconstructed data, and the reconstruction error and the KL divergence of the latent space are calculated for model training, enabling efficient anomaly detection under unsupervised conditions.
Compared with the traditional anomaly detection method, the method focuses on the spatial correlation in the multi-dimensional time sequence data of the industrial control system, simultaneously gives consideration to the dynamic characteristics of the characteristics along with time, can adaptively adjust the training strategy according to the characteristics of the data, and ensures accurate anomaly detection and identification under different environments. The invention improves the accuracy of abnormality detection in the industrial control system, provides a more flexible and accurate abnormality detection tool for practical application, and adapts to the complex dynamic change of time sequence data in the industrial control system under the condition of no tag data.
Example two
As shown in fig. 3, a second embodiment of the present invention further provides an anomaly detection device based on adaptive multi-scale feature modeling, including:
the data preprocessing unit is used for acquiring historical monitoring time series data of the industrial control system and preprocessing the data to obtain training data input into an anomaly detection model;
The component decomposition unit is used for combining Fourier transformation and sliding windows with different sizes, performing time component decomposition on the input training data, and fusing the time component decomposition with the training data to obtain a decomposition result;
The multi-scale fusion unit is used for calculating a gating weight matrix according to the decomposition result by combining a gating network, and adaptively weighting and combining a plurality of expert networks with different scales to process the output of the training data so as to obtain multi-scale fusion characteristics;
the graph convolution unit is used for calculating an adjacency matrix of the multi-scale fusion feature at each time step, and aggregating node features of the adjacency matrix by adopting a graph convolution operation so as to capture spatial correlation features among nodes and obtain spatial features;
the space-time combining unit is used for carrying out weighted summation on the space features and the multi-scale fusion features to obtain final feature representation combining space-time information;
the data reconstruction unit is used for sending the final characteristic representation to a variation self-encoder to learn potential space representation, and carrying out data decoding reconstruction to obtain reconstruction data;
The model training unit is used for combining the reconstruction data error loss and the KL divergence as a loss function to train the model so as to obtain a trained anomaly detection model;
The abnormality detection unit is used for inputting the preprocessed data to be detected into the abnormality detection model and obtaining an abnormality detection result by combining a set threshold value.
Example III
The third embodiment of the present invention further provides an abnormality detection device based on adaptive multi-scale feature modeling, which includes a memory and a processor, where the memory stores a computer program, and the computer program is capable of being executed by the processor to implement the abnormality detection method based on adaptive multi-scale feature modeling as described above.
Example IV
The fourth embodiment of the present invention further provides a computer readable storage medium, where computer readable instructions are stored, and when the computer readable instructions are executed by a processor of a device in which the computer readable storage medium is located, the anomaly detection method based on the adaptive multi-scale feature modeling is implemented as described above.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied, essentially or in the part contributing to the prior art or in part, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association relationship between the associated objects, indicating that three relationships may exist; for example, A and/or B may mean that A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The term "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)", depending on the context.
References to "first/second" in the embodiments merely distinguish similar objects and do not represent a particular ordering of the objects; it should be understood that "first/second" may be interchanged in a particular order or sequence where permitted, so that the embodiments described herein may be implemented in sequences other than those illustrated or described herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.