Disclosure of Invention
The invention aims to provide an anomaly detection method and device based on adaptive multi-scale feature modeling, which address the shortcomings of existing methods: insufficient consideration of the dynamic characteristics of the model, insufficient ability to capture latent correlations and contextual dependencies across time scales, and difficulty coping with the dynamic changes of the industrial environment.
In order to solve the technical problems, the invention is realized by the following technical scheme:
an anomaly detection method based on adaptive multi-scale feature modeling, comprising:
S1, acquiring historical monitoring time series data of an industrial control system, and preprocessing the data to obtain training data input into an anomaly detection model;
S2, combining Fourier transformation and sliding windows with different sizes, decomposing time components of the input training data, and fusing the time components with the training data to obtain a decomposition result;
S3, calculating a gating weight matrix by combining a gating network according to the decomposition result, and adaptively weighting and combining a plurality of expert networks with different scales to process the output of the training data so as to obtain a multi-scale fusion characteristic;
S4, calculating an adjacent matrix of the multi-scale fusion feature at each time step, and aggregating node features of the adjacent matrix by adopting graph convolution operation to capture spatial correlation features among nodes to obtain spatial features;
S5, carrying out weighted summation on the spatial features and the multi-scale fusion features to obtain final feature representation combined with space-time information;
S6, sending the final characteristic representation to a variation self-encoder to learn potential space representation, and carrying out data decoding reconstruction to obtain reconstruction data;
S7, combining the error loss of the reconstruction data and the KL divergence as a loss function training model to obtain a trained anomaly detection model;
S8, inputting the preprocessed data to be detected into the anomaly detection model, and combining a set threshold value to obtain an anomaly detection result.
Preferably, the preprocessing includes processing missing and duplicate values of data.
Preferably, the input training data is subjected to time component decomposition, specifically:
analyzing the frequency characteristic of input data by adopting a Fourier transform technology, capturing periodic variation to obtain seasonal components, wherein the seasonal components are regular fluctuation data which changes along with a fixed period in the data;
Calculating moving averages of the input data by adopting a plurality of sliding windows of different sizes, and capturing the overall rising or falling trend of the input data by smoothing short-term fluctuations to obtain trend components, wherein the trend components are long-term change trend data in the data, comprising continuously increasing data, continuously decreasing data or constant data;
subtracting seasonal components and trend components from the input data to obtain residual components, wherein the residual components comprise random fluctuation data or noise data.
Preferably, the S3 specifically is:
according to the decomposition result D, extracting the feature vector of each time step to obtain data decomposition result subsets, and generating an initial gating weight matrix according to each data decomposition result subset, wherein the formula is as follows:
G_i = Softmax(W_g · x_i + b_g);
Wherein, G_i represents the gating weight matrix of the i-th subset of the decomposition result D; Softmax(·) is an activation function; W_g and b_g are respectively a learned weight and a bias term; x_i represents the input features of the i-th subset of the decomposition result D;
extracting the feature vector of each time step from the training data to obtain data subsets, and distributing the data subsets to a plurality of expert networks of different scales according to the gating weight matrix, so as to select the most suitable expert networks to process the current data and obtain the output features, wherein the expressions are:
G(x) = Softmax(TopK(H(x), k));
H(x) = x · W_g + ε · Softplus(x · W_noise);
Wherein, G(x) represents the gating weight matrix corresponding to the current input data x; TopK(H(x), k) represents selecting, from H(x), the k experts with the largest weights; H(x) represents the noise-injected expert weights; k represents the number of selected experts; W_g represents the gating weight; ε represents noise sampled from a standard normal distribution with a mean of 0 and a variance of 1; Softplus represents a smoothed ReLU function used to enhance the nonlinear expression capability of the model; W_noise represents the noise weight;
based on the gating weight matrix, the features output by the plurality of expert networks of different scales are weighted and summed to obtain the multi-scale fusion feature y, wherein the expression is:
y = Σ_{i=1}^{N} G_i(x) · E_i(x);
Wherein, E_i(x) is the output feature of the i-th expert network; N represents the total number of expert networks.
Preferably, the expert networks of different scales define segments of different sizes, so as to extract features of different scales of the input data, and when each expert network processes the data allocated to the expert network, the processing steps are as follows:
dividing the distributed data into a plurality of time segments {X_1, X_2, ..., X_m};
defining the processing resolution of the time series data for each expert network by setting different time segment sizes s, wherein each time segment represents a local time window, and the local window is used to learn the time dependency of the data, then:
X_i = x[(i-1)·s + 1 : i·s];
Wherein, X_i represents the i-th time segment; x[(i-1)·s + 1 : i·s] represents the time series data in the i-th time segment;
each expert network performs local feature extraction on each time segment through a grouped depthwise convolution layer to obtain the short-term dependency feature F_local^i of consecutive time steps within each time segment of the time series, wherein the expression is:
F_local^i = Conv(X_i);
Wherein, Conv(·) represents a convolution feature extraction operation;
learning the relationships between the different time segments {X_i} using multi-head self-attention, constructing interactions between different time segments to generate the global feature F_global; when calculating multi-head self-attention, computing the matrix transformations of the query matrix, the key matrix and the value matrix, and aggregating the features of each time segment through the attention weights;
splicing the features learned by each expert network, combining the distributed data, the short-term dependency features F_local and the cross-segment global features F_global, to obtain the fused feature E_i(x) output by each expert network.
Preferably, the S4 specifically is:
Calculating the cosine similarity between all node features of the multi-scale fusion feature in each time step; for each pair of nodes i and j at each time step t, the cosine similarity is calculated as follows:
sim(h_i, h_j) = (h_i · h_j) / (||h_i|| · ||h_j||);
Wherein, sim(h_i, h_j) represents the cosine similarity between feature vectors h_i and h_j; h_i and h_j respectively represent the feature vectors of node i and node j; ||·|| represents the L2 norm;
if the similarity between two nodes is greater than a preset threshold τ, an edge connection is established to obtain the adjacency matrix, wherein the expression is:
A_t(i, j) = 1 if sim(h_i, h_j) > τ, and A_t(i, j) = 0 otherwise;
Wherein, A_t represents the adjacency matrix corresponding to time step t;
based on the adjacency matrix, GraphSAGE convolution layers are adopted to aggregate the features of the nodes so as to capture the spatial correlation features among the nodes, namely the spatial features; the formula of the GraphSAGE convolution layer is:
h_i^(l+1) = σ(W^(l) · CONCAT(h_i^(l), AGGREGATE({h_j^(l) : j ∈ N(i)})));
Wherein, h_i^(l) represents the feature representation of node i at the l-th layer; N(i) represents the neighbor nodes of node i; AGGREGATE is an aggregation operation; W^(l) is the weight matrix of the l-th layer; σ is an activation function; h_j^(l) represents the feature of the neighbor node j of node i.
Preferably, the S6 specifically is:
The final feature representation is input into the variational autoencoder, the latent-space mean μ and log-variance log σ² of the final feature representation are learned through forward propagation of the fully connected network of the variational autoencoder, and the latent variable z is sampled from the latent space using the re-parameterization technique, wherein the expression is:
z = μ + σ ⊙ ε;
Wherein, ε represents noise sampled from the standard normal distribution N(0, 1); σ represents the standard deviation of the latent space;
the latent variable z is mapped back to the input space through a fully connected layer, and the input data is reconstructed to obtain the reconstructed data X̂, so as to learn a compressed representation of the input data.
Preferably, the reconstructed-data error loss is measured by the mean squared error between the reconstructed data X̂ and the training data X input to the model, wherein the expression is:
L_rec = ||X - X̂||²;
Wherein, L_rec represents the reconstructed-data error loss;
The KL divergence L_KL measures the difference between the distribution of the latent space and the standard normal distribution, wherein the expression is:
L_KL = -(1/2) · Σ_{i=1}^{n} (1 + log σ_i² - μ_i² - σ_i²);
Wherein, L_KL represents the KL divergence; n represents the total number of data subsets of the training data; σ_i represents the standard deviation of the i-th data subset; μ_i represents the latent-space mean of the i-th data subset;
The loss function L is the weighted sum of the reconstructed-data error loss and the KL divergence, wherein the formula is:
L = L_rec + λ · L_KL;
Wherein, λ represents the weight of the KL divergence.
Preferably, the method further comprises performing a test evaluation on the anomaly detection model; when the test evaluation result meets the set requirement, the anomaly detection model is evaluated as qualified and used for subsequent anomaly detection; otherwise, the model is retrained; the test evaluation result of the anomaly detection model is calculated by the following formulas:
The accuracy Acc is calculated as: Acc = (TP + TN) / (TP + TN + FP + FN);
the precision P is calculated as: P = TP / (TP + FP);
the recall R is calculated as: R = TP / (TP + FN);
the F1 score is calculated as: F1 = 2 · P · R / (P + R);
Wherein, TP represents the number of true positives, namely samples that are normal in both the actual situation and the detection result; FP represents the number of false positives, namely samples that are actually abnormal but detected as normal; TN represents the number of true negatives, namely samples that are abnormal in both the actual situation and the detection result; FN represents the number of false negatives, namely samples that are actually normal but detected as abnormal.
The invention also provides an anomaly detection device based on adaptive multi-scale feature modeling, which comprises:
the data preprocessing unit is used for acquiring historical monitoring time series data of the industrial control system and preprocessing the data to obtain training data input into an anomaly detection model;
The component decomposition unit is used for combining Fourier transformation and sliding windows with different sizes, performing time component decomposition on the input training data, and fusing the time component decomposition with the training data to obtain a decomposition result;
The multi-scale fusion unit is used for calculating a gating weight matrix according to the decomposition result by combining a gating network, and adaptively weighting and combining a plurality of expert networks with different scales to process the output of the training data so as to obtain multi-scale fusion characteristics;
the graph convolution unit is used for calculating an adjacency matrix of the multi-scale fusion feature at each time step, and aggregating node features of the adjacency matrix by adopting graph convolution operation so as to capture spatial correlation features among nodes and obtain spatial features;
the space-time combining unit is used for carrying out weighted summation on the space features and the multi-scale fusion features to obtain final feature representation combining space-time information;
the data reconstruction unit is used for sending the final characteristic representation to a variation self-encoder to learn potential space representation, and carrying out data decoding reconstruction to obtain reconstruction data;
The model training unit is used for combining the reconstruction data error loss and the KL divergence as a loss function to train the model so as to obtain a trained anomaly detection model;
The abnormality detection unit is used for inputting the preprocessed data to be detected into the abnormality detection model and obtaining an abnormality detection result by combining a set threshold value.
The invention also provides an abnormality detection device based on the adaptive multi-scale feature modeling, which comprises a processor and a memory, wherein a computer program is stored in the memory, and the computer program can be executed by the processor to realize the abnormality detection method based on the adaptive multi-scale feature modeling.
The invention also provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor of the device where the computer-readable storage medium is located, implement the anomaly detection method based on adaptive multi-scale feature modeling.
In summary, compared with the prior art, the invention has the following beneficial effects:
According to the invention, by combining advanced time series analysis techniques such as time decomposition, a mixed multi-scale expert network and spatio-temporal graph convolution, attention is paid to the correlations among variables while expert networks of different scales are adaptively combined, so that data characteristics on different time scales are processed dynamically, overcoming the shortcoming of existing methods in handling the dynamic characteristics of time-scale variation.
Compared with traditional anomaly detection methods, the method focuses on the spatial correlation in the multi-dimensional time series data of the industrial control system while also accounting for the dynamic change of features over time. The invention improves the accuracy of anomaly detection in industrial control systems, provides a more flexible and accurate anomaly detection tool for practical applications, and adapts to the complex dynamic changes of time series data in industrial control systems without labeled data.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
Example 1
An embodiment of the present invention provides an anomaly detection method based on adaptive multi-scale feature modeling, which may be implemented by an anomaly detection device based on adaptive multi-scale feature modeling (hereinafter referred to as an anomaly detection device), and in particular, executed by one or more processors in the anomaly detection device.
In this embodiment, the anomaly detection device may be an electronic device equipped with a processor and storing an executable computer program implementing the anomaly detection method based on adaptive multi-scale feature modeling, for example a computer, a smartphone, a smart tablet, a workstation, or the like, without limitation.
As shown in fig. 1-2, an anomaly detection method based on adaptive multi-scale feature modeling includes steps S1 to S8.
S1, acquiring historical monitoring time series data of an industrial control system, and preprocessing the data to obtain training data input into an anomaly detection model.
Specifically, historical characteristic data of the industrial control equipment is obtained, and whether the data are complete or not is checked. The characteristic data is a collection of observations or data points associated with a time stamp, and the preprocessing operation includes processing missing and duplicate values of the data.
In this embodiment, the preprocessed data is input into an anomaly detection model based on adaptive multi-scale feature extraction and reconstruction to obtain a label indicating whether the industrial control equipment is in an abnormal state at the current moment, yielding the anomaly detection result.
Specifically, the preprocessed feature data X is divided into a training set and a test set, wherein the training set is used for model training and the test set is used for model test evaluation.
Definition: X = {x_1, x_2, ..., x_T}, Y = {y_1, y_2, ..., y_T};
Wherein, T represents the maximum timestamp; x_t represents the feature data at time t, x_t ∈ R^m, where m represents the number of features; the output is the anomaly detection result label Y, y_t ∈ {0, 1}, where y_t represents the value of the anomaly detection result label at time t.
S2, combining Fourier transformation and sliding windows with different sizes, decomposing time components of the input training data, and fusing the time components with the training data to obtain a decomposition result.
The specific steps include S21 to S24.
S21, firstly, analyzing the frequency characteristic of input data by adopting a Fourier transform technology, and capturing periodic variation to obtain seasonal components, wherein the seasonal components are regular fluctuation data which varies along with a fixed period in the data.
For example, seasonal ingredients are manifested as data peaks in summer and winter each year, and relatively low peaks in spring and autumn.
S22, calculating moving averages of the input data using a plurality of sliding windows of different sizes, and capturing the overall rising or falling trend of the input data by smoothing short-term fluctuations to obtain trend components, wherein the trend components are long-term change trend data, comprising continuously increasing data, continuously decreasing data or essentially constant data.
Such as a trend of data rising year by year.
S23, subtracting seasonal components and trend components from the input data to obtain residual components, wherein the residual components are parts of the data which cannot be interpreted by seasonality and trend and comprise random fluctuation data or noise data.
Such as fluctuations in data due to emergencies (e.g., extreme weather).
S24, the input data is thus decomposed into seasonal components, trend components and residual components, which are then combined with the input data to obtain the decomposition result. This time decomposition clarifies the structure and variation patterns of the data, serves the subsequent expert selection and feature extraction, and provides a scientific basis for decision making.
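As a non-limiting illustration of S21–S24, the decomposition can be sketched in NumPy; the period, the window sizes and the function name are assumptions made for demonstration only, not the claimed implementation:

```python
import numpy as np

def decompose(x, period, window_sizes=(5, 11)):
    """Sketch of the time decomposition: seasonal (Fourier), trend
    (moving averages), residual (what remains). Assumed parameters."""
    n = len(x)
    # Seasonal: keep only the Fourier coefficient at the fundamental
    # frequency of the assumed period (plus its conjugate mirror).
    spectrum = np.fft.fft(x)
    k = n // period
    mask = np.zeros(n, dtype=bool)
    mask[k] = mask[-k] = True
    seasonal = np.real(np.fft.ifft(np.where(mask, spectrum, 0)))
    # Trend: average of several centred moving averages, smoothing
    # short-term fluctuation at more than one window scale.
    trend = np.mean(
        [np.convolve(x, np.ones(w) / w, mode="same") for w in window_sizes],
        axis=0,
    )
    # Residual: the part not explained by seasonality or trend.
    residual = x - seasonal - trend
    return seasonal, trend, residual
```

By construction the three components sum back to the input series, matching S23.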
And S3, calculating a gating weight matrix by combining a gating network according to the decomposition result, and adaptively weighting and combining a plurality of expert networks with different scales to process the output of the training data so as to obtain a multi-scale fusion characteristic.
The specific steps include S31 to S34.
S31, firstly, the feature vector of each time step is extracted from the decomposition result D to obtain data decomposition result subsets, and an initial gating weight matrix is generated according to each data decomposition result subset, wherein the formula is as follows:
G_i = Softmax(W_g · x_i + b_g);
Wherein, G_i represents the gating weight matrix of the i-th subset of the decomposition result D; Softmax(·) is an activation function; W_g and b_g are respectively a learned weight and a bias term; x_i represents the input features of the i-th subset of the decomposition result D.
And S32, extracting the feature vector of each time step according to the training data to obtain a data subset, and distributing the data subset to a plurality of expert networks with different scales according to the gating weight matrix so as to select the most suitable expert network to process the current data.
In this embodiment, the "Expert Network" refers to a specific neural Network architecture or module, which is designed to be specially responsible for feature extraction of a subtask or a specific field in a complex task. Such a network may be considered an "expert" in that it is focused on handling specific types of data or features, thereby playing a greater role in overall tasks. In the context of time series analysis, the task of an expert network is typically to handle a specific time slice or feature pattern.
Specifically, when processing a particular data subset, each expert network is given a different weight according to its contribution to the task, and the gating weight matrix determines which experts are activated and to what degree, thereby determining how the data subset is assigned to each expert for processing. The data-to-expert selection process is adaptive and is dynamically adjusted based on the data characteristics of each time step, ensuring that the model can select the most appropriate expert networks according to the characteristics of the data in different time periods. The expressions are:
G(x) = Softmax(TopK(H(x), k));
H(x) = x · W_g + ε · Softplus(x · W_noise);
Wherein, G(x) represents the gating weight matrix corresponding to the current input data x; TopK(H(x), k) represents selecting, from H(x), the k experts with the largest weights; H(x) represents the noise-injected expert weights; k represents the number of selected experts; W_g represents the gating weight; ε represents noise sampled from a standard normal distribution with a mean of 0 and a variance of 1; Softplus represents a smoothed ReLU function used to enhance the nonlinear expression capability of the model; W_noise represents the noise weight;
S33, the expert networks of different scales define time segments of different sizes, thereby setting the temporal resolution of the input data and extracting features of the data at different scales. Each expert network extracts local features within the divided time segments through depthwise convolution, and uses an attention mechanism across time segments to model the relationships between different segments, improving long-range dependency modeling of the time series. The processing steps are as follows:
Dividing the distributed data into a plurality of time segments {X_1, X_2, ..., X_m};
defining the processing resolution of the time series data for each expert network by setting different time segment sizes s, wherein each time segment represents a local time window, and the local window is used to learn the time dependency of the data, then:
X_i = x[(i-1)·s + 1 : i·s];
Wherein, X_i represents the i-th time segment; x[(i-1)·s + 1 : i·s] represents the time series data in the i-th time segment;
each expert network performs local feature extraction on each time segment through a grouped depthwise convolution layer to obtain the short-term dependency feature F_local^i of consecutive time steps within each time segment of the time series, wherein the expression is:
F_local^i = Conv(X_i);
Wherein, Conv(·) represents a convolution feature extraction operation.
In this embodiment, grouped depthwise convolution is a special convolution operation that divides the input channels into several groups and convolves each group separately. This significantly reduces the computational cost while still extracting the features of each group effectively. The data of each time segment can be viewed as a multi-dimensional time series, and the grouped convolution lets the network process the groups in parallel, extracting the features within each time segment.
Then, the relationships between the different time segments {X_i} are learned using multi-head self-attention, constructing interactions between time segments to generate the global feature F_global; when calculating multi-head self-attention, the matrix transformations of the query matrix, the key matrix and the value matrix are computed, and the features of each time segment are aggregated through the attention weights. The expression of multi-head self-attention is:
Attention(Q, K, V) = Softmax(Q · K^T / √d_k) · V;
Wherein, Q is the query matrix, K is the key matrix, V is the value matrix, d_k is the dimension of the key matrix, and Softmax is the normalization function.
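The attention computation described above can be sketched for a single head as follows (NumPy is assumed; multi-head attention runs this computation in parallel over learned projections of Q, K and V and concatenates the results):

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """Single-head scaled dot-product attention over time segments.

    Each row of Q attends over the rows of K; the softmax weights
    then aggregate the rows of V, as in the formula above.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax yields the attention weights across segments.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

With all-zero queries and keys the weights are uniform, so each output row is simply the mean of V — a quick sanity check on the softmax normalization.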
Splicing the features learned by each expert network, combining the distributed data, the short-term dependency features F_local and the cross-segment global features F_global, the fused feature E_i(x) output by each expert network is obtained.
S34, based on the gating weight matrix, the output features E_i(x) of the plurality of expert networks of different scales are weighted and summed to obtain the multi-scale fusion feature y, wherein the expression is:
y = Σ_{i=1}^{N} G_i(x) · E_i(x);
Wherein, E_i(x) is the output feature of the i-th expert network; N represents the total number of expert networks.
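The gating and fusion of S32–S34 can be sketched as follows; the expert callables, shapes and parameter names here are illustrative assumptions for demonstration, not the claimed implementation:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def moe_fuse(x, experts, W_g, b_g, top_k=2, noise_std=0.0, rng=None):
    """Noisy top-k gating over expert networks, then weighted fusion.

    `x` is one time step's feature vector; `experts` is a list of
    callables standing in for the different-scale expert networks;
    `W_g`/`b_g` are the learned gating parameters.
    """
    rng = rng or np.random.default_rng(0)
    logits = W_g @ x + b_g
    if noise_std > 0:  # optional noise injection into the gate
        logits = logits + noise_std * rng.standard_normal(len(logits))
    # Keep only the top-k experts; suppress the rest before softmax.
    keep = np.argsort(logits)[-top_k:]
    masked = np.full_like(logits, -np.inf)
    masked[keep] = logits[keep]
    gates = softmax(masked)
    # Multi-scale fusion: gate-weighted sum of the expert outputs.
    return sum(g * expert(x) for g, expert in zip(gates, experts))
```

Non-selected experts receive exactly zero gate weight, so only the top-k experts contribute to the fused feature.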
S4, calculating an adjacent matrix of the multi-scale fusion feature at each time step, and aggregating node features of the adjacent matrix by adopting graph convolution operation to capture spatial correlation features among nodes to obtain spatial features.
The specific steps include S41 to S43.
S41, calculating the cosine similarity between all node features of the multi-scale fusion feature in each time step; for each pair of nodes i and j at each time step t, the cosine similarity is calculated as follows:
sim(h_i, h_j) = (h_i · h_j) / (||h_i|| · ||h_j||);
Wherein, sim(h_i, h_j) represents the cosine similarity between feature vectors h_i and h_j; h_i and h_j respectively represent the feature vectors of node i and node j at time step t; ||·|| represents the L2 norm;
S42, if the similarity between two nodes is greater than a preset threshold τ, an edge connection is established to obtain the adjacency matrix, wherein the expression is:
A_t(i, j) = 1 if sim(h_i, h_j) > τ, and A_t(i, j) = 0 otherwise;
Wherein, A_t represents the adjacency matrix corresponding to time step t.
S43, based on the adjacency matrix, GraphSAGE (Graph SAmple and aggreGatE) convolution layers are adopted to aggregate the features of the nodes so as to capture the spatial correlation features among the nodes, namely the spatial features; the formula of the GraphSAGE convolution layer is:
h_i^(l+1) = σ(W^(l) · CONCAT(h_i^(l), AGGREGATE({h_j^(l) : j ∈ N(i)})));
Wherein, h_i^(l) represents the feature representation of node i at the l-th layer; N(i) represents the neighbor nodes of node i; AGGREGATE is an aggregation operation; W^(l) is the weight matrix of the l-th layer; σ is an activation function; h_j^(l) represents the feature of the neighbor node j of node i.
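S41–S43 can be sketched as follows; the mean aggregator and the threshold value are illustrative assumptions (GraphSAGE admits several aggregation operations), and NumPy is assumed:

```python
import numpy as np

def cosine_adjacency(H, threshold=0.5):
    """Adjacency from pairwise cosine similarity of node features.

    `H` is (num_nodes, feat_dim) for one time step; an edge is kept
    when similarity exceeds the (assumed) threshold.
    """
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    S = (H @ H.T) / (norms @ norms.T + 1e-12)
    A = (S > threshold).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops in the edge set
    return A

def sage_layer(H, A, W):
    """One GraphSAGE-style layer with mean aggregation.

    Concatenates each node's feature with the mean of its neighbours'
    features, applies the layer weight W, then a ReLU activation.
    """
    deg = A.sum(axis=1, keepdims=True)
    neigh = (A @ H) / np.maximum(deg, 1.0)  # mean over neighbours
    cat = np.concatenate([H, neigh], axis=1)
    return np.maximum(cat @ W, 0.0)
```

Note the weight matrix acts on the concatenation, so its first dimension is twice the node feature dimension.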
And S5, carrying out weighted summation on the spatial features and the multi-scale fusion features to obtain final feature representation combined with the space-time information.
In this step, the spatial features extracted by the graph convolution and the multi-scale fusion features extracted by the expert networks are weighted and summed to obtain the final feature representation combining temporal and spatial information.
And S6, sending the final characteristic representation to a variation self-encoder to learn potential space representation, and carrying out data decoding reconstruction to obtain reconstruction data.
The specific steps include S61 to S62.
S61, the final feature representation is input into the variational autoencoder part, the latent-space mean μ and log-variance log σ² of the final feature representation are learned through forward propagation of the fully connected network of the variational autoencoder, and the latent variable z is sampled from the latent space using the re-parameterization technique, wherein the expression is:
z = μ + σ ⊙ ε;
Wherein, z is the latent variable; ε represents noise sampled from the standard normal distribution N(0, 1); σ represents the standard deviation of the latent space.
In this embodiment, the encoder (Encoder) of the variational self-encoder consists of a fully connected network (Fully Connected Network, FC) whose function is to map the input features to the distribution parameters (mean and variance) of the underlying space. The fully-connected network of encoders is typically composed of multiple layers of neural networks, each layer being feature-transformed by linear transformation and a nonlinear activation function (e.g., reLU).
Sampling directly from the potential space results in an inability to counter-propagate gradients, thus requiring the use of re-parameterization techniques. The core idea of the re-parameterization technique is to separate the randomness from the parameters of the network so that the gradient can be counter-propagated through the sampling process.
S62, in the decoder part, the latent variable z is mapped back to the input space through a fully connected layer to reconstruct the input data and obtain the reconstructed data X̂, so as to learn a compressed representation of the input data, wherein the expression is:
X̂ = Decoder(z);
Wherein, Decoder(·) represents the reconstruction (decoding) operation.
In this embodiment, the decoder is also a fully connected network, whose role is to map the latent variables back into the original feature space, generating reconstructed features.
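The re-parameterization of S61 can be sketched as follows (NumPy assumed; in a trainable model μ and log σ² would be produced by the encoder's fully connected layers):

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Randomness lives entirely in eps, so gradients can flow through
    mu and log_var during training -- the point of the trick.
    """
    rng = rng or np.random.default_rng(0)
    sigma = np.exp(0.5 * log_var)  # log-variance -> standard deviation
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps
```

When the log-variance is driven strongly negative the sample collapses to the mean, which is a handy deterministic check.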
S7, combining the reconstructed-data error loss and the KL divergence as the loss function to train the model and obtain a trained anomaly detection model.
In the present embodiment, the loss function L combines the reconstructed-data error loss and the KL divergence, wherein the formula is:
L = L_rec + λ · L_KL;
Wherein, λ represents the weight of the KL divergence, L_rec is the reconstructed-data error loss, and L_KL is the KL divergence.
The reconstructed-data error loss is measured by the mean squared error between the reconstructed data X̂ and the training data X input to the model, wherein the expression is:
L_rec = ||X - X̂||²;
Wherein, L_rec represents the reconstructed-data error loss.
The KL divergence L_KL measures the difference between the distribution of the latent space and the standard normal distribution, wherein the expression is:
L_KL = -(1/2) · Σ_{i=1}^{n} (1 + log σ_i² - μ_i² - σ_i²);
Wherein, n represents the total number of data subsets of the training data; σ_i represents the standard deviation of the i-th data subset; μ_i represents the latent-space mean of the i-th data subset.
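The combined loss of S7 can be sketched as follows, with the KL weight λ written as `beta`; the closed-form KL term assumes a diagonal Gaussian posterior measured against N(0, I):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """L = L_rec + beta * L_KL, mirroring the formulas above."""
    l_rec = np.mean((x - x_hat) ** 2)  # mean squared reconstruction error
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
    l_kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return l_rec + beta * l_kl
```

With μ = 0 and log σ² = 0 the KL term vanishes and the loss reduces to the reconstruction error alone.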
S8, inputting the preprocessed data to be detected into an abnormality detection model, and combining a set threshold value to obtain an abnormality detection result.
Specifically, after training, the trained anomaly detection model is tested and evaluated; when the test evaluation result meets the set requirement, the anomaly detection model is evaluated as qualified and used for subsequent anomaly detection; otherwise, the model is retrained. The test evaluation result of the anomaly detection model is calculated by the following formulas:
The accuracy Acc is calculated as: Acc = (TP + TN) / (TP + TN + FP + FN);
the precision P is calculated as: P = TP / (TP + FP);
the recall R is calculated as: R = TP / (TP + FN);
the F1 score is calculated as: F1 = 2 · P · R / (P + R);
wherein TP represents the number of true positives, namely samples that are normal in both the actual situation and the detection result; FP represents the number of false positives, namely samples that are actually abnormal but detected as normal; TN represents the number of true negatives, namely samples that are abnormal in both the actual situation and the detection result; and FN represents the number of false negatives, namely samples that are actually normal but detected as abnormal.
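The four evaluation formulas can be computed from the confusion-matrix counts in a few lines; the counts below are made-up example values, not experimental results from the text.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1

acc, p, r, f1 = detection_metrics(tp=80, fp=20, tn=90, fn=10)
print(acc, p, r, f1)  # ≈ 0.85 0.8 0.889 0.842
```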
Specifically, the training data are input into the anomaly detection model to obtain training-sample anomaly scores. In this embodiment, the threshold is set to the mean of the training-sample anomaly scores plus three times their standard deviation. The test data set is then input into the anomaly detection model, and the weighted combination of the reconstruction error loss and the KL divergence in the total loss function is taken as the anomaly score of each test sample in the test set. Comparing the anomaly score of each test sample with the threshold yields the anomaly class label corresponding to each test sample in the test set.
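The mean-plus-three-standard-deviations thresholding described in this embodiment can be sketched as follows; the scores are placeholder values, and the function names are illustrative.

```python
import numpy as np

def fit_threshold(train_scores):
    """Threshold = mean + 3 * std of the training anomaly scores."""
    train_scores = np.asarray(train_scores, dtype=float)
    return float(train_scores.mean() + 3.0 * train_scores.std())

def label_anomalies(test_scores, threshold):
    """1 = anomalous (score above threshold), 0 = normal."""
    return (np.asarray(test_scores, dtype=float) > threshold).astype(int)

train_scores = [1.0, 1.2, 0.8, 1.0, 1.0]   # placeholder training anomaly scores
thr = fit_threshold(train_scores)
labels = label_anomalies([0.9, 5.0, 1.1], thr)
print(thr, labels)                          # only the 5.0 score exceeds the threshold
```

With all-normal training data this rule flags roughly the upper 0.1% tail of the training-score distribution, assuming the scores are approximately Gaussian.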
When the test-evaluation result meets the set requirement, the abnormality detection model is deemed qualified and used for subsequent unsupervised anomaly detection. Monitoring time-series data of the industrial control equipment are acquired in real time and, after preprocessing, input into the abnormality detection model to obtain the anomaly detection result at the current moment.
In another preferred embodiment, to verify the effectiveness of the anomaly detection model and scheme proposed by the present invention, the SWaT dataset for security studies of industrial control systems is selected as the research object. The dataset is 51-dimensional in total and contains 11 days of operational data: the first 7 days are normal operational data, and the last 4 days contain data collected under 36 typical network-attack scenarios. The dataset covers data from 51 sensors and actuators reflecting various aspects of the water-treatment process, such as water level, flow, pressure, and pH. The data are recorded as a time series sampled once every second, each record containing a timestamp and the measurements or states of the individual sensors/actuators.
The data of the first 7 days in the dataset are taken as the training set and are all normal data; the data of the last 4 days are taken as the test set, which contains both normal and abnormal data. Model training is then carried out, with the relevant parameter settings shown in Table 1.
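The 7-day/4-day split at one sample per second can be expressed as simple array slicing. The stand-in array below (with a reduced feature count) only demonstrates the layout; in practice the full 51-dimensional SWaT recording would be loaded from its CSV files.

```python
import numpy as np

SAMPLES_PER_DAY = 24 * 60 * 60        # SWaT records one sample per second
N_STANDIN_FEATURES = 4                # stand-in for the real 51 features

# Placeholder for the 11-day recording (real data loaded from CSV in practice)
data = np.zeros((11 * SAMPLES_PER_DAY, N_STANDIN_FEATURES), dtype=np.float32)

train = data[:7 * SAMPLES_PER_DAY]    # first 7 days: normal operation only
test = data[7 * SAMPLES_PER_DAY:]     # last 4 days: normal + attack data
print(train.shape[0] // SAMPLES_PER_DAY, test.shape[0] // SAMPLES_PER_DAY)  # 7 4
```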
TABLE 1 Experimental parameter setting Table
The experimental results obtained are shown in table 2.
TABLE 2 experimental results
As can be seen from Table 2, the detection performance of the abnormality detection model is relatively high; the method shows clear advantages in accuracy and precision and can effectively improve the accuracy of anomaly detection in industrial control systems.
In summary, compared with the prior art, the invention has the following beneficial effects:
The invention combines several advanced time-series data analysis techniques, such as time decomposition, a mixture of multi-scale expert networks, and spatio-temporal graph convolution. While attending to the correlations among variables, it adaptively combines expert networks of different scales to dynamically process data characteristics on different time scales, thereby overcoming the shortcoming of existing methods in handling dynamic characteristics that vary with time scale. Because the method takes into account the dynamic variation of the data over time scales, it can accurately model the anomaly detection task in an industrial control system.
Specifically, the time-series characteristics of the data are comprehensively modeled on multiple scales by combining time decomposition with a gating network and assigning a suitable expert network model to the data on different time scales. Correlations within and between different time periods of the time series are mined in the expert layer using deep convolutional networks and an attention mechanism across time segments. The spatial interaction characteristics between the variables are then modeled with a graph structure. Finally, the latent representation of the data is learned in a reconstruction module to generate reconstructed data, and the reconstruction error and the KL divergence of the latent space are calculated for model training, enabling efficient anomaly detection under unsupervised conditions.
Compared with the traditional anomaly detection method, the method focuses on the spatial correlation in the multi-dimensional time sequence data of the industrial control system, simultaneously gives consideration to the dynamic characteristics of the characteristics along with time, can adaptively adjust the training strategy according to the characteristics of the data, and ensures accurate anomaly detection and identification under different environments. The invention improves the accuracy of abnormality detection in the industrial control system, provides a more flexible and accurate abnormality detection tool for practical application, and adapts to the complex dynamic change of time sequence data in the industrial control system under the condition of no tag data.
Example two
As shown in fig. 3, a second embodiment of the present invention further provides an anomaly detection device based on adaptive multi-scale feature modeling, including:
the data preprocessing unit is used for acquiring historical monitoring time series data of the industrial control system and preprocessing the data to obtain training data input into an anomaly detection model;
The component decomposition unit is used for combining Fourier transformation and sliding windows with different sizes, performing time component decomposition on the input training data, and fusing the time component decomposition with the training data to obtain a decomposition result;
The multi-scale fusion unit is used for calculating a gating weight matrix according to the decomposition result by combining a gating network, and adaptively weighting and combining a plurality of expert networks with different scales to process the output of the training data so as to obtain multi-scale fusion characteristics;
the graph convolution unit is used for calculating an adjacency matrix of the multi-scale fusion feature at each time step, and aggregating node features of the adjacency matrix by adopting a graph convolution operation so as to capture spatial correlation features among nodes and obtain spatial features;
the space-time combining unit is used for carrying out weighted summation on the space features and the multi-scale fusion features to obtain final feature representation combining space-time information;
the data reconstruction unit is used for sending the final characteristic representation to a variation self-encoder to learn potential space representation, and carrying out data decoding reconstruction to obtain reconstruction data;
The model training unit is used for combining the reconstruction data error loss and the KL divergence as a loss function to train the model so as to obtain a trained anomaly detection model;
The abnormality detection unit is used for inputting the preprocessed data to be detected into the abnormality detection model and obtaining an abnormality detection result by combining a set threshold value.
Example III
The third embodiment of the present invention further provides an abnormality detection device based on adaptive multi-scale feature modeling, which includes a memory and a processor, where the memory stores a computer program, and the computer program is capable of being executed by the processor to implement the abnormality detection method based on adaptive multi-scale feature modeling as described above.
Example IV
The fourth embodiment of the present invention further provides a computer readable storage medium, where computer readable instructions are stored, and when the computer readable instructions are executed by a processor of a device in which the computer readable storage medium is located, the anomaly detection method based on the adaptive multi-scale feature modeling is implemented as described above.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied, essentially or in the part contributing to the prior art or in part, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association relationship between the associated objects, indicating that three relationships may exist; for example, A and/or B may mean that A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The term "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)", depending on the context.
References to "first/second" in the embodiments merely distinguish similar objects and do not represent a particular ordering of the objects; it should be understood that "first/second" may be interchanged in a particular order or sequence where permitted, so that the embodiments described herein may be implemented in sequences other than those illustrated or described herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.