
CN119939476A - Anomaly detection method and device based on adaptive multi-scale feature modeling - Google Patents


Info

Publication number
CN119939476A
Authority
CN
China
Prior art keywords
data
anomaly detection
features
time
feature
Prior art date
Legal status
Granted
Application number
CN202510412968.3A
Other languages
Chinese (zh)
Other versions
CN119939476B (en)
Inventor
王成
陈珞瑶
廖晓斌
李艳星
陈林聪
高毅超
Current Assignee
Fujian Beihang Construction Engineering Co ltd
Huaqiao University
Original Assignee
Fujian Beihang Construction Engineering Co ltd
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Fujian Beihang Construction Engineering Co ltd and Huaqiao University
Priority to CN202510412968.3A (patent CN119939476B/en)
Publication of CN119939476A
Application granted
Publication of CN119939476B
Legal status: Active
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Testing And Monitoring For Control Systems (AREA)
  • Complex Calculations (AREA)

Abstract


The present invention provides an anomaly detection method and device based on adaptive multi-scale feature modeling, relating to the technical field of data anomaly detection. The invention combines time component decomposition with a gating network to assign suitable expert networks to data at different time scales. In the expert layer, a deep convolutional network and an attention mechanism across time segments mine the relationships within and between different time periods of a time series. A graph structure is then used to model the spatial interaction features between variables. Finally, a latent representation of the data is learned to generate reconstructed data, and the model is trained on the reconstruction error and the KL divergence of the latent space, enabling efficient anomaly detection under unsupervised conditions. The invention improves the accuracy of anomaly detection in industrial control systems, provides a more flexible and accurate anomaly detection tool for practical applications, and adapts to the complex dynamic changes of time-series data in industrial control systems without labeled data.

Description

Abnormality detection method and device based on self-adaptive multi-scale feature modeling
Technical Field
The invention relates to the technical field of anomaly detection in time-series data of industrial control systems, and in particular to an anomaly detection method and device based on adaptive multi-scale feature modeling.
Background
Industrial Control Systems (ICS) are widely used in energy, manufacturing, transportation, water treatment, and other fields to monitor and control industrial processes. These systems typically rely on large numbers of sensors, actuators, and control devices; collecting the relevant data and analyzing it for anomalies keeps the equipment operating properly and is an important basis for analyzing and monitoring complex systems. Anomaly detection aims to identify behavior that deviates from normal patterns, which generally indicates that the system may have a potential fault, risk, or abnormal event.
Traditional anomaly detection methods mostly rely on manually set thresholds or rules. Because industrial processes are diverse and complex, these methods are limited by expert experience and by the expressive power of their models, and they adapt poorly. Deep learning has been widely applied to anomaly detection and has become the leading research direction in the field. However, in multi-dimensional time-series anomaly detection, the sequences exhibit different variations and fluctuations at different time scales. Existing deep learning anomaly detection techniques still have limitations: they do not fully consider dynamic characteristics when modeling dependencies across different time ranges; they neglect flexibility in multi-scale feature extraction and fusion; they do not fully exploit the multi-level spatio-temporal information in the data; and their ability to capture latent correlations and contextual dependencies across time scales is insufficient, which makes it difficult to cope with dynamically changing industrial environments.
In view of the above, the applicant has studied the prior art and has made the present application.
Disclosure of Invention
The invention aims to provide an anomaly detection method and device based on adaptive multi-scale feature modeling that overcome the shortcomings of existing methods: insufficient consideration of the model's dynamic characteristics, insufficient ability to capture latent correlations and contextual dependencies across time scales, and difficulty in coping with dynamic changes in the industrial environment.
In order to solve the technical problems, the invention is realized by the following technical scheme:
An anomaly detection method based on adaptive multi-scale feature modeling, comprising:
S1, acquiring historical monitoring time-series data of an industrial control system and preprocessing the data to obtain training data for input into an anomaly detection model;
S2, combining Fourier transformation with sliding windows of different sizes to decompose the input training data into time components, and fusing the time components with the training data to obtain a decomposition result;
S3, calculating a gating weight matrix from the decomposition result with a gating network, and adaptively weighting and combining the outputs of a plurality of expert networks of different scales that process the training data, to obtain a multi-scale fusion feature;
S4, calculating an adjacency matrix of the multi-scale fusion feature at each time step, and aggregating node features over the adjacency matrix with a graph convolution operation to capture spatial correlation features between nodes, obtaining spatial features;
S5, computing a weighted sum of the spatial features and the multi-scale fusion feature to obtain a final feature representation that combines spatio-temporal information;
S6, feeding the final feature representation into a variational autoencoder to learn a latent-space representation, and decoding it to obtain reconstructed data;
S7, training the model with a loss function that combines the reconstruction error loss and the KL divergence, to obtain a trained anomaly detection model;
S8, inputting the preprocessed data to be detected into the anomaly detection model and applying a set threshold to obtain an anomaly detection result.
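Step S8 does not spell out the anomaly score. The sketch below is a minimal illustration, assuming the score at each time step is the mean squared reconstruction error over the features and that a time step is labeled 1 (anomalous) when its score exceeds the set threshold; the function names and the example threshold are hypothetical.

```python
from typing import List, Sequence


def reconstruction_scores(x: Sequence[Sequence[float]],
                          x_hat: Sequence[Sequence[float]]) -> List[float]:
    """Per-time-step mean squared reconstruction error (assumed anomaly score)."""
    scores = []
    for row, row_hat in zip(x, x_hat):
        m = len(row)  # number of features at this time step
        scores.append(sum((a - b) ** 2 for a, b in zip(row, row_hat)) / m)
    return scores


def detect_anomalies(x: Sequence[Sequence[float]],
                     x_hat: Sequence[Sequence[float]],
                     threshold: float) -> List[int]:
    """Label 1 (anomalous) where the score exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in reconstruction_scores(x, x_hat)]


# Three time steps with two features each; the last one is reconstructed poorly.
x = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
x_hat = [[1.1, 2.0], [1.0, 2.0], [3.0, 0.0]]
labels = detect_anomalies(x, x_hat, threshold=0.5)  # [0, 0, 1]
```

In practice the threshold would be chosen from the score distribution on normal training data rather than fixed by hand.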
Preferably, the preprocessing includes processing missing and duplicate values of data.
Preferably, the input training data is subjected to time component decomposition, specifically:
analyzing the frequency characteristics of the input data with the Fourier transform and capturing periodic variation to obtain a seasonal component, i.e., the regular fluctuations in the data that repeat with a fixed period;
calculating moving averages of the input data with a plurality of sliding windows of different sizes, and capturing the overall rising or falling tendency of the input data by smoothing out short-term fluctuations to obtain a trend component, i.e., the long-term change in the data, which may be continuously increasing, continuously decreasing, or constant;
subtracting the seasonal component and the trend component from the input data to obtain a residual component, which comprises random fluctuations or noise.
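The three-way decomposition can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the patented implementation: the seasonal component keeps the `k` strongest non-DC frequencies of a naive DFT, the trend is the mean of centered moving averages over the hypothetical window sizes 3, 5 and 7 with edge replication, and `stack_with_input` shows one assumed way of fusing the components with the input by stacking them per time step.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def seasonal_component(x: Sequence[float], k: int = 1) -> List[float]:
    """Seasonal part: inverse DFT of the k strongest non-DC frequencies (assumed rule)."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]
    # Rank positive frequencies by magnitude; the DC bin (f = 0) is left to the trend.
    ranked = sorted(range(1, n // 2 + 1), key=lambda f: abs(spec[f]), reverse=True)
    keep = set()
    for f in ranked[:k]:
        keep.update({f, (n - f) % n})  # add the conjugate bin so the result is real
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * t / n) for f in keep).real / n
            for t in range(n)]


def moving_average(x: Sequence[float], w: int) -> List[float]:
    """Centered moving average of window w, replicating the edge values (assumed padding)."""
    half = w // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * (w - 1 - half)
    return [sum(padded[t:t + w]) / w for t in range(len(x))]


def trend_component(x: Sequence[float], windows: Sequence[int] = (3, 5, 7)) -> List[float]:
    """Trend part: average of the moving averages computed with several window sizes."""
    mas = [moving_average(x, w) for w in windows]
    return [sum(vals) / len(vals) for vals in zip(*mas)]


def decompose(x: Sequence[float], k: int = 1,
              windows: Sequence[int] = (3, 5, 7)) -> Tuple[List[float], List[float], List[float]]:
    """Return (seasonal, trend, residual) with residual = x - seasonal - trend."""
    seasonal = seasonal_component(x, k)
    trend = trend_component(x, windows)
    residual = [xi - si - ti for xi, si, ti in zip(x, seasonal, trend)]
    return seasonal, trend, residual


def stack_with_input(x, seasonal, trend, residual) -> List[List[float]]:
    """Per time step [x, seasonal, trend, residual]: one assumed way to fuse the result."""
    return [list(v) for v in zip(x, seasonal, trend, residual)]


# A period-8 oscillation on a constant level of 5.
series = [5 + math.sin(2 * math.pi * t / 8) for t in range(16)]
s, tr, r = decompose(series)
fused = stack_with_input(series, s, tr, r)  # 16 rows of 4 values
```

The naive DFT costs O(n²) and only suits short windows; a practical version would use an FFT routine instead.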
Preferably, S3 specifically comprises:
according to the decomposition result $X^{\mathrm{dec}}$, extracting the feature vector of each time step to obtain data decomposition result subsets, and generating an initial gating weight matrix from each subset with the formula:
$$G_i = \sigma\left(W X_i^{\mathrm{dec}} + b\right)$$
where $G_i$ denotes the gating weight matrix of the $i$-th subset of the decomposition result; $\sigma(\cdot)$ is an activation function; $W$ and $b$ are a learned weight and bias term, respectively; and $X_i^{\mathrm{dec}}$ denotes the input features of the $i$-th subset of the decomposition result $X^{\mathrm{dec}}$;
extracting the feature vector of each time step from the training data to obtain data subsets, and distributing the data subsets among the expert networks of different scales according to the gating weight matrix, so that the most suitable expert networks process the current data and produce output features, expressed as:
$$G(x) = \mathrm{Softmax}\left(\mathrm{TopK}\left(H(x), k\right)\right)$$
$$H(x) = x W_g + \epsilon \cdot \mathrm{Softplus}\left(x W_{\mathrm{noise}}\right)$$
where $G(x)$ denotes the gating weight matrix corresponding to the current input data $x$; $\mathrm{TopK}(\cdot, k)$ keeps the $k$ experts with the largest weights in $H(x)$; $H(x)$ denotes the expert weights with injected noise; $k$ denotes the number of selected experts; $W_g$ denotes the gating weights; $\epsilon$ denotes noise sampled from a standard normal distribution with mean 0 and variance 1; $\mathrm{Softplus}(\cdot)$ is a smoothed ReLU function that strengthens the nonlinear expressive power of the model; and $W_{\mathrm{noise}}$ denotes the noise weights;
based on the gating weight matrix, computing the weighted sum of the features output by the expert networks of different scales to obtain the multi-scale fusion feature $X_{\mathrm{ms}}$, expressed as:
$$X_{\mathrm{ms}} = \sum_{i=1}^{N} G(x)_i \, E_i(x)$$
where $E_i(x)$ is the output feature of the $i$-th expert network, $G(x)_i$ is its gating weight, and $N$ is the total number of expert networks.
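The gating equations follow the noisy top-k gating pattern used in sparse mixture-of-experts models. The sketch below assumes dense weight matrices given as nested lists, treats each expert as an arbitrary callable, and omits the noise when no random generator is passed (an assumption about inference behavior); all names are illustrative.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def matvec(x: Sequence[float], w: Sequence[Sequence[float]]) -> Vector:
    """Row vector x (length d) times matrix w (d x n_experts)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z)), the 'smoothed ReLU'."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def noisy_topk_gate(x: Sequence[float], w_g, w_noise, k: int,
                    rng: Optional[random.Random] = None) -> Vector:
    """G(x) = Softmax(TopK(H(x), k)) with H(x) = x W_g + eps * Softplus(x W_noise)."""
    clean = matvec(x, w_g)
    if rng is None:  # assumption: no noise at inference time
        h = clean
    else:
        scale = matvec(x, w_noise)
        h = [c + rng.gauss(0.0, 1.0) * softplus(s) for c, s in zip(clean, scale)]
    top = sorted(range(len(h)), key=lambda i: h[i], reverse=True)[:k]
    m = max(h[i] for i in top)  # subtract the max for a stable softmax
    e = {i: math.exp(h[i] - m) for i in top}  # experts outside the top-k get weight 0
    z = sum(e.values())
    return [e[i] / z if i in e else 0.0 for i in range(len(h))]


def fuse_experts(x: Sequence[float],
                 experts: Sequence[Callable[[Sequence[float]], Vector]],
                 gates: Vector) -> Vector:
    """X_ms = sum_i G(x)_i * E_i(x); experts with a zero gate are never evaluated."""
    fused: Optional[Vector] = None
    for expert, g in zip(experts, gates):
        if g == 0.0:
            continue
        out = expert(x)
        if fused is None:
            fused = [0.0] * len(out)
        fused = [f + g * o for f, o in zip(fused, out)]
    return fused if fused is not None else []


x = [1.0, 0.0]
w_g = [[2.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
w_noise = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
experts = [lambda v: [1.0, 1.0], lambda v: [3.0, 3.0], lambda v: [100.0, 100.0]]
gates = noisy_topk_gate(x, w_g, w_noise, k=2)                         # deterministic
noisy = noisy_topk_gate(x, w_g, w_noise, k=2, rng=random.Random(0))   # training-style
fused = fuse_experts(x, experts, gates)
```

Setting the gates of non-selected experts to exactly zero means those experts need not be evaluated at all, which is where the sparse combination saves computation.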
Preferably, the expert networks of different scales define time segments of different sizes so as to extract features of the input data at different scales, and each expert network processes the data allocated to it as follows:
dividing the allocated data into a plurality of time segments $\{P_1, P_2, \dots, P_M\}$, where $M$ is the number of segments;
setting a different time segment size $p$ for each expert network to define the resolution at which it processes the time series data, where each time segment represents a local time window used to learn the temporal dependencies of the data:
$$P_i = X_{(i-1)p+1\,:\,ip}$$
where $P_i$ denotes the $i$-th time segment and $X_{(i-1)p+1:ip}$ denotes the time series data within the $i$-th time segment;
each expert network extracts local features from each time segment through a deep grouped convolution layer, obtaining the short-term dependency features $H_i^{\mathrm{local}}$ of the consecutive time steps within each segment:
$$H_i^{\mathrm{local}} = \mathrm{Conv}(P_i)$$
where $\mathrm{Conv}(\cdot)$ denotes the convolutional feature extraction operation;
learning the interactions between different time segments $P_i$ with multi-head self-attention to generate global features $H^{\mathrm{global}}$, where the query, key, and value matrices are obtained by matrix transformations and the features of the time segments are aggregated through the attention weights;
concatenating the features learned by each expert network, namely the allocated data, the short-term dependency features $H^{\mathrm{local}}$, and the cross-segment global features $H^{\mathrm{global}}$, to obtain the fused feature output by each expert network.
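The per-expert pipeline can be illustrated on a univariate series. This is a simplified sketch, not the patented network: segments are non-overlapping with an assumed length that divides the series, the local feature is a single "valid" convolution with a fixed kernel standing in for the grouped convolution layer, and the cross-segment step is single-head self-attention with Q = K = V and no learned projections.

```python
import math
from typing import List, Sequence


def patchify(x: Sequence[float], p: int) -> List[List[float]]:
    """Split a series into non-overlapping segments of size p."""
    assert len(x) % p == 0, "pad or trim the series so its length is a multiple of p"
    return [list(x[i:i + p]) for i in range(0, len(x), p)]


def conv1d_valid(seq: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as in deep-learning layers)."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k)) for i in range(len(seq) - k + 1)]


def self_attention(h: List[List[float]]) -> List[List[float]]:
    """Single-head Softmax(Q K^T / sqrt(d)) V with Q = K = V = h (no learned projections)."""
    d = len(h[0])
    out = []
    for q in h:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in h]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, h)) / z for j in range(d)])
    return out


def expert_forward(x: Sequence[float], p: int, kernel: Sequence[float]) -> List[List[float]]:
    patches = patchify(x, p)
    local = [conv1d_valid(pt, kernel) for pt in patches]  # short-term features per segment
    global_ = self_attention(local)                       # cross-segment interactions
    # Concatenate raw segment, local and global features: one fused vector per segment.
    return [pt + lo + gl for pt, lo, gl in zip(patches, local, global_)]


series = [float(t) for t in range(8)]
out = expert_forward(series, p=4, kernel=[0.5, 0.5])  # 2 segments, 4 + 3 + 3 values each
```

Each expert would use a different segment size `p`; a larger `p` gives a coarser temporal resolution and fewer, longer segments for the attention step.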
Preferably, S4 specifically comprises:
calculating the cosine similarity between all node features of the multi-scale fusion feature at each time step: for each pair of nodes $i$ and $j$ at time step $t$, the cosine similarity is
$$s_{ij}^{t} = \frac{h_i^{t} \cdot h_j^{t}}{\lVert h_i^{t} \rVert_2 \, \lVert h_j^{t} \rVert_2}$$
where $s_{ij}^{t}$ denotes the cosine similarity between feature vectors $h_i^{t}$ and $h_j^{t}$; $h_i^{t}$ and $h_j^{t}$ denote the feature vectors of nodes $i$ and $j$, respectively; and $\lVert \cdot \rVert_2$ denotes the L2 norm;
if the similarity between two nodes is greater than a preset threshold $\tau$, establishing an edge between them to obtain the adjacency matrix:
$$A_{ij}^{t} = \begin{cases} 1, & s_{ij}^{t} > \tau \\ 0, & \text{otherwise} \end{cases}$$
where $A^{t}$ denotes the adjacency matrix corresponding to time step $t$;
based on the adjacency matrix, aggregating the node features with GraphSAGE convolution layers to capture the spatial correlation features between nodes, i.e., the spatial features, where a GraphSAGE convolution layer computes
$$h_v^{(l+1)} = \sigma\left(W^{(l)} \cdot \mathrm{Concat}\left(h_v^{(l)}, \mathrm{AGGREGATE}\left(\{h_u^{(l)} : u \in \mathcal{N}(v)\}\right)\right)\right)$$
where $h_v^{(l)}$ denotes the feature representation of node $v$ at layer $l$; $\mathcal{N}(v)$ denotes the neighbors of node $v$; AGGREGATE is an aggregation operation; $W^{(l)}$ is the weight matrix of layer $l$; $\sigma(\cdot)$ is an activation function; and $h_u^{(l)}$ denotes the features of a neighbor node $u$ of node $v$.
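Graph construction and one aggregation layer can be sketched as follows, assuming the mean aggregator from the GraphSAGE paper, a ReLU activation, no self-loops in the adjacency matrix (the node's own features enter through the concatenation), and a zero vector for nodes without neighbors. The weight matrix in the example is hand-picked for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def adjacency(h: Sequence[Sequence[float]], tau: float) -> List[List[int]]:
    """A[i][j] = 1 if cos(h_i, h_j) > tau (i != j), else 0."""
    n = len(h)
    return [[1 if i != j and cosine(h[i], h[j]) > tau else 0 for j in range(n)]
            for i in range(n)]


def graphsage_layer(h: Sequence[Sequence[float]], adj: Sequence[Sequence[int]],
                    w: Sequence[Sequence[float]]) -> List[List[float]]:
    """h_v' = ReLU(W . concat(h_v, mean of neighbour features)); w is (out_dim, 2*in_dim)."""
    d = len(h[0])
    out = []
    for v in range(len(h)):
        nbrs = [h[u] for u in range(len(h)) if adj[v][u]]
        agg = [sum(col) / len(nbrs) for col in zip(*nbrs)] if nbrs else [0.0] * d
        cat = list(h[v]) + agg
        out.append([max(0.0, sum(row[j] * cat[j] for j in range(2 * d))) for row in w])
    return out


h = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]        # three nodes at one time step
adj = adjacency(h, tau=0.8)
w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # picks h_v[0] and the neighbour mean [0]
out = graphsage_layer(h, adj, w)
```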
Preferably, S6 specifically comprises:
inputting the final feature representation into the variational autoencoder, learning the latent-space mean $\mu$ and log-variance $\log\sigma^2$ by forward propagation through its fully connected network, and sampling the latent variable $z$ from the latent space with the reparameterization trick:
$$z = \mu + \sigma \odot \epsilon, \qquad \sigma = \exp\left(\tfrac{1}{2}\log\sigma^2\right)$$
where $\epsilon$ denotes noise sampled from the standard normal distribution $\mathcal{N}(0, I)$ and $\sigma$ denotes the standard deviation of the latent space;
mapping the latent variable $z$ back to the input space through a fully connected layer and reconstructing the input data to obtain the reconstructed data $\hat{X}$, so that a compressed representation of the input data is learned.
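The reparameterization step and a linear decoder can be sketched as follows. The encoder that produces `mu` and `logvar` is omitted, and the single layer `linear` is an assumed stand-in for the fully connected decoder. In a deep-learning framework the same expression lets gradients flow through `mu` and `logvar`, because the randomness is confined to `eps`.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """z = mu + sigma * eps with sigma = exp(0.5 * logvar) and eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def linear(x: Sequence[float], w: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Fully connected layer y = W x + b (one row of W per output unit)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


rng = random.Random(0)
mu, logvar = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, logvar, rng)
# Hypothetical decoder weights mapping a 2-d latent back to a 3-d input space.
x_hat = linear(z, w=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], b=[0.0, 0.0, 0.0])
```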
Preferably, the reconstruction error loss uses the mean squared error to measure the difference between the reconstructed data $\hat{X}$ and the training data $X$ input to the model:
$$\mathcal{L}_{\mathrm{rec}} = \frac{1}{n}\sum_{i=1}^{n} \lVert X_i - \hat{X}_i \rVert_2^2$$
where $\mathcal{L}_{\mathrm{rec}}$ denotes the reconstruction error loss and $X_i$, $\hat{X}_i$ denote the $i$-th of the $n$ data subsets and its reconstruction;
the KL divergence $\mathcal{L}_{\mathrm{KL}}$ measures the difference between the latent-space distribution and the standard normal distribution:
$$\mathcal{L}_{\mathrm{KL}} = -\frac{1}{2}\sum_{i=1}^{n}\left(1 + \log\sigma_i^2 - \mu_i^2 - \sigma_i^2\right)$$
where $\mathcal{L}_{\mathrm{KL}}$ denotes the KL divergence; $n$ denotes the total number of data subsets of the training data; $\sigma_i$ denotes the standard deviation of the $i$-th data subset; and $\mu_i$ denotes the latent-space mean of the $i$-th data subset;
the loss function $\mathcal{L}$ is the weighted sum of the reconstruction error loss and the KL divergence:
$$\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \beta \, \mathcal{L}_{\mathrm{KL}}$$
where $\beta$ denotes the weight of the KL divergence.
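The loss terms translate directly into code. This sketch assumes `x` and `x_hat` are lists of n rows (data subsets) and that the encoder outputs the log-variance `logvar = log σ²` rather than σ itself, a common choice for numerical stability; the function names are illustrative.

```python
import math
from typing import Sequence


def reconstruction_loss(x: Sequence[Sequence[float]], x_hat: Sequence[Sequence[float]]) -> float:
    """(1/n) * sum_i ||X_i - X_hat_i||^2 over n data subsets (rows)."""
    n = len(x)
    return sum(sum((a - b) ** 2 for a, b in zip(r, rh)) for r, rh in zip(x, x_hat)) / n


def kl_divergence(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, sigma^2) || N(0, 1)) = -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(x, x_hat, mu, logvar, beta: float = 1.0) -> float:
    """L = L_rec + beta * L_KL."""
    return reconstruction_loss(x, x_hat) + beta * kl_divergence(mu, logvar)


loss = vae_loss([[1.0, 2.0]], [[1.0, 0.0]], mu=[1.0], logvar=[0.0], beta=0.5)  # 4.0 + 0.5 * 0.5
```

With `beta = 1` this is the standard VAE objective; smaller values of `beta` weight reconstruction more heavily.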
Preferably, the method further comprises testing and evaluating the anomaly detection model: when the evaluation result meets the set requirement, the model is deemed qualified for subsequent anomaly detection; otherwise, the model is retrained. The evaluation metrics are calculated as follows:
Accuracy: $\mathrm{Acc} = \dfrac{TP + TN}{TP + TN + FP + FN}$;
Precision: $P = \dfrac{TP}{TP + FP}$;
Recall: $R = \dfrac{TP}{TP + FN}$;
F1 score: $F1 = \dfrac{2 \cdot P \cdot R}{P + R}$;
where TP is the number of true positives, i.e., samples that are normal in both the actual situation and the detection result; FP is the number of false positives, i.e., samples that are actually abnormal but detected as normal; TN is the number of true negatives, i.e., samples that are abnormal in both the actual situation and the detection result; and FN is the number of false negatives, i.e., samples that are actually normal but detected as abnormal.
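The four metrics can be computed from label sequences as below. The sketch assumes labels 0 (normal) and 1 (anomalous) and, following the definitions above, treats normal samples as the positive class by default; pass `positive=1` to treat anomalies as the positive class instead. Zero denominators return 0.0 by assumption.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 0) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 0) -> Dict[str, float]:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


metrics = evaluate([0, 0, 1, 1, 0], [0, 1, 1, 0, 0])  # normal (0) as the positive class
```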
The invention also provides an anomaly detection device based on adaptive multi-scale feature modeling, comprising:
a data preprocessing unit, configured to acquire historical monitoring time-series data of an industrial control system and preprocess the data to obtain training data for input into an anomaly detection model;
a component decomposition unit, configured to combine Fourier transformation with sliding windows of different sizes to decompose the input training data into time components, and to fuse them with the training data to obtain a decomposition result;
a multi-scale fusion unit, configured to calculate a gating weight matrix from the decomposition result with a gating network, and to adaptively weight and combine the outputs of a plurality of expert networks of different scales that process the training data, to obtain a multi-scale fusion feature;
a graph convolution unit, configured to calculate an adjacency matrix of the multi-scale fusion feature at each time step and to aggregate node features over the adjacency matrix with a graph convolution operation, capturing spatial correlation features between nodes to obtain spatial features;
a spatio-temporal combination unit, configured to compute a weighted sum of the spatial features and the multi-scale fusion feature to obtain a final feature representation that combines spatio-temporal information;
a data reconstruction unit, configured to feed the final feature representation into a variational autoencoder to learn a latent-space representation, and to decode it to obtain reconstructed data;
a model training unit, configured to train the model with a loss function that combines the reconstruction error loss and the KL divergence, to obtain a trained anomaly detection model;
an anomaly detection unit, configured to input the preprocessed data to be detected into the anomaly detection model and to apply a set threshold to obtain an anomaly detection result.
The invention also provides an anomaly detection device based on adaptive multi-scale feature modeling, comprising a processor and a memory storing a computer program that can be executed by the processor to implement the above anomaly detection method based on adaptive multi-scale feature modeling.
The invention also provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor of the device on which the storage medium resides, implement the above anomaly detection method based on adaptive multi-scale feature modeling.
In summary, compared with the prior art, the invention has the following beneficial effects:
By combining time decomposition, a mixture of multi-scale expert networks, spatio-temporal graph convolution, and other advanced time-series analysis techniques, the invention attends to the correlations among variables while adaptively combining expert networks of different scales, so that data characteristics at different time scales are processed dynamically. This overcomes the shortcoming of existing methods in handling dynamic characteristics as the time scale changes.
Compared with traditional anomaly detection methods, the invention focuses on the spatial correlations in the multi-dimensional time-series data of industrial control systems while also accounting for how the features change over time. It improves the accuracy of anomaly detection in industrial control systems, provides a more flexible and accurate anomaly detection tool for practical applications, and adapts to the complex dynamic changes of time-series data in industrial control systems without labeled data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an anomaly detection method based on adaptive multi-scale feature modeling according to a first embodiment.
Fig. 2 is a flowchart of an anomaly detection method based on adaptive multi-scale feature modeling according to an embodiment.
Fig. 3 is a schematic diagram of an anomaly detection device based on adaptive multi-scale feature modeling according to a second embodiment.
The invention is further described in detail below with reference to the drawings and the specific examples.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
Example 1
An embodiment of the present invention provides an anomaly detection method based on adaptive multi-scale feature modeling, which may be implemented by an anomaly detection device based on adaptive multi-scale feature modeling (hereinafter referred to as an anomaly detection device), and in particular, executed by one or more processors in the anomaly detection device.
In this embodiment, the anomaly detection device may be any electronic device that has a processor and on which the computer program of the anomaly detection method based on adaptive multi-scale feature modeling is installed and can be executed, for example a computer, a smartphone, a tablet, or a workstation, without limitation.
As shown in fig. 1-2, an anomaly detection method based on adaptive multi-scale feature modeling includes steps S1 to S8.
S1, acquiring historical monitoring time series data of an industrial control system, and preprocessing the data to obtain training data input into an anomaly detection model.
Specifically, historical characteristic data of the industrial control equipment is obtained, and whether the data are complete or not is checked. The characteristic data is a collection of observations or data points associated with a time stamp, and the preprocessing operation includes processing missing and duplicate values of the data.
In this embodiment, the preprocessed data is input into an anomaly detection model based on self-adaptive multi-scale feature extraction and reconstruction, so as to obtain a label of whether the industrial control equipment has an anomaly state at the current moment, and an anomaly detection result is obtained.
Specifically, the preprocessed feature data $X$ is divided into a training set and a test set; the training set is used for model training and the test set for model testing and evaluation.
Definition: $X = \{x_1, x_2, \dots, x_T\}$, $x_t \in \mathbb{R}^{m}$;
where $T$ is the maximum timestamp, $x_t$ denotes the feature data at time $t$, and $m$ denotes the number of features. The output is the anomaly detection result label $Y = \{y_1, y_2, \dots, y_T\}$, $y_t \in \{0, 1\}$, where $y_t$ denotes the value of the anomaly detection result label at time $t$.
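The preprocessing and split of step S1 can be sketched as follows. The specific policies are assumptions, since the text only says that missing and duplicate values are handled: duplicate timestamps keep their first occurrence, missing values (`None`) are forward-filled and then back-filled at the start of the series, and the split is chronological with a hypothetical 80/20 ratio so that test data always follow training data in time.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, List[Optional[float]]]  # (timestamp, feature values; None = missing)


def preprocess(rows: Sequence[Row]) -> List[Row]:
    """Sort by timestamp, drop duplicate timestamps (keep first), fill missing values."""
    seen = set()
    unique: List[Row] = []
    for ts, vals in sorted(rows, key=lambda r: r[0]):  # stable sort keeps input order on ties
        if ts in seen:
            continue
        seen.add(ts)
        unique.append((ts, list(vals)))  # copy so the caller's lists are not modified
    if not unique:
        return unique
    for j in range(len(unique[0][1])):
        last = None
        for _, vals in unique:  # forward fill
            if vals[j] is None:
                vals[j] = last
            else:
                last = vals[j]
        nxt = None
        for _, vals in reversed(unique):  # back-fill any leading gap
            if vals[j] is None:
                vals[j] = nxt
            else:
                nxt = vals[j]
    return unique


def chronological_split(rows: Sequence[Row], train_ratio: float = 0.8):
    """Earlier rows for training, later rows for testing (no shuffling)."""
    cut = int(len(rows) * train_ratio)
    return list(rows[:cut]), list(rows[cut:])


raw = [(2, [None, 1.0]), (1, [5.0, None]), (2, [9.0, 9.0]), (3, [7.0, 3.0])]
clean = preprocess(raw)
train, test = chronological_split(clean, train_ratio=0.8)
```

A column that is missing everywhere stays `None`; a real pipeline would drop or flag such features.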
S2, combining Fourier transformation and sliding windows with different sizes, decomposing time components of the input training data, and fusing the time components with the training data to obtain a decomposition result.
The specific steps include S21 to S24.
S21, first, the frequency characteristics of the input data are analyzed with the Fourier transform, and periodic variation is captured to obtain the seasonal component, i.e., the regular fluctuations in the data that repeat with a fixed period.
For example, a seasonal component may appear as data peaks every summer and winter, with lower values in spring and autumn.
S22, moving averages of the input data are calculated with a plurality of sliding windows of different sizes, and the overall rising or falling tendency of the input data is captured by smoothing out short-term fluctuations to obtain the trend component, i.e., the long-term change in the data, which may be continuously increasing, continuously decreasing, or constant.
For example, the trend component may show data rising year over year.
S23, the seasonal component and the trend component are subtracted from the input data to obtain the residual component, i.e., the part of the data that neither seasonality nor trend explains, comprising random fluctuations or noise.
For example, fluctuations caused by unexpected events such as extreme weather.
S24, the input data are thus decomposed into a seasonal component, a trend component, and a residual component, which are then combined with the input data to obtain the decomposition result. This time decomposition reveals the structure and patterns of change in the data more clearly, supports the subsequent expert selection and feature extraction, and provides a sound basis for decision making.
S3, calculating a gating weight matrix from the decomposition result with a gating network, and adaptively weighting and combining the outputs of a plurality of expert networks of different scales that process the training data, to obtain the multi-scale fusion feature.
The specific steps include S31 to S34.
S31, first, according to the decomposition result $X^{\mathrm{dec}}$, the feature vector of each time step is extracted to obtain data decomposition result subsets, and an initial gating weight matrix is generated from each subset with the formula:
$$G_i = \sigma\left(W X_i^{\mathrm{dec}} + b\right)$$
where $G_i$ denotes the gating weight matrix of the $i$-th subset of the decomposition result; $\sigma(\cdot)$ is an activation function; $W$ and $b$ are a learned weight and bias term, respectively; and $X_i^{\mathrm{dec}}$ denotes the input features of the $i$-th subset of the decomposition result $X^{\mathrm{dec}}$.
S32, the feature vector of each time step is extracted from the training data to obtain data subsets, which are distributed among the expert networks of different scales according to the gating weight matrix, so that the most suitable expert networks process the current data.
In this embodiment, an "Expert Network" is a specific neural network architecture or module designed to handle feature extraction for one subtask or specific domain within a complex task. Such a network can be regarded as an "expert" because it focuses on specific types of data or features and thereby contributes more to the overall task. In time-series analysis, an expert network typically handles a specific time segment or feature pattern.
Specifically, each expert network is assigned a weight according to its contribution to the task when processing a particular data subset, and the gating weight matrix uses these weights to decide which experts are activated and to what degree, and hence how each data subset is distributed among the experts. Expert selection adapts to the data: it is adjusted dynamically based on the data characteristics at each time step, so that the model selects the most appropriate expert networks for the characteristics of the data in different time periods. The expression is:
$$G(x) = \mathrm{Softmax}\left(\mathrm{TopK}\left(H(x), k\right)\right)$$
$$H(x) = x W_g + \epsilon \cdot \mathrm{Softplus}\left(x W_{\mathrm{noise}}\right)$$
where $G(x)$ denotes the gating weight matrix corresponding to the current input data $x$; $\mathrm{TopK}(\cdot, k)$ keeps the $k$ experts with the largest weights in $H(x)$; $H(x)$ denotes the expert weights with injected noise; $k$ denotes the number of selected experts; $W_g$ denotes the gating weights; $\epsilon$ denotes noise sampled from a standard normal distribution with mean 0 and variance 1; $\mathrm{Softplus}(\cdot)$ is a smoothed ReLU function that strengthens the nonlinear expressive power of the model; and $W_{\mathrm{noise}}$ denotes the noise weights;
S33, the expert networks of different scales define time segments of different sizes, which sets the temporal resolution at which each network processes the input data and lets them extract features of the data at different scales. Each expert network extracts local features within its time segments through deep convolution and uses an attention mechanism across time segments to model the relationships between different segments, improving the modeling of long-range dependencies in the time series. The processing steps are as follows:
Dividing the allocated data into a plurality of time segments $\{P_1, P_2, \dots, P_M\}$, where $M$ is the number of segments;
By setting a different time segment size $p$ for each expert network, the resolution at which it processes the time series data is defined; each time segment represents a local time window used to learn the temporal dependencies of the data:
$$P_i = X_{(i-1)p+1\,:\,ip}$$
where $P_i$ denotes the $i$-th time segment and $X_{(i-1)p+1:ip}$ denotes the time series data within the $i$-th time segment;
Each expert network extracts local features from each time segment through a deep grouped convolution layer, obtaining the short-term dependency features $H_i^{\mathrm{local}}$ of the consecutive time steps within each segment:
$$H_i^{\mathrm{local}} = \mathrm{Conv}(P_i)$$
where $\mathrm{Conv}(\cdot)$ denotes the convolutional feature extraction operation.
In this embodiment, deep grouped convolution is a convolution operation that divides the input channels into several groups and convolves each group separately. Its advantage is that it substantially reduces computation while still extracting the features of each group effectively. In this embodiment, the data of each time segment can be regarded as a multi-dimensional time series, and grouped convolution lets the network process the groups in parallel to extract the features within each time segment.
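Grouped convolution as described above can be sketched in pure Python. The layout (channels by length) and the names are illustrative; each output channel sees only the input channels of its own group, and setting `groups` equal to the number of input channels gives a depthwise convolution.

```python
from typing import List, Sequence


def grouped_conv1d(x: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[Sequence[float]]],
                   groups: int) -> List[List[float]]:
    """'Valid' grouped 1-D convolution.

    x: C_in channels of equal length L.
    weights: C_out kernels, each holding C_in // groups channel kernels of length K.
    Output channel o only reads the input channels of group o // (C_out // groups).
    """
    c_in, c_out = len(x), len(weights)
    assert c_in % groups == 0 and c_out % groups == 0
    in_per_g, out_per_g = c_in // groups, c_out // groups
    length, k = len(x[0]), len(weights[0][0])
    out = []
    for o in range(c_out):
        g = o // out_per_g
        chans = x[g * in_per_g:(g + 1) * in_per_g]  # the input channels of this group
        w = weights[o]
        out.append([sum(w[c][j] * chans[c][t + j] for c in range(in_per_g) for j in range(k))
                    for t in range(length - k + 1)])
    return out


x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # 2 input channels, length 3
weights = [[[1.0, 1.0]], [[1.0, -1.0]]]    # 2 output channels, 1 input channel each, K = 2
y = grouped_conv1d(x, weights, groups=2)   # [[3.0, 5.0], [-10.0, -10.0]]
```

Each output channel holds C_in/groups kernels instead of C_in, so both the parameter count and the multiply-adds of the layer shrink by a factor of `groups`.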
Next, multi-head self-attention is used to learn the long-term dependencies between different time segments $X_i$, modeling the interactions among segments and generating the global features $H^{\mathrm{global}}$. When computing multi-head self-attention, the segment features are linearly transformed into a query matrix, a key matrix and a value matrix, and the features of all time segments are aggregated through the attention weights; the outputs of the heads are concatenated. Each attention head computes:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
where $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, $d_k$ is the dimension of the key matrix, and $\mathrm{softmax}(\cdot)$ is the normalization function.
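As an illustration, the sketch below applies scaled dot-product attention across time segments, with each segment summarized as a single feature vector (an assumption; the text does not say how a segment is pooled into one token). Each head uses its own projection matrices and the head outputs are concatenated; the final output projection of standard multi-head attention is omitted for brevity. All names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, where each row is one time segment."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = softmax_rows([[s / math.sqrt(d_k) for s in row] for row in scores])
    return matmul(weights, v), weights


def multi_head_attention(segments, heads):
    """segments: N x d matrix, one feature vector per time segment.
    heads: list of (W_q, W_k, W_v) projection matrices, one tuple per head.
    Head outputs are concatenated per segment; no output projection is applied.
    """
    head_outputs = []
    for w_q, w_k, w_v in heads:
        out, _ = scaled_dot_product_attention(
            matmul(segments, w_q), matmul(segments, w_k), matmul(segments, w_v))
        head_outputs.append(out)
    return [[value for out in head_outputs for value in out[i]] for i in range(len(segments))]


segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 segments, d = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
global_features = multi_head_attention(
    segments, [(identity, identity, identity), (identity, swap, identity)])
print(len(global_features), len(global_features[0]))  # 3 segments, 2 heads x 2 dims = 4
```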
The features learned by each expert network are then concatenated: the assigned data, the short-term dependency features and the cross-segment global features are combined to obtain the fused feature output by each expert network.
S34, based on the gating weight matrix, the output features of the multiple expert networks of different scales are weighted and summed to obtain the multi-scale fusion feature $Y$:
$$Y = \sum_{i=1}^{n} G_i(x)\, E_i(x)$$
where $E_i(x)$ is the output feature of the $i$-th expert network, $G_i(x)$ is the gating weight assigned to it, and $n$ is the total number of expert networks.
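The weighted sum can be sketched as follows, assuming each expert is a callable that returns a feature vector of a common length. Experts whose gate weight is exactly zero, as happens with top-k routing, are skipped, which is where a sparse mixture of experts saves computation. The toy experts and the function name are hypothetical.

```python
def fuse_experts(x, gates, experts):
    """Multi-scale fusion Y = sum_i G_i(x) * E_i(x).

    experts: callables mapping x to feature vectors of equal length (hypothetical).
    Experts with a gate of exactly 0 are never evaluated (sparse routing).
    Returns the fused vector (None if every gate is 0) and how many experts ran.
    """
    fused = None
    calls = 0
    for gate, expert in zip(gates, experts):
        if gate == 0.0:
            continue
        out = expert(x)
        calls += 1
        if fused is None:
            fused = [0.0] * len(out)
        fused = [f + gate * o for f, o in zip(fused, out)]
    return fused, calls


experts = [lambda v: [1 * a for a in v], lambda v: [2 * a for a in v], lambda v: [3 * a for a in v]]
fused, calls = fuse_experts([1.0, 2.0], [0.25, 0.75, 0.0], experts)
print(fused, calls)  # [1.75, 3.5] 2
```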
S4, compute an adjacency matrix of the multi-scale fusion feature at each time step, and aggregate the node features with a graph convolution operation to capture the spatial correlations among nodes, obtaining the spatial features.
The specific steps include S41 to S43.
S41, compute the cosine similarity between all node features of the multi-scale fusion feature at each time step. For every pair of nodes $i$ and $j$ at time step $t$:
$$s_{ij}^{t} = \frac{h_i^{t} \cdot h_j^{t}}{\lVert h_i^{t} \rVert_2 \, \lVert h_j^{t} \rVert_2}$$
where $s_{ij}^{t}$ is the cosine similarity between the feature vectors $h_i^{t}$ and $h_j^{t}$, which are the feature vectors of node $i$ and node $j$ at time step $t$, and $\lVert \cdot \rVert_2$ denotes the L2 norm.
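S42, if the similarity between two nodes is greater than a preset threshold $\tau$, an edge is established between them, giving the adjacency matrix:
$$A_t[i, j] = \begin{cases} 1, & s_{ij}^{t} > \tau \\ 0, & \text{otherwise} \end{cases}$$
where $A_t$ is the adjacency matrix corresponding to time step $t$ and $s_{ij}^{t}$ is the cosine similarity between nodes $i$ and $j$ at that step.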
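Steps S41 and S42 can be sketched together for one time step. Assumptions not stated in the text: the diagonal is left at 0 (no self-loops), a zero feature vector is treated as dissimilar to every other node, and the threshold 0.8 is purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # assumption: zero vectors are similar to nothing
    return dot / (na * nb)


def adjacency_at_step(node_features, tau):
    """Symmetric 0/1 adjacency: edge (i, j) iff cosine similarity > tau."""
    n = len(node_features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(node_features[i], node_features[j]) > tau:
                adj[i][j] = adj[j][i] = 1
    return adj


# Three nodes (variables) at one time step; tau = 0.8 is a hypothetical threshold.
features_t = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(adjacency_at_step(features_t, tau=0.8))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```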
S43, based on the adjacency matrix, GraphSAGE (Graph SAmple and aggreGatE) convolution layers aggregate the node features to capture the spatial correlations between nodes, i.e. the spatial features $H^{s}$. The GraphSAGE convolution layer is:
$$h_v^{(l+1)} = \sigma\!\left(W^{(l)} \cdot \mathrm{Aggregate}\!\left(\{h_v^{(l)}\} \cup \{h_u^{(l)} : u \in \mathcal{N}(v)\}\right)\right)$$
where $h_v^{(l)}$ is the feature representation of node $v$ at layer $l$; $\mathcal{N}(v)$ is the set of neighbors of node $v$; $\mathrm{Aggregate}$ is the aggregation operation; $W^{(l)}$ is the weight matrix of layer $l$; $\sigma$ is the activation function; and $h_u^{(l)}$ is the feature representation of a neighbor $u$ of node $v$.
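A minimal GraphSAGE-style layer is sketched below under these assumptions: a mean aggregator over the node itself and its neighbors, ReLU as the activation, and the full neighborhood taken from the adjacency matrix instead of GraphSAGE's neighbor sampling. The features, adjacency and weight matrix are hypothetical.

```python
def sage_mean_layer(h, adj, weight):
    """One GraphSAGE-style layer: mean over {v} and N(v), multiply by W, then ReLU.

    h:      n x d_in node features.
    adj:    n x n 0/1 adjacency matrix (e.g. built from cosine similarity).
    weight: d_in x d_out weight matrix.
    """
    n = len(h)
    d_in = len(h[0])
    d_out = len(weight[0])
    out = []
    for v in range(n):
        members = [v] + [u for u in range(n) if adj[v][u] and u != v]
        mean = [sum(h[u][k] for u in members) / len(members) for k in range(d_in)]
        z = [sum(mean[k] * weight[k][c] for k in range(d_in)) for c in range(d_out)]
        out.append([max(0.0, val) for val in z])  # ReLU activation
    return out


h = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # nodes 0 and 1 connected, node 2 isolated
w = [[1.0, 0.0], [0.0, 1.0]]             # identity weights for readability
print(sage_mean_layer(h, adj, w))  # [[2.0, 1.0], [2.0, 1.0], [0.0, 4.0]]
```

An isolated node keeps its own (transformed) features, while connected nodes are smoothed toward each other, which is how the layer injects spatial correlation.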
S5, weight and sum the spatial features and the multi-scale fusion features to obtain a final feature representation that combines spatio-temporal information.
In this step, the spatial features extracted by graph convolution and the multi-scale fusion features extracted by the expert networks are weighted and summed to obtain the final feature representation combining temporal and spatial information.
S6, feed the final feature representation into a variational autoencoder to learn a latent-space representation, then decode and reconstruct the data to obtain the reconstructed data.
The specific steps include S61 to S62.
S61, the final feature representation is input into the variational autoencoder. Forward propagation through its fully connected encoder network yields the latent-space mean $\mu$ and log-variance $\log \sigma^2$ of the final feature representation, and the latent variable $z$ is sampled from the latent space with the reparameterization trick:
$$z = \mu + \sigma \odot \epsilon, \qquad \sigma = \exp\!\left(\tfrac{1}{2}\log\sigma^2\right), \qquad \epsilon \sim \mathcal{N}(0, I)$$
where $z$ is the latent variable, $\epsilon$ is noise sampled from the standard normal distribution $\mathcal{N}(0, I)$, and $\sigma$ is the standard deviation of the latent space.
In this embodiment, the encoder of the variational autoencoder consists of a fully connected network (FC) that maps the input features to the parameters (mean and variance) of the latent distribution. This fully connected network typically has several layers, each transforming the features through a linear transformation followed by a nonlinear activation function (e.g., ReLU).
Sampling directly from the latent distribution would prevent gradients from being backpropagated, so the reparameterization trick is used. Its core idea is to separate the randomness from the network parameters: the noise is drawn independently of the network, so the latent variable becomes a deterministic, differentiable function of the mean and standard deviation, and gradients can flow through the sampling step.
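A small sketch of the reparameterization trick, under the assumption of a diagonal Gaussian latent space parameterized by a mean and a log-variance. The first function draws samples; the second fixes the noise and returns the analytic derivatives of z, which a finite-difference check confirms. Function names and the numbers used are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, eps ~ N(0, 1) per dimension."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]
    return z, eps


def latent_and_grads(mu, log_var, eps):
    """Scalar case with eps held fixed: z is deterministic, so it has derivatives."""
    sigma = math.exp(0.5 * log_var)
    z = mu + sigma * eps
    dz_dmu = 1.0
    dz_dlog_var = 0.5 * sigma * eps
    return z, dz_dmu, dz_dlog_var


rng = random.Random(42)
draws = [sample_latent([2.0], [math.log(0.25)], rng)[0][0] for _ in range(20000)]
mean = sum(draws) / len(draws)
std = math.sqrt(sum((d - mean) ** 2 for d in draws) / len(draws))
print(round(mean, 2), round(std, 2))  # close to mu = 2.0 and sigma = 0.5
```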
S62, in the decoder, a fully connected layer maps the latent variable $z$ back to the input space to reconstruct the input data, yielding the reconstructed data $\hat{X}$ and thereby learning a compressed representation of the input:
$$\hat{X} = f_{\mathrm{dec}}(z)$$
where $f_{\mathrm{dec}}(\cdot)$ denotes the decoding (reconstruction) operation.
In this embodiment, the decoder is also a fully connected network, whose role is to map the latent variables back into the original feature space, generating reconstructed features.
S7, train the model with a loss function that combines the reconstruction error loss and the KL divergence, obtaining the trained anomaly detection model.
In this embodiment, the loss function $\mathcal{L}$ combines the reconstruction error loss and the KL divergence:
$$\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \beta\, \mathcal{L}_{\mathrm{KL}}$$
where $\beta$ is the weight of the KL divergence, $\mathcal{L}_{\mathrm{rec}}$ is the reconstruction error loss, and $\mathcal{L}_{\mathrm{KL}}$ is the KL divergence.
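The reconstruction error loss uses the mean squared error to measure the difference between the reconstructed data $\hat{X}$ and the training data $X$ input to the model:
$$\mathcal{L}_{\mathrm{rec}} = \frac{1}{n}\sum_{k=1}^{n}\left(x_k - \hat{x}_k\right)^2$$
where $\mathcal{L}_{\mathrm{rec}}$ is the reconstruction error loss, and $x_k$ and $\hat{x}_k$ are the $k$-th of the $n$ elements of $X$ and $\hat{X}$.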
The KL divergence $\mathcal{L}_{\mathrm{KL}}$ measures the difference between the latent-space distribution and the standard normal distribution:
$$\mathcal{L}_{\mathrm{KL}} = -\frac{1}{2}\sum_{i=1}^{N}\left(1 + \log\sigma_i^2 - \mu_i^2 - \sigma_i^2\right)$$
where $N$ is the total number of data subsets in the training data, $\sigma_i$ is the standard deviation of the $i$-th data subset, and $\mu_i$ is the latent-space mean of the $i$-th data subset.
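The combined loss can be sketched in plain Python as below. Inputs are flattened lists, the KL term is summed over the given latent entries, and the weight beta = 0.1 is a hypothetical value; how the terms are averaged over a batch is an implementation choice the text leaves open.

```python
import math


def mse(x, x_hat):
    """Mean squared error over all elements of two equal-length lists."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mu, log_var):
    """-1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2), with sigma^2 = exp(log_var)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var, beta):
    """Total loss L = L_rec + beta * L_KL."""
    return mse(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


# MSE = (0 + 4) / 2 = 2.0; KL = -0.5 * (1 + 0 - 1 - 1) = 0.5; total = 2.0 + 0.1 * 0.5
print(vae_loss([1.0, 2.0], [1.0, 0.0], mu=[1.0], log_var=[0.0], beta=0.1))  # about 2.05
```

The KL term is zero exactly when the latent distribution equals the standard normal (mu = 0, log_var = 0) and positive otherwise, so it pulls the latent space toward N(0, I).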
S8, input the preprocessed data to be detected into the anomaly detection model and apply the set threshold to obtain the anomaly detection result.
Specifically, after training, the anomaly detection model is tested and evaluated. If the evaluation result meets the set requirements, the model is judged qualified and used for subsequent anomaly detection; otherwise, the model is retrained. The evaluation metrics are computed as follows:
Accuracy: $\mathrm{Acc} = \dfrac{TP + TN}{TP + TN + FP + FN}$;
Precision: $P = \dfrac{TP}{TP + FP}$;
Recall: $R = \dfrac{TP}{TP + FN}$;
F1 score: $F_1 = \dfrac{2PR}{P + R}$;
where TP is the number of true positives, i.e. samples that are normal both in reality and in the detection result; FP is the number of false positives, i.e. samples that are actually abnormal but detected as normal; TN is the number of true negatives, i.e. samples that are abnormal both in reality and in the detection result; and FN is the number of false negatives, i.e. samples that are actually normal but detected as abnormal.
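The four metrics can be computed from labels as sketched below. Because the definitions above count "normal" as the positive class, the function takes the positive label as a parameter so either convention can be used. Returning 0.0 when a denominator is zero is an assumption, and the labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN for the class chosen as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1; zero denominators yield 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: "N" = normal, "A" = abnormal; "N" is the positive class here.
y_true = ["N", "N", "A", "A", "N", "A"]
y_pred = ["N", "A", "A", "N", "N", "A"]
metrics = evaluate(y_true, y_pred, positive="N")
print(metrics)
```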
Specifically, the training data are input into the anomaly detection model to obtain anomaly scores for the training samples. In this embodiment, the threshold is set to the mean of the training-sample anomaly scores plus three times their standard deviation. The test set is then input into the model, and for each test sample the weighted combination of the reconstruction error loss and the KL divergence from the total loss function is taken as its anomaly score. Comparing each test sample's anomaly score with the threshold gives its anomaly class label.
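The thresholding rule can be sketched as follows. Assumptions: the population standard deviation is used (the text does not say population or sample), a score strictly above the threshold is labeled anomalous (1), and the scores and beta are hypothetical.

```python
import statistics


def anomaly_score(rec_error, kl, beta):
    """Per-sample score: reconstruction error plus beta-weighted KL divergence."""
    return rec_error + beta * kl


def fit_threshold(train_scores):
    """Mean + 3 * standard deviation of the training scores (population std assumed)."""
    return statistics.mean(train_scores) + 3 * statistics.pstdev(train_scores)


def label(scores, threshold):
    """1 = anomaly (score above threshold), 0 = normal."""
    return [1 if s > threshold else 0 for s in scores]


train_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
threshold = fit_threshold(train_scores)  # 3 + 3 * sqrt(2), roughly 7.24
print(label([2.0, anomaly_score(6.0, 20.0, beta=0.1)], threshold))  # [0, 1]
```

Because the training set contains only normal operation, the threshold describes the spread of normal scores; a test score far outside it is flagged.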
When the test evaluation result meets the set requirements, the anomaly detection model is judged qualified and used for subsequent unsupervised anomaly detection: monitoring time-series data of industrial control equipment are acquired in real time, preprocessed and input into the model to obtain the anomaly detection result for the current moment.
In another preferred embodiment, the SWaT (Secure Water Treatment) dataset, built for security research on industrial control systems, is used to verify the effectiveness of the proposed anomaly detection model and method. The dataset has 51 dimensions, corresponding to 51 sensors and actuators that cover various aspects of the water treatment process, such as water level, flow, pressure and pH. It contains 11 days of operating data: the first 7 days are normal operation, and the last 4 days include typical network attacks from 36 attack scenarios. The data are recorded as a time series once per second, each record containing a timestamp and the measurement or state of every sensor and actuator.
The first 7 days of data, which are all normal, are used as the training set, and the last 4 days, which contain both normal and abnormal data, are used as the test set. The model is then trained with the parameter settings shown in Table 1.
Table 1 Experimental parameter settings
The experimental results are shown in Table 2.
Table 2 Experimental results
As Table 2 shows, the proposed anomaly detection model achieves high detection performance, with clear advantages in accuracy and precision, and can effectively improve the accuracy of anomaly detection in industrial control systems.
In summary, compared with the prior art, the invention has the following beneficial effects:
The invention combines several time-series analysis techniques, namely time-series decomposition, a mixture of multi-scale expert networks and spatio-temporal graph convolution. While attending to the correlations among variables, it adaptively combines expert networks of different scales to process data features dynamically across time scales, overcoming the weakness of existing methods in handling dynamic features whose time scale changes. Because the method accounts for how the data change over time scales, it can accurately model the anomaly detection task in industrial control systems.
Specifically, time decomposition is combined with a gating network that assigns suitable expert networks to data at different time scales, so that the temporal characteristics of the data are modeled comprehensively at multiple scales. Within the expert layer, deep convolutional networks and an attention mechanism across time segments mine the correlations within and between time periods of the series. The spatial interactions between variables are then modeled with a graph structure. Finally, a reconstruction module learns a latent representation of the data to generate reconstructed data, and the model is trained on the reconstruction error and the KL divergence of the latent space, enabling efficient anomaly detection in the unsupervised setting.
Compared with traditional anomaly detection methods, the invention focuses on the spatial correlations in the multi-dimensional time-series data of industrial control systems while also accounting for how features change over time. It can adapt its training strategy to the characteristics of the data, ensuring accurate anomaly detection and identification in different environments. The invention improves the accuracy of anomaly detection in industrial control systems, provides a more flexible and accurate tool for practical applications, and adapts to the complex dynamic changes of industrial time-series data without labeled data.
Example two
As shown in Fig. 3, the second embodiment of the present invention further provides an anomaly detection device based on adaptive multi-scale feature modeling, comprising:
a data preprocessing unit, configured to acquire historical monitoring time-series data of an industrial control system and preprocess the data to obtain training data to be input into an anomaly detection model;
a component decomposition unit, configured to combine the Fourier transform with sliding windows of different sizes to decompose the input training data into time components, and to fuse the decomposition with the training data to obtain a decomposition result;
a multi-scale fusion unit, configured to compute a gating weight matrix from the decomposition result with a gating network, and to adaptively weight and combine the outputs of multiple expert networks of different scales that process the training data, obtaining multi-scale fusion features;
a graph convolution unit, configured to compute an adjacency matrix of the multi-scale fusion features at each time step and to aggregate the node features with a graph convolution operation, capturing the spatial correlations among nodes to obtain spatial features;
a spatio-temporal combining unit, configured to weight and sum the spatial features and the multi-scale fusion features to obtain a final feature representation combining spatio-temporal information;
a data reconstruction unit, configured to feed the final feature representation into a variational autoencoder to learn a latent-space representation, and to decode and reconstruct the data to obtain reconstructed data;
a model training unit, configured to train the model with a loss function combining the reconstruction error loss and the KL divergence, obtaining a trained anomaly detection model; and
an anomaly detection unit, configured to input the preprocessed data to be detected into the anomaly detection model and apply a set threshold to obtain an anomaly detection result.
Example three
The third embodiment of the present invention further provides an abnormality detection device based on adaptive multi-scale feature modeling, which includes a memory and a processor, where the memory stores a computer program, and the computer program is capable of being executed by the processor to implement the abnormality detection method based on adaptive multi-scale feature modeling as described above.
Example four
The fourth embodiment of the present invention further provides a computer readable storage medium, where computer readable instructions are stored, and when the computer readable instructions are executed by a processor of a device in which the computer readable storage medium is located, the anomaly detection method based on the adaptive multi-scale feature modeling is implemented as described above.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented as software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions that cause a computer device (which may be a personal computer, an electronic device, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if determined" or "if (a stated condition or event) is detected" may be interpreted as "when determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
References to "first/second" in the embodiments merely distinguish similar objects and do not indicate a particular order of the objects. It should be understood that "first/second" may be interchanged in specific order or precedence where allowed, so that the embodiments described herein can be implemented in sequences other than those illustrated or described herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1.一种基于自适应多尺度特征建模的异常检测方法,其特征在于,包括:1. An anomaly detection method based on adaptive multi-scale feature modeling, characterized by comprising: S1,获取工业控制系统的历史监测时间序列数据,并对数据进行预处理,得到输入异常检测模型的训练数据;S1, obtain the historical monitoring time series data of the industrial control system and preprocess the data to obtain the training data for the input anomaly detection model; S2,结合傅里叶变换与不同尺寸的滑动窗口,对输入的所述训练数据进行时间成分分解后与所述训练数据融合,得到分解结果;S2, combining Fourier transform with sliding windows of different sizes, decomposing the input training data by time component and fusing it with the training data to obtain a decomposition result; S3,根据所述分解结果,结合门控网络计算出门控权重矩阵,并自适应地加权组合多个不同尺度的专家网络处理所述训练数据的输出,得到多尺度融合特征;S3, according to the decomposition result, a gating weight matrix is calculated in combination with a gating network, and the output of processing the training data by a plurality of expert networks of different scales is adaptively weighted and combined to obtain a multi-scale fusion feature; S4,计算所述多尺度融合特征在每个时间步长的邻接矩阵,并采用图卷积操作对所述邻接矩阵的节点特征进行聚合,以捕捉节点间的空间关联特征,得到空间特征;S4, calculating the adjacency matrix of the multi-scale fusion feature at each time step, and aggregating the node features of the adjacency matrix using a graph convolution operation to capture the spatial correlation features between nodes to obtain spatial features; S5,将所述空间特征与所述多尺度融合特征进行加权求和,得到结合时空信息的最终特征表示;S5, performing weighted summation of the spatial feature and the multi-scale fusion feature to obtain a final feature representation combining spatiotemporal information; S6,将所述最终特征表示送入变分自编码器学习潜在空间表示,并进行数据解码重构,得到重构数据;S6, sending the final feature representation to a variational autoencoder to learn a latent space representation, and performing data decoding and reconstruction to obtain reconstructed data; S7,结合重构数据误差损失和KL散度作为损失函数训练模型,得到训练好的异常检测模型;S7, combining the reconstructed data error loss and KL divergence as the loss function to train the model and obtain a trained anomaly detection model; 
S8,将预处理后的待检测数据输入异常检测模型,结合设定的阈值,得到异常检测结果。S8, input the preprocessed data to be detected into the anomaly detection model, and combine it with the set threshold to obtain the anomaly detection result. 2.根据权利要求1所述的一种基于自适应多尺度特征建模的异常检测方法,其特征在于,所述预处理包括对数据的缺失值和重复值进行处理。2. According to the anomaly detection method based on adaptive multi-scale feature modeling according to claim 1, it is characterized in that the preprocessing includes processing missing values and duplicate values of the data. 3.根据权利要求1所述的一种基于自适应多尺度特征建模的异常检测方法,其特征在于,对输入的所述训练数据进行时间成分分解,具体为:3. The method for anomaly detection based on adaptive multi-scale feature modeling according to claim 1, characterized in that the input training data is decomposed into time components, specifically: 采用傅里叶变换技术分析输入数据的频率特性,捕捉周期性变化,得到季节性成分;其中,所述季节性成分为数据中随固定周期变化的规律性波动数据;The frequency characteristics of the input data are analyzed by Fourier transform technology to capture periodic changes and obtain seasonal components; wherein the seasonal components are regular fluctuation data in the data that changes with a fixed period; 采用多个不同尺寸的滑动窗口计算输入数据的移动平均值,通过平滑短期波动捕捉输入数据的整体上升或下降趋势,得到趋势成分;其中,所述趋势成分为数据中长期变化的趋势数据,包括持续增长数据、持续下降数据或保持不变数据;A moving average of the input data is calculated using multiple sliding windows of different sizes, and the overall upward or downward trend of the input data is captured by smoothing short-term fluctuations to obtain a trend component; wherein the trend component is the trend data of long-term changes in the data, including continuously growing data, continuously declining data, or unchanged data; 将输入数据减去季节性成分与趋势成分,得到残差成分;所述残差成分包含随机波动数据或噪声数据。The seasonal component and the trend component are subtracted from the input data to obtain a residual component; the residual component includes random fluctuation data or noise data. 4.根据权利要求1所述的一种基于自适应多尺度特征建模的异常检测方法,其特征在于,所述S3具体为:4. 
The method for anomaly detection based on adaptive multi-scale feature modeling according to claim 1, wherein S3 is specifically: 根据所述分解结果,提取每个时间步长的特征向量,得到数据分解结果子集;根据数据分解结果子集生成初始的门控权重矩阵,公式为:According to the decomposition results , extract the feature vector of each time step, and obtain the data decomposition result subset; generate the initial gating weight matrix according to the data decomposition result subset, the formula is: ; 其中,表示所述分解结果的第个子集的门控权重矩阵;为激活函数;分别为学习到的权重和偏置项;表示所述分解结果的第个子集的输入特征;in, The decomposition result is represented by The gating weight matrix of the subsets; is the activation function; and are the learned weights and bias terms respectively; Represents the decomposition result No. The input features of the subset; 根据所述训练数据,提取每个时间步长的特征向量,得到数据子集;根据所述门控权重矩阵,将所述数据子集分配给多个不同尺度的专家网络,以选择最适合的专家网络处理当前数据,得到输出特征,表达式为:According to the training data, the feature vector of each time step is extracted to obtain a data subset; according to the gated weight matrix, the data subset is assigned to a plurality of expert networks of different scales to select the most suitable expert network to process the current data and obtain the output feature , the expression is: ; ; 其中,表示输入的当前数据对应的门控权重矩阵;表示从的输出中选择权重最大的个专家;表示经过噪声注入的专家权重表示;表示选择专家个数;表示门控权重;表示从均值为0、方差为1的标准正态分布中抽样的噪声;表示平滑的ReLU函数,以增强模型的非线性表达能力;表示噪声权重;in, Indicates the current data entered The corresponding gating weight matrix; Indicates from Select the output with the largest weight experts; represents the expert weight representation after noise injection; Indicates the number of selected experts; represents the gating weight; represents noise sampled from a standard normal distribution with mean 0 and variance 1; Represents a smooth ReLU function to enhance the nonlinear expression ability of the model; represents the noise weight; 基于所述门控权重矩阵,对多个不同尺度的专家网络处理输出的特征进行加权求和,得到多尺度融合特征,表达式为:Based on the gated weight matrix, the features of the outputs of the expert networks at different scales are weighted and summed 
to obtain the multi-scale fusion features. , the expression is: ; 其中,是第个专家网络的输出特征;表示专家网络的总数。in, It is Output features of the expert network; Represents the total number of expert networks. 5.根据权利要求4所述的一种基于自适应多尺度特征建模的异常检测方法,其特征在于,不同尺度的专家网络定义不同大小的片段,从而提取出输入数据的不同尺度特征,在每个专家网络对其分配到的数据进行处理时,处理步骤如下:5. According to claim 4, an anomaly detection method based on adaptive multi-scale feature modeling is characterized in that expert networks of different scales define fragments of different sizes, thereby extracting different scale features of input data, and when each expert network processes the data assigned to it, the processing steps are as follows: 将分配的数据划分为多个时间片段Divide the allocated data into multiple time segments ; 通过设置不同的时间片段大小,为每个专家网络定义时间序列数据的处理分辨率,每个时间片段代表一个局部的时间窗口,利用局部窗口学习数据的时间依赖关系,则:By setting different time slice sizes , define the processing resolution of time series data for each expert network, each time segment represents a local time window, and use the local window to learn the time dependency of the data, then: ; 其中,表示第i个时间片段;表示第i个时间片段里的时间序列数据;in, represents the i-th time segment; Represents the time series data in the i-th time segment; 每个专家网络通过深度分组卷积层对各个时间片段进行局部特征提取,得到时间序列中每个时间片段内连续时间步长的短期依赖特征,表达式为:Each expert network extracts local features of each time segment through a deep grouped convolutional layer to obtain the short-term dependency features of continuous time steps in each time segment in the time series. 
, the expression is: ; 其中,表示卷积特征提取操作;in, Represents the convolution feature extraction operation; 利用多头自注意力学习不同时间片段之间的长期依赖关系,构建不同时间片段之间的相互作用,生成全局特征;在计算多头自注意力时,计算查询矩阵、键矩阵和值的矩阵变换,通过注意力权重聚合各个时间片段的特征;Using multi-head self-attention to learn different time segments long-term dependencies between them, building interactions between different time segments and generating global features ; When calculating multi-head self-attention, the query matrix, key matrix and value matrix transformation are calculated, and the features of each time segment are aggregated through the attention weights; 将每个专家网络学习到的特征拼接,结合分配的数据、短期依赖特征和跨时间片段的全局特征,得到每个专家网络输出的融合特征Concatenate the features learned by each expert network, combining the assigned data and short-term dependency features and global features across time segments , get the fusion features output by each expert network . 6.根据权利要求1所述的一种基于自适应多尺度特征建模的异常检测方法,其特征在于,所述S4具体为:6. The method for anomaly detection based on adaptive multi-scale feature modeling according to claim 1, wherein S4 is specifically: 计算所述多尺度融合特征在每个时间步长中所有节点特征之间的余弦相似度,对每个时间步长的每一对节点,计算出余弦相似度,公式为:Calculate the cosine similarity between all node features of the multi-scale fusion feature in each time step, and for each time step Each pair of nodes and , calculate the cosine similarity, the formula is: ; 其中,表示特征向量之间的余弦相似度;分别表示节点和节点的特征向量,计算它们的点积并归一化;表示L2范数;in, Represents the feature vector and The cosine similarity between ; , Respectively represent nodes and nodes , calculate their dot product and normalize; represents the L2 norm; 若两个节点之间相似度大于预设阈值,则建立边连接,得到邻接矩阵,表达式为:If the similarity between two nodes is greater than the preset threshold , then establish edge connections and obtain the adjacency matrix, the expression is: ; 其中,表示时间步长对应的邻接矩阵;in, Indicates the time step The corresponding adjacency matrix; 基于邻接矩阵,采用GraphSAGE卷积层对节点的特征进行聚合,以捕捉节点之间的空间关联特征,即空间特征,GraphSAGE卷积层的公式为:Based on the adjacency matrix, the GraphSAGE convolutional layer is used to aggregate the features of 
the nodes to capture the spatial correlation features between the nodes, namely the spatial features , the formula of the GraphSAGE convolutional layer is: ; 其中,表示节点在第层的特征表示;)表示节点的邻居节点集合;Aggregate为聚合操作;为第层的权重矩阵;为激活函数;表示节点的邻居节点的特征表示。in, Representation Node In the Feature representation of the layer; ) represents a node The neighbor node set of ; Aggregate is the aggregation operation; For the The weight matrix of the layer; is the activation function; Representation Node Neighbor nodes The feature representation of . 7.根据权利要求1所述的一种基于自适应多尺度特征建模的异常检测方法,其特征在于,所述S6具体为:7. The method for anomaly detection based on adaptive multi-scale feature modeling according to claim 1, wherein S6 is specifically: 将所述最终特征表示输入变分自编码器,经变分自编码器的全连接网络的前向传播学习,得到所述最终特征表示的潜在空间均值μ与对数方差;并使用重参数化技术从潜在空间中采样得到潜在变量,表达式为:The final feature representation is input into the variational autoencoder, and the latent space mean μ and logarithmic variance of the final feature representation are obtained through forward propagation learning of the fully connected network of the variational autoencoder. ; and use the reparameterization technique to sample the latent variables from the latent space , the expression is: ; 其中,表示服从标准正态分布中采样的噪声;表示潜在空间的标准差;in, It follows the standard normal distribution The noise sampled in represents the standard deviation of the latent space; 通过一个全连接层将潜在变量映射回输入空间,重构输入数据得到重构数据,以学习输入数据的压缩表示。Through a fully connected layer, the latent variables Map back to the input space, reconstruct the input data to get the reconstructed data , to learn a compressed representation of the input data. 8.根据权利要求7所述的一种基于自适应多尺度特征建模的异常检测方法,其特征在于,所述重构数据误差损失为:采用均方误差衡量所述重构数据与模型输入的训练数据X之间的差异,表达式:8. 
The anomaly detection method based on adaptive multi-scale feature modeling according to claim 7, wherein the reconstructed data error loss uses the mean squared error to measure the difference between the reconstructed data $\hat{X}$ and the training data $X$ input to the model:

$$\mathcal{L}_{rec} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2$$

where $\mathcal{L}_{rec}$ denotes the reconstructed data error loss, and $x_i$ and $\hat{x}_i$ are the $n$ corresponding elements of $X$ and $\hat{X}$;

the KL divergence $\mathcal{L}_{KL}$ measures the difference between the latent-space distribution and the standard normal distribution:

$$\mathcal{L}_{KL} = -\frac{1}{2}\sum_{i=1}^{N}\left(1 + \log\sigma_i^2 - \mu_i^2 - \sigma_i^2\right)$$

where $\mathcal{L}_{KL}$ denotes the KL divergence; $N$ denotes the total number of data subsets of the training data; $\sigma_i$ denotes the standard deviation of the $i$-th data subset; and $\mu_i$ denotes the latent-space mean of the $i$-th data subset;

the loss function is the weighted sum of the reconstructed data error loss and the KL divergence:

$$\mathcal{L} = \mathcal{L}_{rec} + \lambda\,\mathcal{L}_{KL}$$

where $\lambda$ denotes the weight of the KL divergence.

9. The anomaly detection method based on adaptive multi-scale feature modeling according to any one of claims 1 to 8, further comprising testing and evaluating the anomaly detection model.
When the test evaluation result meets the set requirements, the anomaly detection model passes the evaluation and is used for subsequent anomaly detection; otherwise, the model is retrained. The test evaluation results of the anomaly detection model are calculated by the following formulas:

Accuracy:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

F1 score:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP is the number of true positives, i.e. samples that are normal in reality and detected as normal; FP is the number of false positives, i.e. samples that are abnormal in reality but detected as normal; TN is the number of true negatives, i.e. samples that are abnormal in reality and detected as abnormal; and FN is the number of false negatives, i.e. samples that are normal in reality but detected as abnormal. Normal samples are thus treated as the positive class.

10.
An anomaly detection device based on adaptive multi-scale feature modeling, comprising:

a data preprocessing unit, configured to acquire historical monitoring time-series data of an industrial control system and preprocess the data to obtain training data for input to the anomaly detection model;

a component decomposition unit, configured to decompose the input training data into time components by combining the Fourier transform with sliding windows of different sizes, and to fuse the components with the training data to obtain a decomposition result;

a multi-scale fusion unit, configured to compute a gating weight matrix from the decomposition result using a gating network, and to adaptively weight and combine the outputs of a plurality of expert networks of different scales that process the training data, obtaining multi-scale fusion features;

a graph convolution unit, configured to compute the adjacency matrix of the multi-scale fusion features at each time step and to aggregate the node features according to the adjacency matrix using a graph convolution operation, capturing the spatial correlation features between nodes to obtain spatial features;

a spatiotemporal combination unit, configured to compute a weighted sum of the spatial features and the multi-scale fusion features to obtain a final feature representation combining spatiotemporal information;

a data reconstruction unit, configured to feed the final feature representation into a variational autoencoder to learn a latent space representation, and to decode and reconstruct the data to obtain reconstructed data;
a model training unit, configured to train the model using the combination of the reconstructed data error loss and the KL divergence as the loss function, obtaining a trained anomaly detection model; and

an anomaly detection unit, configured to input the preprocessed data to be detected into the anomaly detection model and obtain an anomaly detection result by comparison with a set threshold.
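The graph construction of claim 6, the reparameterized sampling and loss of claims 7 and 8, and the evaluation metrics of claim 9 can be illustrated with a minimal plain-Python sketch. It is not the patented implementation: all function names are hypothetical, and the mean aggregator with ReLU activation for GraphSAGE, the exclusion of self-loops from the adjacency matrix, the small epsilon in the cosine denominator, and the KL weight name `lam` are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical helpers illustrating claims 6-9; not the patent's own code.


def cosine(u, v):
    """Cosine similarity s_ij = (h_i . h_j) / (||h_i||_2 * ||h_j||_2) (claim 6)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards zero vectors (assumption)


def build_adjacency(nodes, tau):
    """A[i][j] = 1 if cos(h_i, h_j) > tau, else 0; self-loops excluded (assumption)."""
    n = len(nodes)
    return [[1 if i != j and cosine(nodes[i], nodes[j]) > tau else 0
             for j in range(n)] for i in range(n)]


def graphsage_layer(nodes, adj, w_self, w_neigh):
    """One GraphSAGE layer: ReLU(W_self h_v + W_neigh * mean of neighbor features).

    Splitting W into [W_self | W_neigh] is equivalent to W . CONCAT(h_v, agg).
    The mean aggregator and ReLU activation are assumptions.
    """
    dim = len(nodes[0])
    out = []
    for i, h in enumerate(nodes):
        neigh = [nodes[j] for j, a in enumerate(adj[i]) if a]
        if neigh:
            agg = [sum(v[d] for v in neigh) / len(neigh) for d in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: empty neighborhood
        z = [sum(ws[d] * h[d] for d in range(dim)) + sum(wn[d] * agg[d] for d in range(dim))
             for ws, wn in zip(w_self, w_neigh)]
        out.append([max(0.0, x) for x in z])
    return out


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with sigma = exp(0.5 * log_var), eps ~ N(0, 1) (claim 7)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def mse(x, x_hat):
    """Reconstruction loss: mean of squared element-wise errors (claim 8)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_divergence(mu, log_var):
    """-1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2), summed over latent entries."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var, lam):
    """Weighted sum L = L_rec + lam * L_KL, where lam is the KL weight."""
    return mse(x, x_hat) + lam * kl_divergence(mu, log_var)


def evaluate(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion counts (claim 9)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    adj = build_adjacency(nodes, tau=0.8)
    print(adj)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(graphsage_layer(nodes, adj, identity, identity))
    print(evaluate(tp=40, fp=10, tn=40, fn=10))
```

Following the definitions in claim 9, a normal sample counts as a positive, so `tp` is the number of normal samples detected as normal. A trained model would produce `mu` and `log_var` with fully connected layers; the sketch only shows the arithmetic applied to those outputs.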
CN202510412968.3A 2025-04-03 2025-04-03 Abnormality detection method and device based on self-adaptive multi-scale feature modeling Active CN119939476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510412968.3A CN119939476B (en) 2025-04-03 2025-04-03 Abnormality detection method and device based on self-adaptive multi-scale feature modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510412968.3A CN119939476B (en) 2025-04-03 2025-04-03 Abnormality detection method and device based on self-adaptive multi-scale feature modeling

Publications (2)

Publication Number Publication Date
CN119939476A 2025-05-06
CN119939476B 2025-06-06

Family

ID=95541885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510412968.3A Active CN119939476B (en) 2025-04-03 2025-04-03 Abnormality detection method and device based on self-adaptive multi-scale feature modeling

Country Status (1)

Country Link
CN (1) CN119939476B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120449062A (en) * 2025-07-10 2025-08-08 China University of Mining and Technology (中国矿业大学) Time sequence anomaly detection method based on incremental learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370610A1 (en) * 2018-05-29 2019-12-05 Microsoft Technology Licensing, Llc Data anomaly detection
CN115879505A (en) * 2022-11-15 2023-03-31 Harbin University of Science and Technology (哈尔滨理工大学) An Adaptive Correlation-Aware Unsupervised Deep Learning Anomaly Detection Method
CN117272196A (en) * 2023-08-23 2023-12-22 Zhejiang University of Technology (浙江工业大学) Industrial time sequence data anomaly detection method based on time-space diagram attention network
CN118535981A (en) * 2024-05-17 2024-08-23 Guilin University of Electronic Technology (桂林电子科技大学) Multi-sensor time sequence data anomaly detection method based on time sequence converter
CN118779804A (en) * 2024-07-12 2024-10-15 Southwest Jiaotong University (西南交通大学) A time series data anomaly detection method based on joint graph learning and dual attention mechanism
CN119520165A (en) * 2025-01-16 2025-02-25 Ocean University of China (中国海洋大学) Unsupervised anomaly detection method and system for industrial control networks based on spatiotemporal codec

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONG, XIYAO: "Online Multivariate Time Series Anomaly Detection Method Based on Contrastive Learning", ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XIII, ICIC 2024, 9 October 2024 (2024-10-09) *
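YANG, CHENLONG (杨晨龙): "Multivariate Time Series Data Anomaly Detection Based on GAT-AGRU" (基于GAT-AGRU的多元时序数据异常检测), 理论与算法 (Theory and Algorithms), vol. 47, no. 17, 3 December 2024 (2024-12-03) *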

Also Published As

Publication number Publication date
CN119939476B (en) 2025-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant