Fault early warning method of rotating machinery based on Bayesian LSTM model
Technical Field
The invention belongs to the field of fault early warning of large-scale rotating machinery equipment, and relates to a fault early warning method of rotating machinery based on a Bayesian LSTM model.
Background
In order to improve the data quality, reduce the influence of bad data on the analysis result and facilitate the analysis and mining of the data, data preprocessing is necessary, namely identification and integrated transformation are carried out on the bad data, and the accuracy of subsequent modeling prediction is directly related. Lee et al recently highlighted the key role of Signal Processing techniques in the diagnosis and prognosis of rotating machines (Lee, Jay; Wu, Fangji; ZHao, Wenyu; Ghaffari, Mass; Liao, Linxia; Siegel, David. Prognostics and health management design for road map Systems-Reviews, methods and applications, Mechanical Systems and Signal Processing,2014,42(1-2): 314-334 doi:10.1016/j. ymssp.2013.06.004.). Wavelet transformation is a common method and has some applications in monitoring data processing of nuclear power plants. Such as Upadhyaya et al, enhance sensor measurements by filtering the low and high frequency components of the nuclear power plant sensor using wavelet transforms. An efficient method for data pre-processing is provided by using Wavelet transformation for signal conditioning with minimal distortion of signal bandwidth (Upadhyaya, Belle R.; Mehta, Chaitanya; Bayram, Duygu. integration of Time Series Modeling and Wavelet Transform for Monitoring Nuclear Plant Sensors. IEEE Transactions on Nuclear Science,2014,61(5):2628-2635.doi: 10.1109/TNS.2014.2341035). Whereas wavelet transform can only transform the low-pass filtered result, discrete wavelet packet transform is a more elaborate signal analysis method, while it can decompose the high frequency part.
To simplify the study of the problem, reduce the number of variables, fewer variables are used to represent most of the information of the problem. PCA is used as a linear dimensionality reduction method and is widely applied to the fields of pattern recognition, feature extraction and the like. A Fault Detection and Diagnosis (FDD) framework is constructed for a Nuclear Power Plant (NPP) pressurized water reactor, PCA is adopted to remove information of fault sensors, fuzzy theory is combined with data fusion, and data of a plurality of sensors are fused onto one node (Guohua Wu, Jeijua Tong, Liguo Zhuang, Yunfei Zhuao, Zhuhing Duan. framework for fault diagnosis with multi-source sensor nodes in Nuclear power plant based on Bayesian network of Nuclear energy.2018,122:297-308.https:// doi. org/10.1016/j. anucene.2018.08.08.). PPCA is a derivative of conventional PCA, and overcomes the limitation that conventional PCA simply discards other non-principal components. In PPCA, this discarded information is used as gaussian noise estimation, which can maximally preserve the useful information in the original signal.
The application of deep learning in the field of fault diagnosis and health management of mechanical equipment is gradually increased recently, wherein the effect of applying a Recurrent Neural Network (RNN) to sequence-type data is obvious. Long-and-short memory networks (LSTM) are a variant of RNN that effectively alleviate the gradient explosion and gradient disappearance problems that occur with RNN processing longer time series data. Therefore, when long-time sequence data is processed, the LSTM can effectively solve the long-term dependence problem of effective information in the long-time sequence, is more sensitive to deep features of long-term historical data, is easier to capture key information hidden in the long-term historical data, is suitable for being applied to big data processing, and improves prediction accuracy. For example, Yang et al proposed a novel electromagnetic bearing fault detection and isolation method, which used an advanced Long Short Term Memory (LSTM) neural network to build a sensor data model, and could effectively process time series data (Yang, lacing; Guo, Yingqing; Zhao, Wanli. Long short-term memory network based fault detection and isolation for electric-mechanical actuators. NEUROMPUTING.2019, 360:85-96.DOI:10.1016/j. neucom.2019.06.029).
For the judgment of the accuracy of the neural network model, the mean square error MSE and the goodness of fit R are usually relied on2Two indexes are provided. The mean square error MSE can indicate the deviation degree of a predicted result from an actual result, but the sample set dimension difference causes the prediction result not to be readable, the decision coefficient R2 represents the superiority degree of the model compared with the direct averaging, and the result is between 0 and 1, the closer to 1, the better the model effect is. Research shows that the Bayesian method for quantitative model verification has larger exploration space, so that the Bayesian hypothesis testing method is introduced into the inventionThe method can quantify the reliability of the model and also consider the prior information of the training set. The calculation of the expression of the Bayes factor was studied as in Jiang et al, facilitating the overall reliability assessment of model validation (Jiang X M, Mahadevan S. Bayesian structural evaluation modeling for the systematic uncertain qualification. International Journal for the Numerical Methods in Engineering,2009,80(6): 717) 737.).
The invention provides a fault early warning method based on deep learning in the process of researching fault early warning in large-scale rotating machinery, which processes, analyzes and trains a real-time monitoring data model, establishes an LSTM neural network prediction model considering data uncertainty and a quantitative reliability inspection method according to a Bayesian lambda value.
Disclosure of Invention
Aiming at the defects of the prior art, the invention ingeniously combines advanced signal processing, mode recognition, intelligent algorithm and a probability decision method, provides a rotating machinery fault early warning method based on a Bayesian LSTM model, can realize that multiple parameter data share one algorithm frame, improves the correlation degree of data utilization and improves the efficiency; the fault information can be judged and analyzed more accurately. The method comprises the steps of firstly utilizing discrete wavelet packet threshold denoising and probability principal component analysis to process data; thirdly, performing phase space reconstruction on the data set by utilizing a C-C algorithm to obtain a reconstruction data set according to the embedding dimension and the optimal time delay; then, taking one part of the data set as a training set to train the LSTM model, taking the other part of the data set as a verification set to verify the reliability of the model, and firstly providing a Bayesian model reliability test method; and finally, a method for judging whether to give out an early warning or not is provided on the basis of the prior information of the historical data set, so that the model can accurately give out an alarm in the fault creep stage of the unit or when the monitoring system has a problem.
The invention is realized by at least one of the following technical solutions.
A fault early warning method of a rotating machine based on a Bayesian LSTM model comprises the following steps:
s1, reading n groups of time sequencesProcessing the signal data into a p-dimensional matrix Xp×nConveniently processing signals to obtain noise-containing abnormal-free data, decomposing the signals by a discrete wavelet packet threshold denoising method, filtering wavelet coefficients and reconstructing to effectively remove noise;
s2, for p-dimensional sample data, each dimension may not independently represent equipment status, and each dimension may be correlated with each other, in order to reduce the dimension and improve efficiency, Probability Principal Component Analysis (PPCA) is performed on the sample data, and the sample data is reduced to q-dimension Xq×n;
S3, performing phase space reconstruction on the dimensionality reduction signal data by using a C-C algorithm;
s4, dividing the sample data into training, verifying and testing data according to proportion, and facilitating the training, verifying and testing of the prediction model; constructing an LSTM (least squares metric) cyclic neural network, taking a sigmoid function as an activation function, and taking a back propagation algorithm as a training algorithm of the activation function to obtain optimal parameters of a model; verifying the model precision by using the verification sample, judging whether the model precision meets the application condition or not, and continuing training if the model precision does not meet the precision requirement; if the model prediction is correct, the model is applied by using a test sample, the confidence coefficient of the data predicted by the model is solved by adopting a Bayesian hypothesis reliability test method, and finally, a judgment method for whether to give out an early warning is given on the basis of the prior information of a historical data set, so that the model can accurately give out an alarm in a unit fault creep stage or when a monitoring system has a problem.
Further, in step S1, the read signal data is an original signal, that is, a signal collected by the sensor without any processing is usually collected from the sensor, the data recorded by the real-time monitoring system often contains noise, and if the data does not undergo denoising processing, the data affects the subsequent signal analysis, so that the analysis has a deviation, and the result has no meaning; therefore, a discrete wavelet packet threshold denoising method is adopted to denoise an original signal, and the method specifically comprises the following steps:
after a noisy signal is subjected to multi-layer decomposition, the energy of the noisy signal is mainly concentrated in partial wavelet packet decomposition coefficients, the noise energy is distributed in the coefficients of the whole wavelet domain, and the amplitude value of the wavelet packet transformation coefficient of the signal is larger than that of the wavelet packet transformation coefficient of the noise; setting a proper threshold value, screening out wavelet packet transformation coefficients caused by signals, filtering out coefficients caused by noise, and performing wavelet reconstruction to obtain signals without noise, so that the selection of the threshold value has a decisive influence on the denoising effect of the wavelet packet threshold value;
the threshold is determined by a Bayesian estimation method, the advantage of prior information is considered, the threshold is established in the face of a risk function, and the denoising effect is effectively enhanced;
in discrete wavelet packet transform, the basic wavelet function is typically:
wherein t represents a continuous time variable; pi is a time index; the syndrome of an attack is the frequency index; z is the set of all integers; l is2(R) represents hilbert space; ψ is a basic wavelet function;
in practical applications of the discrete wavelet packet technique, a signal f (t) with n discrete data points is wavelet decomposed:
fj(t) the wavelet series is further decomposed into wavelet packet components:
w
nwavelet function cluster representing the nth discrete data point,
n 0,1,2, …, where k represents the number of layers to continue decomposition, 0 ≦ k ≦ j, coefficient
Referred to as f (t) at a resolution of j-kOrthogonal wavelet packet decomposition coefficients; w is a
n(t) satisfies the two-scale equation:
wherein, { h }
n}
n∈ZIs a conjugate quadrature mirror filter satisfying
g
n=(-1)
kh
1-k;
k,lIs a Kronecker function, and satisfies
k,l∈Z;
When n is 0, w0(t) is a scale function, and when n is 1, w is1(t) is the wavelet function ψ (t), and in this case, equation (1) can be expressed as:
wavelet packet decomposition coefficient
Next layer decomposition coefficient
The following is obtained by a recurrence formula:
if the initial value of the wavelet packet decomposition coefficient is the discrete data, the reconstruction algorithm is as follows:
and (4) carrying out noise-containing abnormal signal decomposition by using the formula (6), obtaining the wavelet packet decomposition coefficient of each node of the binary tree in the 3-level wavelet packet decomposition, filtering out the wavelet packet decomposition coefficient according to a threshold value, and then carrying out signal reconstruction according to the formula (7).
Further, the threshold is determined by using a bayesian estimation method, which includes:
the threshold is determined by adopting a Bayes shock threshold estimation method with self-adaptability, and the calculation formula is as follows:
T=σ2/σx; (8)
wherein σ2Is the variance of the noise, σxIs the original signal variance;
the noisy signal is:
f(t)=g(t)+(t); (9)
wherein g (t) is the original signal and (t) is the noise signal;
writing a wavelet packet decomposition coefficient d obtained by performing discrete wavelet packet transformation on the signal into:
d=dx+d; (10)
wherein d isxIs the noise wavelet coefficient, d is the original signal wavelet coefficient;
the noise standard deviation was estimated as:
wherein,
that is, the wavelet coefficients of each node in the binary tree are averaged by sigma
yRepresenting the noisy signal variance:
the raw signal variance is estimated as:
thus obtaining Bayes threshold value capable of being adaptively adjusted according to scale
The wavelet coefficients can be filtered by selecting a soft threshold function:
further, in step S2, performing Probabilistic Principal Component Analysis (PPCA) on the noise-reduced signal to reduce the dimension of the multidimensional data and to process the uncertainty of the data; probability Principal Component Analysis (PPCA) defines a proper probability model for Principal Component Analysis (PCA), thereby overcoming the limitation that the traditional principal component analysis simply discards other non-principal components; in the probability principal component analysis, the discarded information is used as Gaussian noise estimation, and the useful information in the original signal can be retained to the maximum extent;
with sample X of dimension pp×nIn the probabilistic principal component analysis, a sample X is assumedp×nExpressed as X ═ Wz + μ +, where W is the weight vector, dimension p × q, q ≦ p, and z is the q × N dimension obeying z to N (0, I)q) A random gaussian vector, also known as the result of X dimensionality reduction, μ is the sample mean, which is noise, assuming that the noise follows a gaussian distribution with variance σ:
substituting into Bayes formula to obtain z posterior probability P (z | X) -N (M)-1WT(X-μ),σ2M-1) Wherein M ═ σ2Iq+WTW)-1In Bayesian probabilistic principal component analysis, the expected M of the z posterior probability is considered-1WT(X- μ) is the result of dimension reduction of X, and then only W and σ in the formula are unknown numbers, and W and σ are estimated by the maximum likelihood function, and the estimation result is:
wherein λjIs obtained by decomposing the covariance matrix of the sample X according to the eigenvalues, i.e. Cvj=λjvj,vjIs a feature vector, Uq=(v1,v2,…,vq),Δq=diag(λ1,…,λq);
When q is equal to p, there is the equation W-1X=z+W-1μ+W-1Using W-1The method can be used for carrying out primary fault diagnosis by searching reversely with the main component; w-1Expressed as:
wherein, w12Represents the data sample Xp×nThe contribution rate of the second-dimensional signal of (a) to the first principal component; by utilizing the principle, the influence degree of all signals on the first main component can be obtained, when the main component is abnormal, the probability of the signal with high contribution rate to have fault is the maximum, the signal is diagnosed preferentially, and the problem can be found quickly and effectively.
Further, in step S3, since the artificial neural network has a strong nonlinear mapping capability, the artificial neural network is often combined with the C-C algorithm, and in the reconstructed phase space, the artificial neural network is used to approximate the mapping relationship between the current state and the future state to predict the chaotic time series, and meanwhile, the input and output of the neural network are determined by the reconstructed m of the embedded dimension and the delay time τ; for the determination of the delay time τ and the embedding dimension m, the delay time and the embedding dimension can be estimated simultaneously using correlation integration, as follows:
first construct disjoint time series matrices:
assuming that the maximum value of the delay time τ is T, where T is a natural number in [1, T ], k is INT (N/T), N is the length of the time series data set, and each row of the time series matrix is called a sub-time series vector; defining the correlation product of each subsequence as:
where r > 0, M is the number of phase space points M ═ N- (M-1) t, where the embedding dimension M may then be 2,3,4,5, d
bc=||Y
B-Y
C||
∞,
The correlation integral is a cumulative distribution function and represents the probability that the distance between any two points in the phase space is less than the radius r, and the distance between the points is represented by the infinite norm of the difference of vectors; define the detection statistic as:
S(m,N,r,t)=C(m,N,r,t)-Cm(1,N,r,t); (21)
the delay time τ may be obtained by the first local minimum of the following equation:
where Δ S (m, t, N) is the maximum minimum delta of the statistic S (r) corresponding to the radius r:
ΔS(m,t,N)=max{S(m,N,rj,t)}-min{S(m,N,rj,t)}; (23)
while the delay window τw(m-1) τ can be obtained by the minimum value of the following formula:
embedding dimension m, and finally reconstructing a time sequence to obtain a reconstruction matrix F in a formula (19); and taking the element of each row in the matrix as an input node of the neural network, and taking the next tau time of the row corresponding to the last element as an output node, constructing an input and output layer, completing phase space reconstruction, and obtaining reconstructed data, namely sample data.
Further, step S4 specifically includes the following steps:
s4.1, constructing an LSTM prediction model;
s4.2, training an LSTM prediction model;
s4.3, carrying out reliability evaluation on the LSTM model;
and S4.4, early warning is carried out by setting a threshold value on the basis of the prior information of the historical data set.
Further, in step S4.1, a data set obtained after reconstructing the phase space is divided into a training set, a verification set and a test set, the training set is used for building a recurrent neural network prediction model, and the recurrent neural network prediction model is built by using a recurrent neural network method (LSTM); embedding dimension m obtained by the C-C algorithm is the number of input units, and delay time tau is the distance between input points;
each LSTM unit consists of 3 gates, which are a forgetting gate (Forget gate), an Input gate (Input gate), and an Output gate (Output gate);
in the used LSTM model, the forgetting gate has the function of discarding some useless information; vector h for output of last time instantt-1And x of the vector input at the current timetInputting together, and discarding irrelevant information in the input; forgetting the door determines the unit state c of the last momentt-1How much to keep current time ct(ii) a The formula for a forget gate is as follows:
ft=σ(Wxf·[ht-1,xt]+bf); (25)
wherein, WxfAnd bfRespectively representing a weight matrix and an offset term of the forgetting gate, sigma being a sigmoid function, [ h ]t-1,xt]Means to concatenate two vectors into one longer vector;
the input gate controls the input information to the network, and the updating of the state of the input gate requires two parts to work together: firstly determining which information should be updated, secondly generating a new vector by the tanh layer, the input gate determining the input x of the network at the current momenttHow many cells to save to cell state ct;itIndicating an input gate,/tIndicating the state of the currently input cell, ctCell state representing the current time:
it=σ(Wxi·[ht-1,xt]+bi); (26)
It=tanh(Wxl·[ht-1,xt]+bl); (27)
wherein, WxiAnd biWeight matrix and offset term, W, representing input gates, respectivelyxlAnd blWeight matrix and bias term respectively representing cell state;
the output gate controls the output information of the LSTM prediction model; it controls the cell state ctHow much current final output value h is output to LSTMt,otValue representing output gate output:
ot=σ(Wxo·[ht-1,xt]+bo); (29)
wherein, WxoAnd boWeight matrix and bias term respectively representing output gate; the final output values for each set of training are as follows:
yt=Why·ht; (31)
Whya weight matrix representing the final LSTM model output.
Further, in step S4.2, a back propagation algorithm is adopted as the LSTM training algorithm; there are 8 sets of parameters to be learned by LSTM, which are: weight matrix W of forgetting gatexfAnd bias term bfWeight matrix W of input gatesxiAnd bias term biWeight matrix W of output gatesoAnd bias term boAnd calculating a weight matrix W of cell statesxlAnd bias term bl(ii) a Since the two parts of the weight matrix use different formulas in the back propagation, in the subsequent derivation, the weight matrix Wxf、Wxi、Wxo、WxlWill be written as two separate matrices: wfh、Wfx、Wih、Wix、Woh、Wox、Wlh、Wlx;
The following formula, if not specified, E represents the loss function; and LSTM has four weighted inputs, respectively ft、it、ct、otFor passing an error term up; the specific steps of step S4.2 are as follows:
s4.2.1, the error term is passed backwards in time, and the error term is passed forward to the equation at any time k:
s4.2.2, pass the error term to the previous layer:
s4.2.3, calculating weight gradient and bias term gradient:
in the above, the error term has been foundo,t、f,t、i,t、l,tThen, the W at time t is determinedoh、Wih、WfhAnd WlhThe final gradient is obtained by adding the gradients at the various times together:
the offset term b followsf、bi、bl、boLike the weight gradient, the gradients at the respective times are added together; the final bias term gradient, i.e., the bias term gradients at each time instant are added together, is as follows:
for Wfx、Wix、Wlx、WoxThe weight gradient of (2) is only required to be based onThe corresponding error term can be directly calculated:
further, in step S4.3, the verification set obtained in step S4.1 is used to verify whether the result of the prediction of the recurrent neural network prediction model is accurate; at present, the judgment of the precision of a prediction model of a recurrent neural network depends on Mean Square Error (MSE) and a decision coefficient R2Two indexes; the mean square error MSE can indicate the deviation degree of the predicted result from the actual result, but the difference of the sample set dimension causes the predicted result not to have readability, and the coefficient R is determined2The superiority of the model compared with the direct averaging is shown, the result is between 0 and 1, and the closer to 1, the better the model effect is;
by adopting a Bayesian hypothesis testing method, the reliability of the model is quantified, and the prior information of the training set can be considered at the same time, specifically as follows:
assume that the data predicted by the LSTM prediction model is
Corresponding to the actual data being y
exp={y
1,y
2,…,y
nN is the number of data; let
Residual representing ith vibration signal and ith prediction signalIf there are n residuals, { e1, e2, …, en }, the residual e is
iObeying a normal distribution N (μ, σ 1)
2) Then mean of residual errors
Obeying a normal distribution N (mu, sigma)
1 2N); establishing the original hypothesis H
0And alternative hypothesis H
1:
H0:μ=0,H1:μ≠0;
Assume a prior probability density of N (0, σ) for μ0 2),σ0 2The verification set is selected as prior information, namely, the mean value of the prediction error of the verification set is calculated in a segmentation mode, and the variance of the mean value is used as sigma0 2(ii) a The probability density function for μ is found to be:
in combination with the regularity of the normal distribution,
the edge density function for g (μ) is:
the bayesian factor is then expressed as:
taking logarithm on two sides of formula (48):
wherein lambda is the confidence of the reliability of the recurrent neural network prediction model,
time lambda → 0, which means that the confidence of the support model is 0%, the recurrent neural network prediction model is unreliable; while
Time λ → ∞ indicates that the confidence of supporting the model is 100%, and the recurrent neural network prediction model is very reliable.
Further, in step S4.4, if the prediction result of the recurrent neural network prediction model is accurate, the residual between the actual value and the model prediction value, which is the value of the real-time monitoring value after denoising, dimensionality reduction and phase space reconstruction, can be used for judgment;
when a unit or a monitoring system has a fault at a certain moment, the fault can be reflected on a signal detected by the monitoring system, and a larger error can be generated between the fault and the signal in a rational health state; the cyclic neural network prediction model gives predictions between the fault moments, and the predicted values given according to the previous health data are regarded as the rational health data, so that a larger difference exists between the predicted values and the monitoring values of the monitoring system, and whether to give out early warning is judged by identifying the large error;
whether an alarm is given is determined by setting a threshold value, wherein the threshold value is the maximum absolute value e of the error between the predicted value and the actual value of the predictive model of the verification centralized circulation neural networkmax(ii) a The training set and the verification set are historical data sampled under the health state of the unit, but the training set is used for building a model, so that the training set naturally has smaller errors on the whole and has no reference; the verification set is an application of the model, if a time threshold is set artificially in the process of monitoring the unit by using the recurrent neural network prediction model, and if the duration time greater than zeta or the total time exceeds the time threshold, the unit is judged to have a potential fault and needs to be repaired.
Compared with the prior art, the invention has the following beneficial effects:
(1) the method for denoising noisy data sets by adopting discrete wavelet packet transformation overcomes the limitation that the conventional wavelet transformation can only transform a low-pass filtering result, the discrete wavelet packet transformation is a more refined signal analysis method, can decompose low-frequency and high-frequency parts at the same time, and adopts a Bayesian estimation method to determine a threshold value, so that the destuffing effect is enhanced, and the prediction precision of an LSTM model can be effectively improved;
(2) the PPCA is adopted to reduce the dimension of the data set, and the limitation that the traditional PCA simply discards other non-principal components is overcome. In PPCA, this discarded information is used as gaussian noise estimation, which can maximally preserve the useful information in the original signal.
(3) By adopting a Bayesian hypothesis testing method, the reliability of the model is quantified, the prior information of a training set can be considered, the influence of uncertain factors on the accuracy of the LSTM prediction model is quantitatively analyzed, quantitative evaluation and error correction are carried out by using the method, the prediction accuracy is improved, and seamless integration of Bayesian inference and the LSTM prediction model is realized.
Drawings
FIG. 1 is a schematic diagram of a three-level wavelet decomposition according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an LSTM unit in an embodiment of the present invention;
FIG. 3 is a schematic diagram of error between a fault signal and a health signal and a predicted signal according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a noise reduction effect of a vibration signal according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the phase space reconstruction of the C-C algorithm of the principal component of the vibration signal according to the embodiment of the present invention;
FIG. 6 is a schematic diagram of a construction process of a prediction model according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating the prediction of the vibration principal component signal according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a reliability test λ value of the prediction model in an embodiment of the present invention;
FIG. 9 is a schematic diagram of the application of a Bayesian LSTM prediction model in monitoring crack failure in an embodiment of the present invention;
fig. 10 is a flowchart of a fault warning method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention is provided by way of a specific example with reference to the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Example 1:
in the embodiment, a fault early warning method for a rotating machine based on a Bayesian LSTM model is characterized in that a high-medium-pressure (HIP) cylinder closing monitoring and control system of a steam turbine is provided, when an abnormal condition occurs in the system, the cylinder closing operation can be stopped, and steam fed into a cylinder body expands to push a rotor to rotate to apply work; steam turbines are large and precise high-speed rotating equipment, and the gap between a rotor and a cylinder body is very narrow, so that a matched monitoring system is required to monitor the operation. When an existing TSI gives an alarm, the unit is usually shut down; in the embodiment, data of 1 month of operation of a high-medium-pressure (HIP) combined cylinder of a steam turbine in a steam turbine instrument monitoring system (TSI) of a certain nuclear power plant in China is taken to explain the algorithm flow of the whole early warning model, and whether the model is effective or not is proved by judging whether a signal can be correctly predicted or not; as shown in fig. 10, the method comprises the following steps:
s1, reading n groups of time-series signal data, and processing the signal data into a p-dimensional matrix Xp×nConveniently processing signals to obtain noise-containing abnormal-free data, decomposing the signals by a discrete wavelet packet threshold denoising method, filtering wavelet coefficients and reconstructing to effectively remove noise;
the read signal data are original signals, namely signals collected by the sensor and not subjected to any processing are usually collected from the sensor, the data recorded by the real-time monitoring system often contain noise, if the data do not undergo denoising processing, the data can influence the next signal analysis, so that the analysis has deviation, and the result has no meaning; therefore, a discrete wavelet packet threshold denoising method is adopted to denoise an original signal, and the method specifically comprises the following steps:
as shown in fig. 1, after a noisy signal is decomposed in multiple layers, its energy is mainly concentrated in partial wavelet packet decomposition coefficients, and the noise energy is distributed in the coefficients of the whole wavelet domain, the amplitude of the wavelet packet transform coefficient of the signal itself is greater than that of the wavelet packet transform coefficient of the noise; setting a proper threshold value, screening out wavelet packet transformation coefficients caused by signals, filtering out coefficients caused by noise, and performing wavelet reconstruction to obtain signals without noise, so that the selection of the threshold value has a decisive influence on the denoising effect of the wavelet packet threshold value;
the threshold is determined by a Bayesian estimation method, the advantage of prior information is considered, the threshold is established in the face of a risk function, and the denoising effect is effectively enhanced;
in discrete wavelet packet transform, the basic wavelet function is typically:
wherein t represents a continuous time variable; pi is a time index; the syndrome of an attack is the frequency index; z is the set of all integers; l is2(R) represents hilbert space; ψ is a basic wavelet function;
in practical applications of the discrete wavelet packet technique, a signal f (t) with n discrete data points is wavelet decomposed:
fj(t) the wavelet series is further decomposed into wavelet packet components:
w
nwavelet function cluster representing the nth discrete data point,
n 0,1,2, …, where k represents the number of layers to continue decomposition, 0 ≦ k ≦ j, coefficient
Referred to as f (t) at resolution j-a coefficient of orthogonal wavelet packet decomposition at k; w is a
n(t) satisfies the two-scale equation:
wherein, { h }
n}
n∈ZIs a conjugate quadrature mirror filter satisfying
g
n=(-1)
kh
1-k;
k,lIs a Kronecker function, and satisfies
k,l∈Z;
When n is 0, w0(t) is a scale function, and when n is 1, w is1(t) is the wavelet function ψ (t), and in this case, equation (1) can be expressed as:
wavelet packet decomposition coefficient
Next layer decomposition coefficient
The following is obtained by a recurrence formula:
if the initial value of the wavelet packet decomposition coefficient is the discrete data, the reconstruction algorithm is as follows:
and (4) carrying out noise-containing abnormal signal decomposition by using the formula (6), obtaining the wavelet packet decomposition coefficient of each node of the binary tree in the 3-level wavelet packet decomposition, filtering out the wavelet packet decomposition coefficient according to a threshold value, and then carrying out signal reconstruction according to the formula (7).
The threshold is determined by using a bayesian estimation method, which specifically includes:
the threshold is determined by adopting a Bayes shock threshold estimation method with self-adaptability, and the calculation formula is as follows:
T=σ2/σx; (8)
wherein σ2Is the variance of the noise, σxIs the original signal variance;
the noisy signal is:
f(t)=g(t)+(t); (9)
wherein g (t) is the original signal and (t) is the noise signal;
writing a wavelet packet decomposition coefficient d obtained by performing discrete wavelet packet transformation on the signal into:
d=dx+d; (10)
wherein d isxIs the noise wavelet coefficient, d is the original signal wavelet coefficient;
the noise standard deviation was estimated as:
wherein,
that is, the wavelet coefficients of each node in the binary tree are averaged by sigma
yRepresenting the noisy signal variance:
the raw signal variance is estimated as:
thus obtaining Bayes threshold value capable of being adaptively adjusted according to scale
The wavelet coefficients can be filtered by selecting a soft threshold function:
in this embodiment, a steam turbine signal of one month is sampled, which includes 29 signals of six types including vibration, expansion difference (cylinder expansion amount-rotor expansion amount), rotor shaft displacement, temperature and rotation speed, that is, 29 sets of time-series signal data, and the signal data is processed into a 720-dimensional matrix X720×29(ii) a Table 1 is a sample signal description:
TABLE 1 sample Signal description
In order to realize the prediction of the HIP cylinder combination monitoring signal, in this embodiment, data before 4 months and 18 days is used as a training set to build a model, a weight is trained, data from 4 months and 19 days to 4 months and 24 days is used as a verification set to verify, and data after 24 days is used as a test set to test the effect of the actual use of the model.
After a noisy abnormal-free data set is obtained, DWPT denoising is carried out to decompose signals, FIG. 4 shows the denoising effect of the vibration signals, and it can be known from the graph that the change trends of the fold lines before and after denoising are basically consistent, so that the algorithm retains useful information while filtering noise, has higher similarity with the original signals, and is more beneficial to predicting model training and finding weight information.
S2, for sample data of p dimension, each dimensionThe device conditions may not be represented independently, and each dimension may be correlated with each other, in order to reduce the dimensions and improve the efficiency, Probability Principal Component Analysis (PPCA) is performed on sample data, and the sample data is reduced to q dimension Xq×n;
Adopting Probability Principal Component Analysis (PPCA) to realize multidimensional data dimension reduction and data processing uncertainty aiming at the denoised signals; probability Principal Component Analysis (PPCA) defines a proper probability model for Principal Component Analysis (PCA), thereby overcoming the limitation that the traditional principal component analysis simply discards other non-principal components; in the probability principal component analysis, the discarded information is used as Gaussian noise estimation, and the useful information in the original signal can be retained to the maximum extent;
with sample X of dimension pp×nIn the probabilistic principal component analysis, a sample X is assumedp×nExpressed as X ═ Wz + μ +, where W is the weight vector, dimension p × q, q ≦ p, and z is the q × N dimension obeying z to N (0, I)q) A random gaussian vector, also known as the result of X dimensionality reduction, μ is the sample mean, which is noise, assuming that the noise follows a gaussian distribution with variance σ:
substituting into Bayes formula to obtain z posterior probability P (z | X) -N (M)-1WT(X-μ),σ2M-1) Wherein M ═ σ2Iq+WTW)-1In Bayesian probabilistic principal component analysis, the expected M of the z posterior probability is considered-1WT(X- μ) is the result of dimension reduction of X, and then only W and σ in the formula are unknown numbers, and W and σ are estimated by the maximum likelihood function, and the estimation result is:
wherein λjIs obtained by decomposing the covariance matrix of the sample X according to the eigenvalues, i.e. Cvj=λjvj,vjIs a feature vector, Uq=(v1,v2,…,vq),Δq=diag(λ1,…,λq);
When q is equal to p, there is the equation W-1X=z+W-1μ+W-1Using W-1The method can be used for carrying out primary fault diagnosis by searching reversely with the main component; w-1Expressed as:
wherein, w12Represents the data sample Xp×nThe contribution rate of the second-dimensional signal of (a) to the first principal component; by utilizing the principle, the influence degree of all signals on the first main component can be obtained, when the main component is abnormal, the probability of the signal with high contribution rate to have fault is the maximum, the signal is diagnosed preferentially, and the problem can be found quickly and effectively.
In this embodiment, in order to reduce data dimensions, sample information is represented by a smaller variable, so that a prediction result is representative, and PPCA analysis is performed on data at present, specifically: only one-dimensional signal types, namely expansion difference and shaft displacement, are reserved, PPCA dimension reduction is only carried out on vibration, temperature and rotating speed, and the dimension reduction effect is shown in table 2. In table wiWhich represents the rate of contribution of each component signal to the first principal component, for three signals subjected to PPCA analysis. The bold part represents a component having a relatively high influence on the principal component, which makes sense in performing fault diagnosis after fault warning, and the health condition of these components can be preferentially judged.
TABLE 2 vibration, temperature, rotational speed PCA weight, contribution ratio
S3, performing phase space reconstruction on the dimensionality reduction signal data by using a C-C algorithm;
as shown in fig. 5, because the artificial neural network has a strong nonlinear mapping capability, the artificial neural network is often combined with a C-C algorithm, and in a reconstructed phase space, the artificial neural network is used to approximate the mapping relationship between the current state and the future state to predict the chaotic time sequence, and meanwhile, the input and output of the neural network are determined by the reconstructed m of the embedded dimension and the delay time τ; for the determination of the delay time τ and the embedding dimension m, the delay time and the embedding dimension can be estimated simultaneously using correlation integration, as follows:
first construct disjoint time series matrices:
assuming that the maximum value of the delay time τ is T, where T is a natural number in [1, T ], k is INT (N/T), N is the length of the time series data set, and each row of the time series matrix is called a sub-time series vector; defining the correlation product of each subsequence as:
where r > 0, M is the number of phase space points M ═ N- (M-1) t, where the embedding dimension M may then be 2,3,4,5, d
bc=||Y
B-Y
C||
∞,
The correlation integral is a cumulative distribution function and represents the probability that the distance between any two points in the phase space is less than the radius r, and the distance between the points is represented by the infinite norm of the difference of vectors; define the detection statistic as:
S(m,N,r,t)=C(m,N,r,t)-Cm(1,N,r,t); (21)
the delay time τ may be obtained by the first local minimum of the following equation:
where Δ S (m, t, N) is the maximum minimum delta of the statistic S (r) corresponding to the radius r:
ΔS(m,t,N)=max{S(m,N,rj,t)}-min{S(m,N,rj,t)}; (23)
while the delay window τw(m-1) τ can be obtained by the minimum value of the following formula:
embedding dimension m, and finally reconstructing a time sequence to obtain a reconstruction matrix F in a formula (19); and taking the element of each row in the matrix as an input node of the neural network, and taking the next tau time of the row corresponding to the last element as an output node, constructing an input and output layer, completing phase space reconstruction, and obtaining reconstructed data, namely sample data.
In this embodiment, the principal component obtained by PPCA analysis already disturbs the original signal component relationship, and is likely to become a chaotic time series. In this case, the principal component needs to be reconstructed in phase space, and the input layer of the prediction model is determined. Fig. 2 shows the response values of the discriminant formula of the principal component of the vibration signal in the C-C algorithm at different delay times, where the delay time τ is 4 or τ is 3, and τ is shown in the figurew7 according to the formula τwIf the dimension m is 4 or m is 4, the number of input units for which the prediction model is to be determined is 4.
S4, dividing the sample data into training, verifying and testing data according to proportion, and facilitating the training, verifying and testing of the prediction model; constructing an LSTM (least squares metric) cyclic neural network, taking a sigmoid function as an activation function, and taking a back propagation algorithm as a training algorithm of the activation function to obtain optimal parameters of a model; verifying the model precision by using the verification sample, judging whether the model precision meets the application condition or not, and continuing training if the model precision does not meet the precision requirement; if the model prediction is correct, applying the model by using a test sample, solving the confidence coefficient of the data predicted by the model by adopting a Bayesian hypothesis reliability test method, and finally giving a judgment method for whether to give out an early warning or not on the basis of the prior information of a historical data set, so that the model can accurately give out an alarm in a unit fault creep stage or when a monitoring system has a problem; as shown in fig. 6, the method specifically includes the following steps:
s4.1, constructing an LSTM prediction model;
dividing a data set obtained after phase space reconstruction into a training set, a verification set and a test set, wherein the training set is used for building a recurrent neural network prediction model, and the recurrent neural network prediction model is built by adopting a recurrent neural network method (LSTM); embedding dimension m obtained by the C-C algorithm is the number of input units, and delay time tau is the distance between input points;
as shown in fig. 2, each LSTM unit consists of 3 gates, namely a forgetting gate (Forget gate), an Input gate (Input gate), and an Output gate (Output gate);
in the used LSTM model, the forgetting gate has the function of discarding some useless information; vector h for output of last time instantt-1And x of the vector input at the current timetInputting together, and discarding irrelevant information in the input; forgetting the door determines the unit state c of the last momentt-1How much to keep current time ct(ii) a The formula for a forget gate is as follows:
ft=σ(Wxf·[ht-1,xt]+bf); (25)
wherein, WxfAnd bfRespectively representing a weight matrix and an offset term of the forgetting gate, sigma being a sigmoid function, [ h ]t-1,xt]Means to concatenate two vectors into one longer vector;
the input gate controls the input information to the network, and the updating of the state of the input gate requires two parts to work together: firstly determining which information should be updated, secondly generating a new vector by the tanh layer, the input gate determining the input x of the network at the current momenttHow many cells to save to cell state ct;itIndicating an input gate,/tIndicating the state of the currently input cell, ctCell state representing the current time:
it=σ(Wxi·[ht-1,xt]+bi); (26)
It=tanh(Wxl·[ht-1,xt]+bl); (27)
wherein, WxiAnd biWeight matrix and offset term, W, representing input gates, respectivelyxlAnd blWeight matrix and bias term respectively representing cell state;
the output gate controls the output information of the LSTM prediction model; it controls the cell state ctHow much current final output value h is output to LSTMt,otValue representing output gate output:
ot=σ(Wxo·[ht-1,xt]+bo); (29)
wherein, WxoAnd boWeight matrix and bias term respectively representing output gate; the final output values for each set of training are as follows:
yt=Why·ht; (31)
Whya weight matrix representing the final LSTM model output.
S4.2, training an LSTM prediction model;
adopting a back propagation algorithm as an LSTM training algorithm; there are 8 sets of parameters to be learned by LSTM, which are: weight matrix W of forgetting gatexfAnd bias term bfWeight matrix W of input gatesxiAnd bias term biWeight matrix W of output gatesoAnd bias term boAnd calculating a weight matrix W of cell statesxlAnd bias term bl(ii) a Since the two parts of the weight matrix use different formulas in the back propagation, in the subsequent derivation, the weight matrix Wxf、Wxi、Wxo、WxlWill be written as two separate matrices: wfh、Wfx、Wih、Wix、Woh、Wox、Wlh、Wlx;
The following formula, if not specified, E represents the loss function; and LSTM has four weighted inputs, respectively ft、it、ct、otFor passing an error term up; the specific steps of step S4.2 are as follows:
s4.2.1, the error term is passed backwards in time, and the error term is passed forward to the equation at any time k:
s4.2.2, pass the error term to the previous layer:
s4.2.3, calculating weight gradient and bias term gradient:
in the above, the error term has been foundo,t、f,t、i,t、l,tThen, the W at time t is determinedoh、Wih、WfhAnd WlhThe final gradient is obtained by adding the gradients at the various times together:
the offset term b followsf、bi、bl、boLike the weight gradient, the gradients at the respective times are added together; the final bias term gradient, i.e., the bias term gradients at each time instant are added together, is as follows:
for Wfx、Wix、Wlx、WoxThe weight gradient of (2) can be directly calculated only according to the corresponding error term:
s4.3, carrying out reliability evaluation on the LSTM model;
the verification set obtained in the step S4.1 is used for verifying whether the result predicted by the recurrent neural network prediction model is accurate or not; at present, the judgment of the precision of a prediction model of a recurrent neural network depends on Mean Square Error (MSE) and a decision coefficient R2Two indexes; the mean square error MSE can indicate the deviation degree of the predicted result from the actual result, but the difference of the sample set dimension causes the predicted result not to have readability, and the coefficient R is determined2The superiority of the model compared with the direct averaging is shown, the result is between 0 and 1, and the closer to 1, the better the model effect is;
by adopting a Bayesian hypothesis testing method, the reliability of the model is quantified, and the prior information of the training set can be considered at the same time, specifically as follows:
assume that the data predicted by the LSTM prediction model is
Corresponding to the actual data being y
exp={y
1,y
2,…,y
nN is the number of data; let
When the residual between the ith vibration signal and the ith prediction signal is represented, n residuals { e }
1,e
2,…,e
n}, residual e
iObey positiveDistribution of states N (mu, sigma)
1 2) Then mean of residual errors
Obeying a normal distribution N (mu, sigma)
1 2N); establishing the original hypothesis H
0And alternative hypothesis H
1:
H0:μ=0,H1:μ≠0;
Assume a prior probability density of N (0, σ) for μ0 2),σ0 2The verification set is selected as prior information, namely, the mean value of the prediction error of the verification set is calculated in a segmentation mode, and the variance of the mean value is used as sigma0 2(ii) a The probability density function for μ is found to be:
in combination with the regularity of the normal distribution,
the edge density function for g (μ) is:
the bayesian factor is then expressed as:
taking logarithm on two sides of formula (48):
wherein lambda is the confidence of the reliability of the recurrent neural network prediction model,
time lambda → 0, which means that the confidence of the support model is 0%, the recurrent neural network prediction model is unreliable; while
Time λ → ∞ indicates that the confidence of supporting the model is 100%, and the recurrent neural network prediction model is very reliable.
In this example, various combinations of delay times and embedding dimensions are enumerated in conjunction with phase space reconstruction, with goodness of fit R2The mean square error is used for judging the precision of the model built by each combination, and Table 4 shows that the difference between the goodness of fit and the mean square error of each delay time is small under different embedding dimensions, R is the mean square error of the model built by each combination, and R is the mean square error of the model built by each combination2All above 0.95 and the mean square error is around 0.01, which shows that the prediction model can accurately predict the future signal value, thereby generating a reference value which is comparable with the monitored actual value and is used for judging the health condition of the equipment.
TABLE 4 prediction model accuracy under different combinations of embedding dimension and delay time
And comprehensively comparing, selecting an embedding dimension of 4 and a delay time of 3 to construct a prediction model. The prediction signal is shown in fig. 7. The probability of assuming the model is accurate is 50%, i.e., π0And (5) solving the value in the prediction model verification set by using the Bayesian reliability verification method provided by the invention when the value is 0.5. As can be seen from FIG. 8, the sample size increases with the accumulation of time, and the confidence level eventually stabilizes at 92%, indicating that the prediction model is relatively reliable.
Furthermore, the delay time determines how far into the future the prediction model can see, the longer the delay time, the longer the prediction model can predict into the future.
S4.4, early warning is carried out by setting a threshold value on the basis of prior information of the historical data set;
if the prediction result of the prediction model of the recurrent neural network is accurate, the residual error between the actual value and the predicted value of the model can be judged by utilizing the value of the real-time monitoring value after noise reduction-dimensionality reduction-phase space reconstruction;
as shown in fig. 3, when a fault occurs in a unit or a monitoring system at a certain time, the fault is reflected in a signal detected by the monitoring system, and a large error is generated between the fault and the signal in a health state; the cyclic neural network prediction model gives predictions between the fault moments, and the predicted values given according to the previous health data are regarded as the rational health data, so that a larger difference exists between the predicted values and the monitoring values of the monitoring system, and whether to give out early warning is judged by identifying the large error;
whether an alarm is given is determined by setting a threshold value, wherein the threshold value is the maximum absolute value e of the error between the predicted value and the actual value of the predictive model of the verification centralized circulation neural networkmax(ii) a The training set and the verification set are historical data sampled under the health state of the unit, but the training set is used for building a model, so that the training set naturally has smaller errors on the whole and has no reference; the verification set is an application of the model, if a time threshold is set artificially in the process of monitoring the unit by using the recurrent neural network prediction model, and if the duration time greater than zeta or the total time exceeds the time threshold, the unit is judged to have a potential fault and needs to be repaired.
Example 2:
in the embodiment, the steam turbine signal is monitored by using the recurrent neural network prediction model, and early warning can be sent out during the steam turbine fault creep or when the sensor breaks down.
Example 1 describes the case of a health signal. In order to illustrate the effectiveness of the fault early warning method of the present invention, the present embodiment is described as a fault case. Generally, the model trained by the health training set is a healthy and fault-free model, so that it is predicted that the signal at a future time point should be healthy and fault-free.
In the embodiment, the vibration signal is adopted, and the crack generation of the secondary impeller of the pneumatic pump in the turbine is analyzed. The solid line portion in fig. 9 shows the signal change for about four months. During which 6 alarms from the monitoring system indicate that the vibration signal exceeds the system threshold 6 times. In addition, the value of the signal from 6 days 5 months to 7 days 5 months has a small abnormality caused by the sensor, and the monitoring system does not give an alarm even if the value exceeds the threshold value of the system. Processing the monitoring signal according to the sequence of discrete wavelet packet denoising, phase space reconstruction and LSTM model prediction; in this embodiment, the health data of 2 months and 10 days before is used as a training set to establish a recurrent neural network prediction model, and the health data of 2 months and 10 days to 3 months and 1 day is used as a verification set. The prediction error of the recurrent neural network prediction model is shown by the dashed line in fig. 9, and the threshold is set to the maximum and minimum value of the validation set error, as shown by the thick dashed line parallel to the abscissa in the figure. As shown in the figure, the recurrent neural network prediction model correctly predicts all 6 system alarms, and the rate of missed diagnosis is 0%. And the recurrent neural network prediction model generates an alarm when the monitoring system does not monitor the abnormality from 5 months and 6 days to 5 months and 7 days. As is evident from the figure, the recurrent neural network predictive model always gives an alarm before the monitoring system, and the alarm is advanced by 31 hours on average, and then the equipment is subjected to shutdown inspection on the day of 5 months and 27 days, and severe crack fracture is observed. The analysis shows that the recurrent neural network prediction model constructed by the invention can predict the alarm and sensor faults of the monitoring system in advance.