Disclosure of Invention
The invention provides a cloud virtual machine load prediction method based on multi-scale analysis and a deep network model, which solves the problem that traditional prediction methods are not accurate enough when predicting cloud virtual machine load data that runs for a long time and has a large data volume.
The technical scheme adopted by the invention is that the cloud virtual machine load prediction method based on the multi-scale analysis and the deep network model is implemented according to the following steps:
step 1, collecting data indexes of load conditions of a cloud virtual machine;
step 2, acquiring time sequence data of cloud virtual machine resources and performance parameters;
step 3, performing wavelet transformation on the sequence data obtained in the step 2, and denoising the original data according to a set threshold;
step 4, preprocessing the cloud virtual machine load data subjected to denoising in the step 3;
step 5, dividing the cloud virtual machine load data preprocessed in the step 4 into a training set and a testing set;
step 6, carrying out normalization processing on the training set and the test set of the cloud virtual machine load data in the step 5;
step 7, constructing a DLSTM prediction model of a cloud virtual machine load data time sequence;
step 8, training the DLSTM prediction model constructed in the step 7 by using the normalized training set data in the step 6;
and step 9, predicting the normalized test set data in the step 6 by using the DLSTM prediction model trained in the step 8, and evaluating the performance of the DLSTM prediction model.
The invention is also characterized in that:
the time sequence data of the cloud virtual machine performance in the step 2 is CPU response time;
the specific contents of the wavelet transform denoising method in the step 3 are as follows: performing wavelet transformation on the noisy data, setting a threshold λ, denoising the original data, and reconstructing through inverse wavelet transformation to obtain the denoised data;
wherein the wavelet in step 3 is db8 in the Daubechies (dbN) wavelet family; after setting the threshold λ, a fixed threshold estimation method is adopted for denoising, the fixed threshold being:
λ = σ√(2 ln n) (1)
where σ is the estimated noise standard deviation and n is the length of the sequence;
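As a concrete illustration of this denoising step, the sketch below applies a single-level wavelet decomposition with the fixed threshold λ = σ√(2 ln n) and soft thresholding. It substitutes the Haar wavelet for db8 purely to stay dependency-free (a real implementation would use a wavelet library such as PyWavelets for a multi-level db8 decomposition); the function name and the median-based noise estimate are assumptions of this sketch, not part of the invention.

```python
import numpy as np

def haar_denoise(x, sigma=None):
    """Single-level Haar wavelet denoising with the fixed (universal)
    threshold lambda = sigma * sqrt(2 * ln(n)).  The patent uses db8;
    Haar is substituted here only to keep the sketch dependency-free."""
    x = np.asarray(x, dtype=float)
    n = len(x) - len(x) % 2                   # truncate to even length
    a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2)    # approximation coefficients
    d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2)    # detail coefficients
    if sigma is None:                         # noise level from detail median
        sigma = np.median(np.abs(d)) / 0.6745
    lam = sigma * np.sqrt(2 * np.log(n))      # fixed threshold, eq. (1)
    d = np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)  # soft thresholding
    y = np.empty(n)
    y[0::2] = (a + d) / np.sqrt(2)            # inverse transform
    y[1::2] = (a - d) / np.sqrt(2)            # (reconstruction)
    return y
```

The same pattern extends to the multi-level db8 case: threshold the detail coefficients at each scale, then reconstruct.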
wherein the preprocessing process in the step 4 specifically comprises the following steps: firstly, performing a first-order difference on the denoised data; recording the denoised sequence as X = (x1, x2, ..., xn) (n is the length of the entire time series) and the differenced data sequence as Y = (y1, y2, ..., yn-1); subtracting the preceding value from the following value in the sequence, i.e.:
yi=xi+1-xi (2)
obtaining a first-order difference data sequence Y by using a formula (2), thereby eliminating the time dependence of the time sequence;
secondly, converting the first-order difference data sequence into a time-step matrix, wherein each unit in the matrix comprises a data segment of time-step length used for prediction; the time step used in the scheme is 2, and the construction process is as follows: converting the original sequence into an n × 1 matrix P1; inserting a 0 before the original sequence and converting it into an n × 1 matrix P2; merging the matrices P1 and P2 into an n × 2 matrix P'; i.e. P1 = [y1, y2, ..., yn]T, P2 = [0, y1, y2, ..., yn-1]T,
P′=[P2 P1] (3);
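The differencing and time-step-matrix construction above can be sketched as follows (the function name is illustrative; note that the differenced sequence has one fewer element than the original, so the matrix here has one row per difference):

```python
import numpy as np

def to_timestep_matrix(x):
    """Step 4: first-order difference (eq. 2), then build the
    time-step-2 matrix P' = [P2 P1] (eq. 3), where P2 is P1 shifted
    down by one position with a leading zero."""
    y = np.diff(np.asarray(x, dtype=float))          # y_i = x_{i+1} - x_i
    p1 = y.reshape(-1, 1)                            # column [y1, y2, ...]^T
    p2 = np.concatenate(([0.0], y[:-1])).reshape(-1, 1)  # leading 0, shifted
    return np.hstack([p2, p1])                       # each row: (prev, cur)
```

Each row of P' pairs a difference with its predecessor, which is exactly the 2-step window the model consumes.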
Wherein the normalization process in the step 6 specifically comprises the following steps: using
xi* = xi / |x|max
where xi* denotes the normalized value of xi and |x|max is the maximum of the absolute values of the differenced data, to normalize the data in the matrix P' to the [-1, 1] interval;
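A minimal sketch of this max-absolute normalization (the helper name is an assumption of this illustration):

```python
def normalize(values):
    """Step 6: scale each differenced value by the maximum absolute
    value, mapping the data into the interval [-1, 1]."""
    m = max(abs(v) for v in values)
    return [v / m for v in values]
```

The divisor |x|max must be remembered so the predictions can be scaled back later.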
the structure of the DLSTM in the step 7 is specifically as follows: the DLSTM is stacked from multiple LSTMs, each of which maintains the traditional structure; the input of the DLSTM model input layer at the moment t is recorded as xt, and the output of the output layer is ht; in order to prevent overfitting of the model, a dropout layer is arranged, so that the activation value of a neuron stops working with a certain probability p during forward propagation, and p is set to 0.3 in the scheme;
an activation layer is connected behind the hidden layer so that the matrix operation result has nonlinearity; the activation function of the forgetting gate and the output gate in the LSTM is the Sigmoid function, i.e.
σ(x) = 1/(1 + e^(−x))
which outputs a value between 0 and 1, where an output close to 0 indicates discarding the current information and an output close to 1 indicates retaining the current information;
the input gate activation function is the tanh function, i.e.
tanh(x) = (e^x − e^(−x))/(e^x + e^(−x))
which is used for calculating the candidate value vector information;
the input of the i-th layer LSTM of the DLSTM prediction model at the time t consists of the input xt at the time t, the output ht-1 at the time t-1, the module state Ct-1 at the time t-1, and the hidden state of the (i-1)-th layer LSTM; the output ht and the state Ct at the time t are transmitted to the t+1 moment; the output of the i-th layer LSTM at the time t is transferred to the next layer of LSTM for auxiliary prediction until the last layer of LSTM obtains the output value; that is, the hidden state of the first layer LSTM at the time t is transmitted to the second layer LSTM as input, and so on until the last LSTM outputs the result;
wherein the LSTM has a forgetting gate, an input gate and an output gate; the forgetting gate calculation method comprises the following steps:
ft = σ(Wf·[ht-1, xt] + bf) (4)
the input gate calculation method comprises the following steps:
it = σ(Wi·[ht-1, xt] + bi) (5)
the candidate value vector calculation method comprises the following steps:
C̃t = tanh(WC·[ht-1, xt] + bC) (6)
the state output is:
Ct = ft*Ct-1 + it*C̃t (7)
the output gate calculation method comprises the following steps:
ot = σ(Wo·[ht-1, xt] + bo) (8)
the output is:
ht=ot*tanh(Ct) (9);
in the expressions (4) to (9), W is weight information, h
tIs the output at time t, x
tFor the input at the time t, the input is,
hidden state of i-th layer LSTM at time t, and b is bias term
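Equations (4) to (9) describe one step of a standard LSTM cell and can be sketched in NumPy as follows; the layout of the weights over the concatenated vector [ht-1; xt] and the dictionary-based parameter passing are assumptions of this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step, equations (4)-(9).  W and b map the gate
    names 'f', 'i', 'C', 'o' to weight matrices over [h_prev; x_t]
    and to bias vectors, respectively."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])         # forgetting gate, eq. (4)
    i = sigmoid(W['i'] @ z + b['i'])         # input gate, eq. (5)
    C_cand = np.tanh(W['C'] @ z + b['C'])    # candidate vector, eq. (6)
    C_t = f * C_prev + i * C_cand            # state update, eq. (7)
    o = sigmoid(W['o'] @ z + b['o'])         # output gate, eq. (8)
    h_t = o * np.tanh(C_t)                   # output, eq. (9)
    return h_t, C_t
```

With all parameters at zero the gates open halfway (σ(0) = 0.5), which makes the state update easy to check by hand.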
Wherein, the output data in the step 8 and the step 9 needs to be de-normalized and un-differenced to obtain the predicted values; the prediction model is evaluated with the root mean square error RMSE and the root mean square prediction error RMSPE; the formulas are respectively as follows:
RMSE = sqrt( (1/N) Σi (yi − xi)² ) (10)
RMSPE = sqrt( (1/N) Σi ((yi − xi)/xi)² ) (11)
where N is the length of the data, yi is the predicted value, and xi is the raw cloud virtual machine load data.
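The de-normalization/un-differencing of the model output and the two evaluation metrics can be sketched as follows (helper names are illustrative; invert assumes the predictions are normalized first-order differences accumulated from the last known value of the series):

```python
import math

def invert(pred_norm, x_abs_max, x_last):
    """Undo step-6 normalization and step-4 differencing: rescale each
    predicted difference by max|x| and accumulate from the last known
    value of the (denoised) series."""
    out, level = [], x_last
    for d in pred_norm:
        level += d * x_abs_max
        out.append(level)
    return out

def rmse(pred, actual):
    """Root mean square error, eq. (10)."""
    n = len(actual)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(pred, actual)) / n)

def rmspe(pred, actual):
    """Root mean square prediction (percentage) error, eq. (11)."""
    n = len(actual)
    return math.sqrt(sum(((y - x) / x) ** 2 for y, x in zip(pred, actual)) / n)
```

RMSPE divides each residual by the true value, so it is only defined where the raw load data is nonzero.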
The invention has the beneficial effects that:
the cloud virtual machine load prediction method based on the multi-scale analysis and the deep network model solves the problem that the traditional prediction method is low in accuracy in cloud virtual machine load prediction for long-time operation and large data volume. A wavelet transformation method is provided for carrying out multi-scale decomposition on the data, and the data are decomposed into a high-frequency subsequence and a low-frequency subsequence; denoising the wavelet sequence of each scale by using a proper threshold value; through wavelet inverse transformation reconstruction, denoised data are obtained so as to obtain a better prediction effect; compared with the traditional LSTM method, the DLSTM method has the advantages that each layer of LSTM in the DLSTM runs on different time scales, and the result is transmitted to the next layer of LSTM, so that the DLSTM can effectively utilize each layer of LSTM, and more complex time sequence data can be learned. The prediction accuracy of DLSTM is therefore higher when a large amount of data is predicted.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a cloud virtual machine load prediction method based on multi-scale analysis and a Deep network model, the overall framework is shown in fig. 2, the method specifically comprises a method for predicting cloud virtual machine load by a Wavelet Transform (WT) and Deep Long Short Term Memory (DLSTM) neural network model, and the method is implemented by the following steps:
step 1, collecting data indexes of load conditions of a cloud virtual machine;
step 2, acquiring time series data of cloud virtual machine resources and performance parameters, as shown in fig. 4;
step 3, as shown in fig. 3, performing wavelet denoising on the sequence data obtained in step 2; performing wavelet transformation on the noisy data; setting a threshold λ, denoising the raw cloud virtual machine load data, and reconstructing through inverse wavelet transform to obtain the denoised data, as shown in fig. 5; wherein the fixed threshold is:
λ = σ√(2 ln n) (1)
where σ is the estimated noise standard deviation and n is the length of the sequence;
step 4, preprocessing the sequence data denoised in the step 3; firstly, performing a first-order difference on the data, specifically: recording the denoised sequence as X = (x1, x2, ..., xn) (n is the length of the time series) and the differenced data sequence as Y = (y1, y2, ..., yn-1); subtracting the preceding value from the following value in the sequence to obtain the first-order difference data sequence Y, thereby eliminating the time dependence of the time series; namely:
yi=xi+1-xi (2)
secondly, converting the first-order difference data sequence into a time-step matrix, wherein each unit in the matrix comprises a data segment of time-step length used for prediction; the time step used in the invention is 2, and the conversion process is as follows: converting the original sequence into an n × 1 matrix P1; inserting a 0 before the original sequence and converting it into an n × 1 matrix P2; merging the matrices P1 and P2 into an n × 2 matrix P'; i.e. P1 = [y1, y2, ..., yn]T, P2 = [0, y1, y2, ..., yn-1]T, the merging method being:
P′=[P2 P1] (3);
step 5, dividing the cloud virtual machine load data preprocessed in the step 4 into a training set and a testing set;
step 6, carrying out normalization processing on the training set and the test set of the cloud virtual machine load data in the step 5; using
xi* = xi / |x|max
(where xi* denotes the normalized value of xi, and |x|max is the maximum of the absolute values of the differenced data) to normalize the data to the [-1, 1] interval;
step 7, constructing a DLSTM prediction model of the cloud virtual machine load data time series; as shown in fig. 1, the DLSTM is made up of a stack of LSTMs, each of which maintains the conventional structure; the input of the DLSTM model input layer at the moment t is recorded as xt, and the output of the output layer is ht; in order to prevent overfitting of the model, a dropout layer is arranged, so that the activation value of a neuron stops working with a certain probability p during forward propagation, and p is set to 0.3 in the method; the DLSTM hidden layers perform multi-level abstraction of the input features; for the input of each node, the hidden layer has different connection weights, and the neurons of the output layer adjust the weights of the neurons of the hidden layer, so that the output result tends to the real data;
the activation layer is connected after the hidden layer so that the matrix operation result has nonlinearity; the activation function used by the forgetting gate and the output gate in the LSTM is the Sigmoid function, i.e.
σ(x) = 1/(1 + e^(−x))
which outputs a value between 0 and 1, where an output close to 0 indicates discarding the current information and an output close to 1 indicates retaining the current information; the input gate activation function is the tanh function, i.e.
tanh(x) = (e^x − e^(−x))/(e^x + e^(−x))
which is used for calculating the candidate value vector information;
the input of the i-th layer LSTM of the DLSTM prediction model at the time t consists of the input xt at the time t, the output ht-1 at the time t-1, the module state Ct-1 at the time t-1, and the hidden state of the (i-1)-th layer LSTM; the output ht and the state Ct at the time t are transmitted to the t+1 moment; the output of the i-th layer LSTM at the time t is passed to the next layer of LSTM to assist prediction until the last layer of LSTM obtains the output value; that is, the hidden state of the first layer LSTM at the time t is transmitted to the second layer LSTM as input, and so on until the last LSTM outputs the result;
each layer of LSTM has a forgetting gate, an input gate and an output gate;
the forgetting gate calculation method comprises the following steps:
ft = σ(Wf·[ht-1, xt] + bf) (4)
the input gate calculation method comprises the following steps:
it = σ(Wi·[ht-1, xt] + bi) (5)
the candidate value vector calculation method comprises the following steps:
C̃t = tanh(WC·[ht-1, xt] + bC) (6)
the state output is:
Ct = ft*Ct-1 + it*C̃t (7)
the output gate calculation method comprises the following steps:
ot = σ(Wo·[ht-1, xt] + bo) (8)
the output is:
ht=ot*tanh(Ct) (9);
where W is the weight information, ht is the output at the time t, xt is the input at the time t, ht(i) is the hidden state of the i-th layer LSTM at the time t, and b is the bias term;
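Putting step 7 together, a forward pass through a stack of LSTM layers with inter-layer dropout might look like the sketch below; the random weight initialization, the layer tuple format, and the inverted-dropout scaling are assumptions of this illustration, not the patented implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # one LSTM cell step, equations (4)-(9)
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])
    i = sigmoid(W['i'] @ z + b['i'])
    C_t = f * C_prev + i * np.tanh(W['C'] @ z + b['C'])
    o = sigmoid(W['o'] @ z + b['o'])
    return o * np.tanh(C_t), C_t

def make_layer(rng, in_dim, hidden):
    # small random weights; a layer is the tuple (W, b, hidden_size)
    W = {k: 0.1 * rng.standard_normal((hidden, hidden + in_dim)) for k in 'fiCo'}
    b = {k: np.zeros(hidden) for k in 'fiCo'}
    return W, b, hidden

def dlstm_forward(seq, layers, p=0.3, train=False, rng=None):
    """DLSTM forward pass: the per-time-step hidden states of layer
    i-1 feed layer i; during training an (inverted) dropout mask with
    rate p is applied between stacked layers."""
    inputs = [np.atleast_1d(x) for x in seq]
    for W, b, hidden in layers:
        h, C = np.zeros(hidden), np.zeros(hidden)
        outputs = []
        for x_t in inputs:
            h, C = lstm_step(x_t, h, C, W, b)
            outputs.append(h)
        if train:                       # dropout between stacked layers
            mask = (rng.random(hidden) >= p) / (1.0 - p)
            outputs = [o * mask for o in outputs]
        inputs = outputs
    return inputs[-1]   # last layer's hidden state at the last time step
```

Dropout is disabled at prediction time (train=False), matching the usual train/inference distinction.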
step 8, training the DLSTM prediction model constructed in the step 7 by using the normalized training set data in the step 6;
step 9, predicting the normalized test set data in the step 6 by using the DLSTM prediction model trained in the step 8, and evaluating the performance of the DLSTM prediction model, as shown in FIG. 6;
examples
The embodiment takes the load condition of a cloud virtual machine as an example; the prediction results of the LSTM and the DLSTM model based on multi-scale analysis are compared, and the result is shown in FIG. 7; the root mean square error (RMSE) and the root mean square prediction error (RMSPE) are taken as evaluation indexes, and the formulas are respectively shown as (10) to (11);
where N is the length of the data, yi is the predicted value, and xi is the raw data of the cloud virtual machine load;
the method comprises the following specific steps:
step 1, collecting data indexes of load conditions of a cloud virtual machine;
step 2, acquiring time sequence data of cloud virtual machine resources and performance parameters;
step 3.1, performing wavelet transformation on the sequence data obtained in the step 2;
step 3.2, setting a threshold value, and denoising the cloud virtual machine load sequence data;
step 3.3, obtaining the denoised data through inverse wavelet transform reconstruction, wherein the fixed threshold is λ = σ√(2 ln n);
Step 4.1, performing first-order difference on the denoised data in the step 3, and specifically comprising the following steps: recording the sequence after de-noising as X ═ X1,x2,...,xn) (n is the length of the entire time series), and the data series after the difference is Y ═ Y (Y is the length of the entire time series)1,y2,...,yn-1) Using the value after the sequence minus the value before, i.e. yi=xi+1-xiObtaining a first-order difference data sequence Y, thereby eliminating the time dependence of the time sequence;
step 4.2, converting the first-order difference data sequence into a time-step matrix, wherein each unit in the matrix comprises a data segment of time-step length used for prediction; the time step used in the invention is 2, and the construction process is as follows: converting the original sequence into an n × 1 matrix P1; inserting a 0 before the original sequence and converting it into an n × 1 matrix P2; i.e. P1 = [y1, y2, ..., yn]T, P2 = [0, y1, y2, ..., yn-1]T;
step 4.3, merging the matrices P1 and P2 into an n × 2 matrix P', i.e. P' = [P2 P1];
Step 5, dividing the cloud virtual machine load data preprocessed in the step 4 into a training set and a testing set;
step 6, carrying out normalization processing on the training set and the test set of the cloud virtual machine load data in the step 5; the normalization processing specifically comprises: using
xi* = xi / |x|max
(where xi* denotes the normalized value of xi, and |x|max is the maximum of the absolute values of the differenced data) to normalize the data in the matrix P' to the [-1, 1] interval;
step 7.1, constructing a DLSTM prediction model of the cloud virtual machine load data time series; the DLSTM is stacked from multiple LSTMs, each of which maintains the conventional structure; the input of the DLSTM model input layer at the moment t is recorded as xt, and the output of the output layer is ht; in order to prevent overfitting of the model, a dropout layer is arranged, so that the activation value of a neuron stops working with a certain probability p during forward propagation; in the method, p is set to 0.3, so that the model generalizes better and does not depend on particular local features; the DLSTM hidden layers perform multi-level abstraction of the input features; for the input of each node, the hidden layer has different connection weights, and the neurons of the output layer adjust the weights of the neurons of the hidden layer, so that the output result tends to the real data.
The activation layer is connected after the hidden layer so that the matrix operation result has nonlinearity. The activation function used by the forgetting gate and the output gate in the LSTM is the Sigmoid function, which outputs a value between 0 and 1, where an output close to 0 indicates discarding the current information and an output close to 1 indicates retaining the current information. The input gate activation function is the tanh function and is used for calculating the candidate value vector information;
step 8, training the DLSTM prediction model constructed in the step 7 by using the normalized training set data in the step 6;
and step 9, predicting the normalized test set data in the step 6 by using the DLSTM prediction model trained in the step 8, and evaluating the performance of the DLSTM prediction model.
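To make the data flow of steps 2 to 9 concrete, the sketch below wires difference → split → normalize → predict → de-normalize/un-difference → RMSE on a synthetic series. A trivial persistence predictor (repeat the previous difference) stands in for the trained DLSTM, denoising is skipped, and all names are assumptions of this illustration:

```python
import math

def pipeline_demo(series, train_frac=0.8):
    """Data flow of steps 2-9 with a persistence stand-in for the
    DLSTM.  Denoising is skipped (identity) to keep the demo
    dependency-free."""
    # step 4.1: first-order difference
    diffs = [b - a for a, b in zip(series, series[1:])]
    # step 5: train/test split
    split = int(len(diffs) * train_frac)
    train, test = diffs[:split], diffs[split:]
    # step 6: max-absolute normalization fitted on the training set
    m = max(abs(v) for v in train) or 1.0
    test_n = [v / m for v in test]
    # stand-in "model": predict each normalized diff as the previous one
    prev = train[-1] / m
    preds_n = []
    for v in test_n:
        preds_n.append(prev)
        prev = v
    # steps 8-9: de-normalize, un-difference, evaluate with RMSE
    level = series[split]                 # last value before the test region
    preds, actual = [], series[split + 1:]
    for d in preds_n:
        level += d * m
        preds.append(level)
        level = actual[len(preds) - 1]    # teacher-forced one-step-ahead
    n = len(actual)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(preds, actual)) / n)
```

On a perfectly linear series the persistence stand-in is exact (RMSE 0); on an alternating series it is maximally wrong, which makes the wiring easy to sanity-check.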