Disclosure of Invention
The technical problem to be solved and the technical task proposed by the invention are to perfect and improve the existing technical scheme and to provide a method for predicting the number of electric energy meter faults, so that the number of faults can be predicted more accurately. To this end, the invention adopts the following technical scheme.
A method for predicting the number of electric energy meter faults comprises the following steps:
1) acquiring faulty-meter data;
2) calculating a moving average sequence from the faulty-meter data;
3) from the moving average sequence, determining according to its long-term trend and seasonal variation whether an ARIMA model or an exponential smoothing model is used for prediction, wherein in practice most data fit the ARIMA model;
4) restoring seasonality to the obtained prediction data;
5) obtaining the data with seasonality restored, and predicting the number of faulty meters from these data;
wherein setting up the ARIMA model comprises the steps of:
a) acquiring historical faulty-meter data;
b) calculating the autocorrelation function and the partial autocorrelation function from the acquired historical data, and using them to determine the order of the ARIMA model;
c) determining the model parameters by moment estimation;
d) determining the number of differences of the model by means of the significance level;
e) obtaining the configured ARIMA model from the determined order, the parameters and the number of differences.
The technical scheme combines time series decomposition, an exponential smoothing model and ARIMA model analysis, and thereby overcomes the shortcomings of prediction by a single method. Time series decomposition alone decomposes a sequence into a long-term trend factor, a seasonal variation factor, a cyclic variation factor and an irregular variation factor and then fits the long-term trend; the fitting is generally curve fitting, so the effect is poor. Time series decomposition is therefore generally not used on its own: it predicts simple sequences inaccurately and models complex sequences too coarsely, so in short its prediction performance is unsatisfactory. In this scheme the analysis is instead carried out on the basis of time series decomposition: according to the long-term trend and seasonal variation in the moving average sequence, it is decided whether the roughly decomposed trend is analysed and predicted with the mature ARIMA model or with the exponential smoothing model.
When an ARIMA model or an exponential smoothing model is established, it is easily influenced by large random fluctuations of the data, and data analysis shows that the error of the prediction result then becomes too large. The time series decomposition model is introduced to mitigate this problem as far as possible. The first step of the time series decomposition model is used to obtain the moving average sequence, i.e. the part consisting of the long-term trend T and the cyclic variation C (the TC part); the TC part does not necessarily contain a T component. The small seasonal fluctuations (seasonal variation) and the occasional fluctuations (irregular variation) of the data are removed, which dampens the fluctuation of the number of electric energy meter faults and reduces the final prediction error. Analysis of a large amount of data shows, however, that the TC part still retains some seasonal variation.
Analysis is therefore performed on the long-term trend and seasonal variation in the moving average sequence to decide whether the ARIMA model or the exponential smoothing model is used.
Based on repeated prediction on actual data, a reverse moving average is proposed. The data are processed by applying the moving average in reverse: each predicted value is tripled (increased by two times its own value), and the preceding and following values are then subtracted to obtain the result; for the last value, the predicted value is doubled (increased by one time) and the preceding value is subtracted. Finally, the prediction results are rounded to integers to obtain the final prediction result. This restores the seasonal and irregular variations of the sequence. The technical scheme thus provides a comprehensive time series prediction model based on three models: a time series decomposition model, an ARIMA (autoregressive integrated moving average) model and an exponential smoothing model. The time series decomposition model reduces the fluctuation of the data, the ARIMA model can predict non-stationary sequences, and the exponential smoothing model can predict sequences with a long-term trend; the three models are combined appropriately to form a comprehensive time series model.
As a preferable technical means: in the step 1), the monthly fault number of a certain batch of electric energy meters is collected through the electricity utilization information collection and remote meter reading system. The collected data can be stored in a central database through a data transmission channel and a data receiving system, and the data in the database is used as a data source for predicting the fault quantity of the electric energy meter.
As a preferable technical means: in step 2), the data are processed with a moving average: the data of three consecutive months are added and averaged, giving values that are free of seasonality and contain little or no random variation. For each datum, the preceding value, the datum itself and the following value are added and averaged, which yields a sequence without the first and the last datum; the last datum is added to the preceding value and the two are averaged, while the first datum is not considered. The resulting sequence contains only two parts, long-term trend and cyclic variation.
As a preferable technical means: in step 3), the ARIMA model and the exponential smoothing model are easily influenced by large random fluctuations of the data when they are established. The time series decomposition model is introduced to mitigate this problem as far as possible. The first step of the time series decomposition model is used to obtain the moving average sequence, i.e. the TC part; the TC part does not necessarily contain a T component. The small seasonal fluctuations (seasonal variation) and the occasional fluctuations (irregular variation) of the data are removed, which dampens the fluctuation of the number of electric energy meter faults and reduces the final prediction error. Analysis of a large amount of data shows, however, that the TC part still retains some seasonal variation. Analysis is therefore performed on the long-term trend and seasonal variation in the moving average sequence to decide whether the ARIMA model or the exponential smoothing model is used.
As a preferable technical means: in step 4), the moving average is applied to the data in reverse: each predicted value is tripled (increased by two times its own value), and the preceding and following values are then subtracted to obtain the result; for the last value, the predicted value is doubled (increased by one time) and the preceding value is subtracted. Finally, the prediction results are rounded to integers to obtain the final prediction result.
As a preferable technical means: in step b), the orders p and q of the model are determined as follows:
The autocorrelation function of the sequence $y_t$ measures the degree of linear correlation between $y_t$ and $y_{t-k}$. It is denoted $\rho_k$ and defined as:

$$\rho_k = \frac{r_k}{r_0}$$

in the formula, $r_k = \mathrm{cov}(y_t, y_{t-k})$, and $r_0 = \mathrm{cov}(y_t, y_t)$ represents the variance of the sequence. The autocorrelation function characterizes the degree of linear correlation between $y_t$ and $y_{t-k}$. When a correlation exists between $y_t$ and $y_{t-k}$, it may arise because $y_t$ and $y_{t-k}$ are each related to the intermediate values $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$. The conditional correlation between $y_t$ and $y_{t-k}$, given $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$, is described by the partial autocorrelation function $\varphi_{kk}$, which can be represented by the formula:

$$\varphi_{kk} = \mathrm{corr}\left(y_t, y_{t-k} \mid y_{t-1}, y_{t-2}, \dots, y_{t-k+1}\right)$$

After the autocorrelation function and the partial autocorrelation function have been obtained, the lag at which the autocorrelation function begins to tail off is observed and recorded as p, and the lag at which the partial autocorrelation function begins to tail off is recorded as q; the orders p and q of the model are thereby determined. Tailing off means that the autocorrelation function or partial autocorrelation function of the model decays exponentially and tends towards 0 as the time lag k increases.
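As a preferable technical means: in step c), the parameters are determined by moment estimation as follows:
when $k > q$, the autocorrelation coefficients of the model satisfy the equation of the autoregressive part:

$$\rho_k = \varphi_1 \rho_{k-1} + \varphi_2 \rho_{k-2} + \dots + \varphi_p \rho_{k-p}$$

Using the estimated autocorrelation coefficients $\hat{\rho}_k$ instead of $\rho_k$ for $k = q+1, q+2, \dots, q+p$ gives p equations, and solving this system of equations yields the moment estimates $\hat{\varphi}_1, \hat{\varphi}_2, \dots, \hat{\varphi}_p$ of the autoregressive coefficients.
Let

$$\tilde{y}_t = y_t - \hat{\varphi}_1 y_{t-1} - \dots - \hat{\varphi}_p y_{t-p}$$

Then the autocovariance function of $\tilde{y}_t$ is

$$\tilde{\gamma}_k = \mathrm{cov}(\tilde{y}_t, \tilde{y}_{t+k}) = \sum_{i=0}^{p} \sum_{j=0}^{p} \hat{\varphi}_i \hat{\varphi}_j \gamma_{k+j-i}$$

wherein $\hat{\varphi}_0 = -1$ and $\gamma_k = \mathrm{cov}(y_t, y_{t-k})$ is the autocovariance of the sequence at lag k, with $\gamma_{-k} = \gamma_k$.
Using the calculated sample autocovariances $\hat{\gamma}_k$ instead of $\gamma_k$, the following can be obtained:

$$\hat{\tilde{\gamma}}_k = \sum_{i=0}^{p} \sum_{j=0}^{p} \hat{\varphi}_i \hat{\varphi}_j \hat{\gamma}_{k+j-i}$$

From the formula of the ARIMA model it can be derived that:

$$\tilde{y}_t \approx \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}$$

and further

$$\tilde{\gamma}_0 = \left(1 + \theta_1^2 + \dots + \theta_q^2\right) \sigma_\varepsilon^2$$

and

$$\tilde{\gamma}_k = \left(-\theta_k + \theta_1 \theta_{k+1} + \dots + \theta_{q-k} \theta_q\right) \sigma_\varepsilon^2, \quad k = 1, 2, \dots, q$$

Substituting $\hat{\tilde{\gamma}}_k$ into the last two equations and solving them gives the moment estimates of the moving average coefficients $\theta_1, \theta_2, \dots, \theta_q$ of the ARIMA model and of the variance (mean square error) $\sigma_\varepsilon^2$ of the white noise sequence $\varepsilon_t$;
thus the parameters of the ARIMA model are found: the autoregressive coefficients, the moving average coefficients and the variance (mean square error) of the white noise sequence $\varepsilon_t$.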
As a preferable technical means: in step d), the number of differences of the model is determined by the significance level. The significance level, denoted α, is the probability of making an error when estimating that a population parameter falls within a given interval; the smaller the α value, the better the model. The significance α value of the model is calculated for d = 0, 1, 2 and 3 respectively, and the number of differences d is determined from these values.
Advantageous effects: the technical scheme predicts the number of faulty electric energy meters by combining time series decomposition, the exponential smoothing model and ARIMA model analysis, and obtains the final prediction result by restoring seasonality to the prediction result, so that the accuracy of predicting the number of electric energy meter faults can be improved. Economically, the quality risks and property losses caused by stored meters becoming overdue or overstocked are avoided, the cost of returning overdue meters is reduced, and overstocking of stored electric energy meters is avoided. The time series decomposition model, the exponential smoothing model and the ARIMA model are used in combination, so that the strengths of each model offset the weaknesses of the others.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the drawings in the specification.
As shown in fig. 1, the present invention comprises the steps of:
1) acquiring faulty-meter data;
2) calculating a moving average sequence from the faulty-meter data;
3) from the moving average sequence, determining according to its long-term trend and seasonal variation whether an ARIMA model or an exponential smoothing model is used for prediction, wherein in practice most data fit the ARIMA model;
4) restoring seasonality to the obtained prediction data;
5) obtaining the data with seasonality restored, and predicting the number of faulty meters from these data;
as shown in fig. 2, setting up the ARIMA model comprises the steps of:
a) acquiring historical faulty-meter data;
b) calculating the autocorrelation function and the partial autocorrelation function from the acquired historical data, and using them to determine the order of the ARIMA model;
c) determining the model parameters by moment estimation;
d) determining the number of differences of the model by means of the significance level;
e) obtaining the configured ARIMA model from the determined order, the parameters and the number of differences.
The method is used to predict and control the number of electric energy meter faults in preparation for better and more accurate rotation (replacement) of the meters. From the prediction result, the number of meters that need to be replaced can be roughly determined, a purchase plan can be made in advance, and the rotation quantities of meters can be allocated effectively among regions.
Specifically, the technical scheme combines time series decomposition, an exponential smoothing model and ARIMA model analysis, and thereby overcomes the shortcomings of prediction by a single method. Time series decomposition alone decomposes a sequence into a long-term trend factor, a seasonal variation factor, a cyclic variation factor and an irregular variation factor and then fits the long-term trend; the fitting is generally curve fitting, so the effect is poor. Time series decomposition is therefore generally not used on its own: it predicts simple sequences inaccurately and models complex sequences too coarsely, so in short its prediction performance is unsatisfactory. In this scheme the analysis is instead carried out on the basis of time series decomposition: according to the long-term trend and seasonal variation in the moving average sequence, it is decided whether the roughly decomposed trend is analysed and predicted with the mature ARIMA model or with the exponential smoothing model. When an ARIMA model or an exponential smoothing model is established, it is easily influenced by large random fluctuations of the data, and data analysis shows that the error of the prediction result then becomes too large. The time series decomposition model is introduced to mitigate this problem as far as possible. For the ARIMA model, the order of the model is determined by the autocorrelation function and the partial autocorrelation function, the model parameters are estimated by moment estimation, and finally the number of differences is determined by the significance level; once the model has been obtained, it is used to predict the rough trend obtained by time series decomposition. Seasonality is then added to the prediction result to obtain the final prediction result, so that the accuracy of predicting the number of electric energy meter faults can be improved.
Economically, the quality risks and property losses caused by stored meters becoming overdue or overstocked are avoided, the cost of returning overdue meters is reduced, and overstocking of stored electric energy meters is avoided. For the exponential smoothing model, the long-term trend is analysed by exponential smoothing, and an analysis of seasonal variation can be added according to actual conditions; the law of each variation is then calculated separately, and finally the laws are integrated to form the exponential smoothing model used to predict future data. The time series decomposition model, the exponential smoothing model and the ARIMA model are used in combination, so that the strengths of each model offset the weaknesses of the others. In practice, most data fit the ARIMA model.
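As an illustration of the exponential smoothing model mentioned above, the following minimal Python sketch applies simple (single) exponential smoothing to a short sequence. The function name, the smoothing factor `alpha`, the initialisation with the first value and the sample data are all assumptions made for illustration; the scheme does not specify them, and a variant with trend or seasonal terms may be preferred according to actual conditions.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed levels s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = values[0]          # assumption: initialise with the first observation
    smoothed = [level]
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


# Hypothetical moving average sequence (not data from the embodiment).
trend_values = [10.0, 12.0, 14.0]
smoothed = simple_exponential_smoothing(trend_values, alpha=0.5)
one_step_forecast = smoothed[-1]   # the last level serves as the next-period forecast
print(smoothed)            # [10.0, 11.0, 12.5]
print(one_step_forecast)   # 12.5
```

A larger `alpha` follows recent values more closely, whereas a smaller `alpha` smooths more strongly.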
The following steps are further specified:
1. Data source
The monthly number of faults of a given batch of electric energy meters is collected through the electricity consumption information acquisition and remote meter reading system. The data are stored in a central database via a data transmission channel and a data receiving system and serve as the data source for predicting the number of electric energy meter faults by this method.
The object analysed by the method is the number of electric energy meter faults, i.e. the number of meters of a given batch that fail in each month.
2. Removing seasonality of data
Seasonality is removed with the time series decomposition model. That is, the data are processed with a moving average: the data of three consecutive months are added and averaged, giving values that are free of seasonality and contain little or no irregular variation, i.e. randomness. Because random fluctuations are distributed around a central value, the positive and negative fluctuations cancel each other to some extent when three values are added, so the result can be regarded as free of randomness. For each datum, the preceding value, the datum itself and the following value are added and averaged, which yields a sequence without the first and the last datum; the last datum is added to the preceding value and the two are averaged, while the first datum is not considered. The resulting sequence contains only two parts, long-term trend and cyclic variation, and is called the moving average sequence.
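The moving average described above can be sketched in Python as follows. This is only an illustrative sketch: the function name and the sample monthly counts are hypothetical and are not data from the embodiment.

```python
def moving_average_sequence(counts):
    """Three-month moving average: the first month is dropped, interior months
    use the mean of (previous, current, next), and the last month uses the
    mean of (previous, current)."""
    if len(counts) < 3:
        raise ValueError("at least three monthly values are needed")
    smoothed = []
    for t in range(1, len(counts) - 1):
        smoothed.append((counts[t - 1] + counts[t] + counts[t + 1]) / 3)
    # Last month: only the preceding value is available as a neighbour.
    smoothed.append((counts[-2] + counts[-1]) / 2)
    return smoothed


# Hypothetical monthly fault counts of one batch of meters.
monthly_faults = [12, 18, 15, 21, 24, 18]
ma = moving_average_sequence(monthly_faults)
print(ma)  # [15.0, 18.0, 20.0, 21.0, 21.0]
```

The result is one element shorter than the input, because no value is produced for the first month.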
3. ARIMA model
If the d-th order difference of a sequence $\{X_t\}$ is a stationary sequence, and the stationary sequence obtained after differencing can be modelled with an ARMA model, then the model structure of $\{X_t\}$ is called an autoregressive integrated moving average model, abbreviated ARIMA(p, d, q), wherein d is the order of differencing, p is the autoregressive order and q is the moving average order. The main expression of the model is

$$\left(1 - \sum_{i=1}^{p} \varphi_i B^i\right) (1 - B)^d X_t = \left(1 - \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t$$

wherein $B$ is the backshift operator ($B X_t = X_{t-1}$), $\varphi_i$ are the autoregressive coefficients, $\theta_j$ are the moving average coefficients and $\varepsilon_t$ is a white noise sequence.
To apply the model, its order is first determined and its parameters are then estimated, namely the autoregressive coefficients, the moving average coefficients and the variance (mean square error) of the white noise sequence $\varepsilon_t$.
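The role of the difference order d can be illustrated with a short Python sketch: the sequence is differenced d times before an ARMA model is fitted, and a forecast made on the differenced scale is accumulated back onto the original scale. The function names and the numbers are hypothetical; only first-order restoration is shown.

```python
def difference(series, d=1):
    """Apply the first-difference operator (1 - B) to the series d times."""
    result = list(series)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


def undifference_once(last_observed, forecast_diffs):
    """Turn forecasts of first differences back into forecasts of levels."""
    level = last_observed
    levels = []
    for delta in forecast_diffs:
        level += delta
        levels.append(level)
    return levels


# Hypothetical moving average sequence.
sequence = [15, 18, 20, 21, 21]
print(difference(sequence, d=1))        # [3, 2, 1, 0]
print(difference(sequence, d=2))        # [-1, -1, -1]
# Hypothetical forecasts of the next two first differences.
print(undifference_once(21, [1, -2]))   # [22, 20]
```

Each differencing pass shortens the sequence by one value, which is one practical reason to keep d small.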
4. Determining the order p and q of an ARIMA model
When the ARIMA model is used to model the existing data, the first problem is to determine the order of the model, i.e. the corresponding values of p and q. The ARIMA model is identified mainly through the autocorrelation function and the partial autocorrelation function of the sequence. The autocorrelation function of the sequence $y_t$ measures the degree of linear correlation between $y_t$ and $y_{t-k}$. It is denoted $\rho_k$ and defined as:

$$\rho_k = \frac{r_k}{r_0}$$

in the formula, $r_k = \mathrm{cov}(y_t, y_{t-k})$, and $r_0 = \mathrm{cov}(y_t, y_t)$ represents the variance of the sequence. The autocorrelation function characterizes the degree of linear correlation between $y_t$ and $y_{t-k}$. Sometimes a correlation exists between $y_t$ and $y_{t-k}$ probably because $y_t$ and $y_{t-k}$ are each related to the intermediate values $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$. The conditional correlation between $y_t$ and $y_{t-k}$, given $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$, is described by the partial autocorrelation function $\varphi_{kk}$, which can be calculated by the formula:

$$\varphi_{kk} = \mathrm{corr}\left(y_t, y_{t-k} \mid y_{t-1}, y_{t-2}, \dots, y_{t-k+1}\right)$$

After the autocorrelation function and the partial autocorrelation function have been obtained, the lag at which the autocorrelation function begins to tail off is observed and recorded as p, and the lag at which the partial autocorrelation function begins to tail off is recorded as q. The orders p and q of the model are thereby determined. Tailing off means that the autocorrelation function or partial autocorrelation function of the model decays exponentially and tends towards 0 as the time lag k increases.
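The two functions used for order determination can be computed, for example, as in the following Python sketch. It is not the implementation of the embodiment: the sample autocovariance with divisor n and the Durbin-Levinson recursion for the partial autocorrelation are common textbook choices assumed here, and the function names and data are hypothetical.

```python
def autocovariance(series, k):
    """Sample estimate of r_k = cov(y_t, y_{t-k}), using divisor n (an assumption)."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / n


def autocorrelation(series, max_lag):
    """Return [rho_0, ..., rho_max_lag] with rho_k = r_k / r_0."""
    r0 = autocovariance(series, 0)
    return [autocovariance(series, k) / r0 for k in range(max_lag + 1)]


def partial_autocorrelation(rho):
    """Durbin-Levinson recursion: partial autocorrelations from rho_0..rho_K."""
    pacf = [1.0]
    previous = []                      # coefficients phi_{k-1,1..k-1}
    for k in range(1, len(rho)):
        numerator = rho[k] - sum(previous[j] * rho[k - 1 - j] for j in range(k - 1))
        denominator = 1.0 - sum(previous[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = numerator / denominator
        previous = [previous[j] - phi_kk * previous[k - 2 - j] for j in range(k - 1)]
        previous.append(phi_kk)
        pacf.append(phi_kk)
    return pacf


# Hypothetical moving average sequence of monthly fault counts.
sequence = [15.0, 18.0, 20.0, 21.0, 21.0, 19.0, 16.0, 15.0, 17.0, 20.0]
acf = autocorrelation(sequence, max_lag=4)
pacf = partial_autocorrelation(acf)
for lag, (a, p) in enumerate(zip(acf, pacf)):
    print(lag, round(a, 3), round(p, 3))
```

The printed values are then inspected lag by lag to judge where each function begins to tail off.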
5. Estimation of ARIMA model parameters
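For the ARIMA model, the parameters are determined by moment estimation. When $k > q$, the autocorrelation coefficients of the model satisfy the equation of the autoregressive part:

$$\rho_k = \varphi_1 \rho_{k-1} + \varphi_2 \rho_{k-2} + \dots + \varphi_p \rho_{k-p}$$

Using the estimated autocorrelation coefficients $\hat{\rho}_k$ instead of $\rho_k$ for $k = q+1, q+2, \dots, q+p$ gives p equations, and solving this system of equations yields the moment estimates $\hat{\varphi}_1, \hat{\varphi}_2, \dots, \hat{\varphi}_p$ of the autoregressive coefficients.
Let

$$\tilde{y}_t = y_t - \hat{\varphi}_1 y_{t-1} - \dots - \hat{\varphi}_p y_{t-p}$$

Then the autocovariance function of $\tilde{y}_t$ is

$$\tilde{\gamma}_k = \mathrm{cov}(\tilde{y}_t, \tilde{y}_{t+k}) = \sum_{i=0}^{p} \sum_{j=0}^{p} \hat{\varphi}_i \hat{\varphi}_j \gamma_{k+j-i}$$

wherein $\hat{\varphi}_0 = -1$ and $\gamma_k = \mathrm{cov}(y_t, y_{t-k})$ is the autocovariance of the sequence at lag k, with $\gamma_{-k} = \gamma_k$.
Using the calculated sample autocovariances $\hat{\gamma}_k$ instead of $\gamma_k$, the following can be obtained:

$$\hat{\tilde{\gamma}}_k = \sum_{i=0}^{p} \sum_{j=0}^{p} \hat{\varphi}_i \hat{\varphi}_j \hat{\gamma}_{k+j-i}$$

From the formula of the ARIMA model it can be derived that:

$$\tilde{y}_t \approx \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}$$

and further

$$\tilde{\gamma}_0 = \left(1 + \theta_1^2 + \dots + \theta_q^2\right) \sigma_\varepsilon^2$$

and

$$\tilde{\gamma}_k = \left(-\theta_k + \theta_1 \theta_{k+1} + \dots + \theta_{q-k} \theta_q\right) \sigma_\varepsilon^2, \quad k = 1, 2, \dots, q$$

Substituting $\hat{\tilde{\gamma}}_k$ into the last two equations and solving them gives the moment estimates of the moving average coefficients $\theta_1, \theta_2, \dots, \theta_q$ of the ARIMA model and of the variance (mean square error) $\sigma_\varepsilon^2$ of the white noise sequence $\varepsilon_t$.
All parameters of the ARIMA model are thus found: the autoregressive coefficients, the moving average coefficients and the variance (mean square error) of the white noise sequence $\varepsilon_t$.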
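For the special case p = 1 and q = 1, the moment estimation can be worked through in a short Python sketch. Several points are assumptions rather than statements of the scheme: the model is written as y_t = φ·y_{t-1} + ε_t − θ·ε_{t-1}, the invertible root (|θ| < 1) is selected, and the input autocovariances are the theoretical values of a process with φ = 0.5, θ = 0.4 and unit noise variance, chosen so that the result can be checked. The function names are hypothetical.

```python
import math


def filtered_autocovariance(gamma, phi, k):
    """Autocovariance at lag k of the series after removing the AR part:
    sum over i, j of c_i * c_j * gamma_{|k + j - i|}, with c_0 = -1, c_i = phi_i."""
    coeffs = [-1.0] + list(phi)
    return sum(coeffs[i] * coeffs[j] * gamma[abs(k + j - i)]
               for i in range(len(coeffs)) for j in range(len(coeffs)))


def estimate_arma11(gamma):
    """Moment estimates (phi, theta, sigma2) from autocovariances gamma_0..gamma_2."""
    if gamma[1] == 0:
        raise ValueError("gamma_1 is zero; phi cannot be estimated this way")
    phi = gamma[2] / gamma[1]                 # rho_2 = phi * rho_1 for k > q = 1
    g0 = filtered_autocovariance(gamma, [phi], 0)
    g1 = filtered_autocovariance(gamma, [phi], 1)
    rho1 = g1 / g0                            # equals -theta / (1 + theta^2)
    if abs(rho1) > 0.5:
        raise ValueError("no real solution for theta")
    if rho1 == 0:
        theta = 0.0
    else:
        theta = (-1.0 + math.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)
    sigma2 = g0 / (1.0 + theta ** 2)          # from g0 = (1 + theta^2) * sigma2
    return phi, theta, sigma2


# Theoretical autocovariances for phi = 0.5, theta = 0.4, sigma2 = 1 (illustrative).
gamma_example = [0.76 / 0.75, 0.08 / 0.75, 0.04 / 0.75]
phi_hat, theta_hat, sigma2_hat = estimate_arma11(gamma_example)
print(round(phi_hat, 6), round(theta_hat, 6), round(sigma2_hat, 6))  # 0.5 0.4 1.0
```

With sample autocovariances in place of the theoretical ones the estimates are only approximate, and for larger p and q the corresponding equations generally have to be solved numerically.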
6. Determining the number of differences d of the model
The number of differences of the model is determined by the significance level. The significance level, denoted α, is the probability of making an error when estimating that a population parameter falls within a given interval; the smaller the α value, the better the model. The significance α value of the model is calculated for d = 0, 1, 2 and 3 respectively, and the number of differences d is determined from these values. The number of differences must not be too large; otherwise the original model is turned into a different model.
7. Restoring seasonality of data
The moving average is applied to the data in reverse: each predicted value is tripled (increased by two times its own value), and the preceding and following values are then subtracted to obtain the result; for the last value, the predicted value is doubled (increased by one time) and the preceding value is subtracted. Finally, the prediction results are rounded to integers to obtain the final prediction result.
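The reverse moving average can be illustrated with the following Python sketch. The text does not state from which sequence the neighbouring values are taken; the sketch assumes they are values on the original (seasonal) scale, so that the operation exactly inverts the three-month average. The function names, the rounding rule and the numbers are hypothetical.

```python
def restore_interior(ma_value, previous_value, next_value):
    """Invert m = (previous + x + next) / 3 for x."""
    return 3 * ma_value - previous_value - next_value


def restore_last(ma_value, previous_value):
    """Invert m = (previous + x) / 2 for the last value x."""
    return 2 * ma_value - previous_value


def to_fault_count(value):
    """Round to an integer count (Python's built-in rounding, an assumption)."""
    return int(round(value))


# Round trip on hypothetical counts [12, 18, 15, 21, 24, 18]:
# the moving average of (12, 18, 15) is 15.0, and of the last pair (24, 18) is 21.0.
print(restore_interior(15.0, 12, 15))   # 18.0
print(restore_last(21.0, 24))           # 18.0
print(to_fault_count(17.6))             # 18
```

Both calls recover the original value 18, which shows that the two rules are the algebraic inverses of the averaging rules used for the interior values and for the last value.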
The effects obtained by the present scheme are illustrated by way of example below.
The implementation is shown in fig. 3 and fig. 4. Fig. 3 shows the number of faults of a given batch of electric energy meters from March 2013 to September 2018, and fig. 4 shows the number of faults from June to August 2018 predicted from the moving average sequence; the legend in the upper right corner lists, from top to bottom, the training sequence of the moving average sequence, the predicted data and the actual sequence.
The 67 monthly fault counts from March 2013 to September 2018 are analysed with the method. A model is built from the first 63 values, giving an ARIMA(1,1,3) model, which is used to predict the moving average sequence; the moving average is then applied in reverse to restore seasonality, and the data are rounded to integers. The final predicted results for June to August 2018 are 27, 19 and 17; compared with the actual data 22, 21 and 19, the errors are 5, 2 and 2. The error is small, which verifies that the model is usable. The method can then be used to predict the number of electric energy meter faults from October to December 2018, for which the prediction results are 13, 15 and 19. The method can provide the power utility with a scientific basis for predicting the number of faults in future months, and gives a reasonably accurate estimate of the number of replacement meters to be prepared for this batch in the coming months.
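The error values quoted above can be checked with a few lines of Python. The predicted and actual values are those given in the text; the mean absolute error is an additional summary figure that the text itself does not report.

```python
predicted = [27, 19, 17]   # predictions for June, July and August 2018
actual = [22, 21, 19]      # actual fault counts for the same months

absolute_errors = [abs(p - a) for p, a in zip(predicted, actual)]
mean_absolute_error = sum(absolute_errors) / len(absolute_errors)

print(absolute_errors)       # [5, 2, 2]
print(mean_absolute_error)   # 3.0
```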
The method for predicting the number of electric energy meter faults shown in fig. 1 and fig. 2 is a specific embodiment of the invention and embodies its essential features and improvements. Equivalent modifications made under the teaching of the invention according to practical requirements all fall within the protection scope of this scheme.