CN112100574B

CN112100574B - AAKR model uncertainty calculation method and system based on resampling

Info

Publication number: CN112100574B
Application number: CN202010852271.5A
Authority: CN
Inventors: 成玮; 张乐; 陈雪峰; 李芸; 周光辉; 高琳; 邢继; 堵树宏; 孙涛; 徐钊; 于方小稚
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-08-21
Filing date: 2020-08-21
Publication date: 2024-10-29
Anticipated expiration: 2040-08-21
Also published as: CN112100574A

Abstract

The present invention discloses a resampling-based method and system for calculating the uncertainty of an AAKR model. A sensor historical state data set is divided into a training data set and a test data set. The training data set is denoised by a wavelet denoising method and its noise variance is calculated, which improves data accuracy. The training data are then resampled by Bootstrap, i.e. drawn at random with replacement, to obtain new training data sets, so as to optimize the AAKR model architecture; a prototype model is built from each new training data set, and the variation among the resulting model predictions gives the model prediction variance. The mean square error between the predicted values and the test values is calculated and combined with the prototype model variance to obtain the model deviation, from which an uncertainty value at the 95% level is formed. Because the noise estimate does not have to be modeled with an empirical distribution model, the resampling process is simplified and calculation efficiency is improved. In addition, the Jackknife method is used to reduce the deviation of the confidence interval and ensure its reliability, and estimation efficiency is improved while convergence performance is maintained.

Description

A resampling-based AAKR model uncertainty calculation method and system

Technical Field

The present invention relates to methods for quantifying the uncertainty of an AAKR model, and in particular to a resampling-based method and system for calculating the uncertainty of an AAKR model.

Background Art

Online condition monitoring systems for key equipment in nuclear power plants help reduce the risk of catastrophic failures and cut the excess cost of unnecessary periodic maintenance. Condition monitoring methods based on empirical models do not depend on a deep understanding of fault mechanism models; they determine whether equipment is behaving abnormally from its historical operating data and operating experience, and they have been widely adopted with the rapid development of the Internet of Things and big data technology. However, when an empirical model is used to monitor key nuclear power instruments and equipment, it involves ill-posed problems that affect model stability, so its output must be accompanied by an estimate of its uncertainty. An accurate estimate of the uncertainty interval also effectively reduces the false alarm and missed alarm rates, and thereby the economic losses caused by equipment downtime. At present there is little research on uncertainty analysis of model regression values. The traditional Monte Carlo uncertainty determination method obtains sampled data by simulating noise with a probability distribution, which requires prior knowledge of the population distribution and a sufficiently large sample; it is inefficient and costly, and cannot effectively guarantee the prediction accuracy of key equipment sensor states.

Summary of the Invention

The purpose of the present invention is to provide a resampling-based AAKR model uncertainty calculation method and system that overcome the deficiencies of the prior art.

To achieve this purpose, the present invention adopts the following technical solution:

A resampling-based AAKR model uncertainty calculation method comprises the following steps:

Step 1), dividing the sensor historical state data set into a training data set and a test data set;

Step 2), denoising the training data set by a wavelet denoising method and calculating the noise variance;

Step 3), resampling the training data set multiple times by the Bootstrap method, each resampling giving one new training data set; building one new model from each new training data set; obtaining multiple model predictions from the multiple new models; and calculating the variation among these model predictions to obtain the model prediction variance;

Step 4), calculating the mean square error between the model predictions and the test observations;

Step 5), calculating the model bias (model deviation) from the noise variance, the model prediction variance and the mean square error;

Step 6), estimating from the model bias and the model prediction variance: the Monte Carlo uncertainty estimate is twice the square root of the sum of the squared model bias and the model prediction variance.

Further, the sensor historical data are loaded, abnormal values in the sensor historical data are detected and corrected, and the sensor historical state data set is divided into a training data set and a test data set.

Further, the training data set is denoised using the wavelet denoising method:

$$\varepsilon_i = X_i - \hat{X}_i$$

where $\varepsilon_i$ is the noise estimate of the i-th training observation $X_i$ in the training data set, and $\hat{X}_i$ is the estimate of the true value of $X_i$. The noise variance of each variable in the training data set is:

$$\sigma_\varepsilon^2 = \frac{1}{n_{trn}-1}\sum_{i=1}^{n_{trn}}\left(\varepsilon_i - \bar{\varepsilon}\right)^2$$

where $n_{trn}$ is the number of training observations, $\bar{\varepsilon}$ is the expected value of the noise estimate, and $\sigma_\varepsilon^2$ is the noise variance of the training data set.

Further, the variation among the multiple model predictions, i.e. the model prediction variance, is calculated as:

$$\mathrm{Var}(\hat{X}_{ij}) = \frac{1}{N-1}\sum_{k=1}^{N}\left(\hat{X}^{k}_{ij} - \bar{\hat{X}}_{ij}\right)^2$$

where $\mathrm{Var}(\hat{X}_{ij})$ is the variance of the i-th observation of the j-th variable, $\hat{X}^{k}_{ij}$ is the prediction of that observation by the k-th of the N new models, and $\bar{\hat{X}}_{ij}$ is the mean of the N predictions. This gives an $n_{tst} \times p$ array of variance estimates; for each of the p variables the variances are sorted in ascending order and the 95th-percentile value is selected as a conservative estimate of the single-point variance.

Further, each new model built from a resampled training data set gives a model prediction $\hat{X}_{tst}$. The mean square error (MSE) between the predictions of the k-th new model and the test observations is:

$$MSE_k = \frac{1}{n_{tst}}\sum_{i=1}^{n_{tst}}\left(X_{tst,i} - \hat{X}^{k}_{tst,i}\right)^2$$

where $X_{tst,i}$ is the i-th test observation and $\hat{X}^{k}_{tst,i}$ is the corresponding prediction of the k-th new model. Each MSE has dimension 1×p, so N sets of predictions produce N MSEs of dimension 1×p.

Further, the model bias is:

$$\mathrm{Bias} = \sqrt{\max\left(0,\; MSE - \mathrm{Var}(\hat{X}_{tst}) - \sigma_\varepsilon^2\right)}$$

Further, from the Monte Carlo uncertainty estimate, the confidence interval and the prediction interval corresponding to the 95% confidence level are calculated; the Jackknife bias estimation method is used to correct the bias of the confidence interval (CI) of the AAKR model prediction and to calculate the prediction interval.

Further, the general equation of the confidence interval is:

$$CI = \hat{\theta} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2}$$

where $\hat{\theta}$ is an estimate of the expectation $\theta$ of the model prediction; its bias is:

$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

The model prediction $\hat{X}_{tst}$ is an $n_{tst} \times p$ time state sequence. $\hat{\theta}_{(i)}$ denotes the estimator obtained after removing the i-th (i = 1, 2, ..., N) prediction, and averaging these gives $\bar{\hat{\theta}}_{(\cdot)} = \frac{1}{N}\sum_{i=1}^{N}\hat{\theta}_{(i)}$. The Jackknife bias estimate is then:

$$\widehat{\mathrm{Bias}}_{jack} = (N-1)\left(\bar{\hat{\theta}}_{(\cdot)} - \hat{\theta}\right)$$

This gives the bias-corrected estimator of $\theta$:

$$\hat{\theta}_{jack} = \hat{\theta} - \widehat{\mathrm{Bias}}_{jack} = N\hat{\theta} - (N-1)\,\bar{\hat{\theta}}_{(\cdot)}$$

The corrected confidence interval is therefore:

$$CI_{jack} = \hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2}$$

The prediction interval at the 95% confidence level is:

$$PI = \hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2 + \sigma_\varepsilon^2}$$

A resampling-based AAKR model uncertainty calculation system comprises a data acquisition module, a data denoising module and a data processing module.

The data acquisition module acquires the sensor historical state data set, divides it into a training data set and a test data set, and transmits the training data set to the data denoising module.

The data denoising module denoises the received training data set, calculates the noise variance and transmits it to the data processing module, and also transmits the denoised training data to the data processing module.

The data processing module resamples the training data set multiple times through the data acquisition module, each resampling giving one new training data set; builds one new model from each new training data set; obtains multiple model predictions from the multiple new models; and calculates the variation among these predictions to obtain the model prediction variance. It also calculates the mean square error between the model predictions and the test observations, and finally calculates the model bias from the noise variance, the model prediction variance and the mean square error. From the model bias and the model prediction variance it computes twice the square root of the sum of the squared model bias and the model prediction variance, and outputs this value as the Monte Carlo uncertainty estimate.

Compared with the prior art, the present invention has the following beneficial technical effects:

In the resampling-based AAKR model uncertainty calculation method of the present invention, the sensor historical state data set is divided into a training data set and a test data set, and the training data set is denoised by the wavelet denoising method and its noise variance calculated, which improves data accuracy. The training data are then resampled by Bootstrap, i.e. drawn at random with replacement, to obtain new training data sets, so as to optimize the AAKR model architecture; multiple prototype models are developed and tested, and the variation among their predictions on the test data gives the model prediction variance. The mean square error between the predicted values and the test values is calculated and combined with the prototype model variance to obtain the model bias, forming an uncertainty value at the 95% level. No empirical distribution model is needed to model the noise estimate, which simplifies the resampling process and improves calculation efficiency. Combining the Jackknife method reduces the bias of the confidence interval and ensures its reliability, and estimation efficiency is improved while convergence performance is maintained. The method provides a reliable, efficient and complete procedure for estimating the uncertainty of empirical models of key equipment in nuclear power plants, and has significant engineering value for improving the accuracy of sensor state prediction for key equipment.

Further, by calculating the confidence interval and prediction interval at the 95% confidence level and correcting the bias of the confidence interval distribution with the Jackknife bias estimation method, the analysis process is simplified, the bias of the confidence interval is reduced, and estimation efficiency is improved while convergence performance is maintained.

The resampling-based AAKR model uncertainty calculation system of the present invention has a simple structure. The system is trained on a data set of normal historical sensor states; the Bootstrap-resampling-based empirical model uncertainty calculation simplifies the analysis process, and combining the Jackknife method reduces the bias of the confidence interval and ensures its reliability.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of the AAKR model uncertainty calculation in an embodiment of the present invention.

FIG. 2 is a convergence analysis curve of the uncertainty in an embodiment of the present invention.

FIG. 3 shows the confidence interval of the AAKR model before bias correction in an embodiment of the present invention.

FIG. 4 shows the confidence interval of the AAKR model after bias correction in an embodiment of the present invention.

FIG. 5 shows the prediction interval of the AAKR model in an embodiment of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings.

As shown in FIG. 1, a resampling-based AAKR model uncertainty calculation method comprises the following steps:

Step 1), dividing the sensor historical state data set into a training data set and a test data set.

Specifically, the sensor historical data are loaded, outliers in the sensor historical data are detected and corrected, and the data set is divided into a training data set and a test data set.

Step 2), denoising the training data set by the wavelet denoising method and calculating the noise variance.

Specifically, the training data set is denoised using the wavelet denoising method:

$$\varepsilon_i = X_i - \hat{X}_i$$

where $\varepsilon_i$ is the noise estimate of the i-th training observation $X_i$ in the training data set, and $\hat{X}_i$ is the estimate of the true value of $X_i$, obtained by wavelet denoising of the training observations. The noise variance of each variable in the training data set is:

$$\sigma_\varepsilon^2 = \frac{1}{n_{trn}-1}\sum_{i=1}^{n_{trn}}\left(\varepsilon_i - \bar{\varepsilon}\right)^2$$

where $n_{trn}$ is the number of training observations; $\bar{\varepsilon}$ is the expected value of the noise estimate, which is zero or close to zero for drift-free data; and $\sigma_\varepsilon^2$ is the noise variance of the training data set.

The noise variance of a variable measures the random error in the model predictions.
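The noise-variance step can be sketched as follows. The patent does not name a wavelet family, decomposition depth or threshold rule, so the single-level Haar transform with a soft universal threshold used here is an assumption chosen only to keep the example dependency-free; `haar_denoise` and `noise_variance` are hypothetical helper names.

```python
import math
import random
import statistics

def haar_denoise(x):
    """One-level Haar denoising with a soft universal threshold (illustrative only).

    Assumes at least two samples."""
    n = len(x) - len(x) % 2                      # work on an even number of samples
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, n, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, n, 2)]
    # Robust noise scale from the median absolute detail coefficient.
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(n))
    soft = [math.copysign(max(abs(d) - thr, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, soft):               # inverse Haar transform
        out += [(a + d) / s, (a - d) / s]
    return out + list(x[n:])                     # a trailing odd sample passes through

def noise_variance(x):
    """Return the sample variance of eps_i = X_i - X_hat_i and the denoised series."""
    x_hat = haar_denoise(x)
    eps = [xi - xh for xi, xh in zip(x, x_hat)]
    return statistics.variance(eps), x_hat       # variance() divides by n - 1

random.seed(0)
signal = [math.sin(2 * math.pi * t / 200) for t in range(400)]   # synthetic sensor trace
noisy = [v + random.gauss(0.0, 0.1) for v in signal]
var_eps, denoised = noise_variance(noisy)
```

A single decomposition level removes only the finest-scale part of the noise, so this sketch tends to underestimate the noise variance; a multi-level transform from a wavelet library would be the more realistic choice.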

Step 3), resampling the training data set N times by the Bootstrap method (i.e. drawing observations from the training data set at random with replacement) to obtain N resampled training data sets; building N new models from the N resampled training data sets; obtaining N model predictions from the N new models; and calculating the variation among the N model predictions to obtain the model prediction variance.

This specifically includes the following:

N Bootstrap resamplings are performed on the training data set and a new model is built from each resampled data set, giving N new models; the variation among the model predictions is estimated from these N new models.

The training data set is written as an $n_{trn} \times p$ matrix $X$, where a row $X_i$ represents one state of the system and a column $X_j$ represents the time state sequence of one monitored variable.

Bootstrap resampling: for the time state sequence of the monitored variable $X_j$, $n_{trn}$ random draws with replacement are made, which gives the Bootstrap resample $X^*$ of $X$.
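A minimal sketch of the Bootstrap draw described above. Drawing whole rows (system states), so that the monitored variables stay aligned in time, is an assumption of this example, as are the name `bootstrap_resample` and the toy data.

```python
import random

def bootstrap_resample(train, rng):
    """Draw len(train) rows from train uniformly at random, with replacement."""
    n_trn = len(train)
    return [train[rng.randrange(n_trn)] for _ in range(n_trn)]

rng = random.Random(42)
train = [[float(i), 2.0 * i] for i in range(10)]                  # 10 states, 2 variables
resamples = [bootstrap_resample(train, rng) for _ in range(5)]    # N = 5 new training sets
```

Each resample has the same size as the original training set but typically repeats some states and omits others, which is what makes the N models differ.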

By contrast, the LHS (Latin hypercube sampling) resampling technique proceeds as follows: the wavelet denoising method is applied to the training data to obtain their "true" values, and these are subtracted from the original training data to give the noise estimate; the noise probability distribution is modeled as a normal distribution and divided into $n_{trn}$ non-overlapping intervals (also called "bins"), each with probability $1/n_{trn}$; random values are selected from each bin with equal frequency, so that the noise distribution is sampled uniformly, and the sampled noise is used to construct the prototype training sets.
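For comparison, the LHS draw from a normal noise model can be sketched as below. The function name `lhs_normal`, the choice of one draw per bin, and the noise parameters are assumptions for illustration; the source only states that the bins are non-overlapping and sampled with equal frequency.

```python
import random
from statistics import NormalDist

def lhs_normal(n, mu, sigma, rng):
    """One sample from each of n equal-probability bins of N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    # Bin k covers cumulative probabilities [k/n, (k+1)/n).
    probs = [(k + rng.random()) / n for k in range(n)]
    # Clamp away from 0 and 1, where the inverse CDF is undefined.
    samples = [dist.inv_cdf(min(max(p, 1e-12), 1.0 - 1e-12)) for p in probs]
    rng.shuffle(samples)          # remove the ordering introduced by the bin index
    return samples

rng = random.Random(7)
noise = lhs_normal(100, 0.0, 0.1, rng)   # hypothetical noise level of 0.1
```

This makes the cost the patent attributes to LHS concrete: the noise must first be estimated and fitted with a distribution, whereas the Bootstrap draw works directly on the observed data.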

Each AAKR model built from a resampled training data set gives a model prediction $\hat{X}_{tst}$ of the test observations $X_{tst}$. The model prediction $\hat{X}_{tst}$ is an estimate of the test observations $X_{tst}$, which contain $n_{tst}$ observations:

$$\hat{X}^{k}_{tst} = \left[\hat{X}^{k}_{ij}\right],\quad i = 1,\dots,n_{tst},\; j = 1,\dots,p$$

where $\hat{X}^{k}_{tst}$ is the prediction of the k-th prototype model for the test observations $X_{tst}$, and $\hat{X}^{k}_{ij}$ denotes the i-th observation of the j-th variable in $\hat{X}^{k}_{tst}$. The expected prediction of the i-th observation of the j-th variable is the mean of the N new model predictions:

$$\bar{\hat{X}}_{ij} = \frac{1}{N}\sum_{k=1}^{N}\hat{X}^{k}_{ij}$$

so the predicted value of $X_{tst}$ is expressed as:

$$\bar{\hat{X}}_{tst} = \frac{1}{N}\sum_{k=1}^{N}\hat{X}^{k}_{tst}$$

Similarly, the variance of the i-th observation of the j-th variable is:

$$\mathrm{Var}(\hat{X}_{ij}) = \frac{1}{N-1}\sum_{k=1}^{N}\left(\hat{X}^{k}_{ij} - \bar{\hat{X}}_{ij}\right)^2$$

so the model prediction variance can be written as:

$$\mathrm{Var}(\hat{X}_{tst}) = \frac{1}{N-1}\sum_{k=1}^{N}\left(\hat{X}^{k}_{tst} - \bar{\hat{X}}_{tst}\right)^2$$

After simplification, the variation among the N model predictions, i.e. the model prediction variance, is calculated as (squares taken element-wise):

$$\mathrm{Var}(\hat{X}_{tst}) = \frac{1}{N-1}\left[\sum_{k=1}^{N}\left(\hat{X}^{k}_{tst}\right)^2 - N\,\bar{\hat{X}}_{tst}^{\,2}\right]$$

This gives an $n_{tst} \times p$ array of variance estimates; for each of the p variables the variances are sorted in ascending order and the 95th-percentile value is selected as a conservative estimate of the single-point variance.
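The variance step can be sketched end to end as follows. The source does not spell out the AAKR predictor, so the standard form is assumed here: a Gaussian-kernel weighted average of the memory (training) vectors, with a hypothetical bandwidth `h`. The nearest-rank percentile rule and the synthetic data are also assumptions.

```python
import math
import random
import statistics

def aakr_predict(memory, query, h=1.0):
    """AAKR prediction: Gaussian-kernel weighted average of the memory vectors."""
    weights = [math.exp(-math.dist(m, query) ** 2 / (2.0 * h * h)) for m in memory]
    total = sum(weights) or 1e-300               # guard against all-zero weights
    return [sum(w * m[j] for w, m in zip(weights, memory)) / total
            for j in range(len(query))]

def percentile95(values):
    """Nearest-rank 95th percentile of a list."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

def prediction_variance(train, test, n_models, rng, h=1.0):
    n_trn, p = len(train), len(train[0])
    preds = []                                   # preds[k][i][j]: model k, observation i, variable j
    for _ in range(n_models):
        boot = [train[rng.randrange(n_trn)] for _ in range(n_trn)]   # Bootstrap resample
        preds.append([aakr_predict(boot, x, h) for x in test])
    # Sample variance across the N models for every (observation, variable) pair.
    var = [[statistics.variance([preds[k][i][j] for k in range(n_models)])
            for j in range(p)] for i in range(len(test))]
    # Conservative single-point variance per variable.
    return [percentile95([row[j] for row in var]) for j in range(p)], preds

rng = random.Random(1)
train = [[math.sin(t / 8.0), math.cos(t / 8.0)] for t in range(80)]
test = [[math.sin(t / 8.0 + 0.05), math.cos(t / 8.0 + 0.05)] for t in range(20)]
var95, preds = prediction_variance(train, test, n_models=25, rng=rng, h=0.3)
```

Here `var95` holds one conservative variance per monitored variable, and `preds` keeps the N sets of predictions for the later MSE and interval calculations.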

The model prediction variance is defined as the expectation of the squared difference between the prediction and its expected value, so it can also be expressed as:

$$\mathrm{Var}(\hat{X}_{tst}) = E\left[\left(\hat{X}_{tst} - E[\hat{X}_{tst}]\right)^2\right]$$

Step 4), calculating the mean square error (MSE) between the N model predictions and the test observations.

Each new model built from a resampled training data set gives a model prediction $\hat{X}_{tst}$. The MSE between the predictions of the k-th new model and the test observations is:

$$MSE_k = \frac{1}{n_{tst}}\sum_{i=1}^{n_{tst}}\left(X_{tst,i} - \hat{X}^{k}_{tst,i}\right)^2$$

where $X_{tst,i}$ is the i-th test observation and $\hat{X}^{k}_{tst,i}$ is the corresponding prediction of the k-th new model. Each MSE has dimension 1×p, so the N sets of predictions produce N MSEs of dimension 1×p; for each of the p variables, the 95th-percentile value is taken as the single-point estimate of the MSE.
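The MSE step can be sketched as below: one 1×p MSE per model, then a 95th-percentile value per variable across the N models. The nearest-rank percentile rule and the function names are assumptions; the numbers are made up for illustration.

```python
import math

def mse_per_variable(test, pred):
    """1 x p mean square error between test observations and one model's predictions."""
    n_tst, p = len(test), len(test[0])
    return [sum((test[i][j] - pred[i][j]) ** 2 for i in range(n_tst)) / n_tst
            for j in range(p)]

def mse_point_estimate(test, preds):
    """95th-percentile MSE per variable over the N models (nearest-rank rule)."""
    mses = [mse_per_variable(test, pred) for pred in preds]    # N rows of length p
    out = []
    for j in range(len(test[0])):
        ordered = sorted(m[j] for m in mses)
        out.append(ordered[math.ceil(0.95 * len(ordered)) - 1])
    return out

test = [[1.0, 2.0], [3.0, 4.0]]
preds = [[[1.0, 1.0], [1.0, 4.0]],       # model 1
         [[1.0, 2.0], [3.0, 3.0]]]       # model 2
mse_model_1 = mse_per_variable(test, preds[0])   # [2.0, 0.5]
mse95 = mse_point_estimate(test, preds)
```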

Step 5), calculating the model bias from the noise variance, the model prediction variance and the mean square error.

The bias measures any systematic error.

The predictive performance of the model is quantified by the mean square error MSE, which is calculated from the model predictions:

$$MSE = E\left[\left(X_{tst} - \hat{X}_{tst}\right)^2\right] = E\left[\left(M(X_{tst}) - \hat{X}_{tst}\right)^2\right] + E\left[\varepsilon_{irr}^2\right]$$

where $E[(M(X_{tst}) - \hat{X}_{tst})^2]$ is the reducible error and $\varepsilon_{irr}$ is the irreducible error, with

$$E[\varepsilon_{irr}] = 0,\qquad E[\varepsilon_{irr}^2] = \sigma_\varepsilon^2$$

The reducible error is the expectation of the squared distance between the model prediction $\hat{X}_{tst}$ and the true model $M(X_{tst})$ of the test observations $X_{tst}$. The irreducible error is the difference between the true parameter value and the measured parameter value; it is caused by random processes and measurement noise, and is called irreducible because it cannot be modeled deterministically.

The reducible error describes how well the model represents the true model $M(X_{tst})$, and depends only on the chosen model architecture, training procedure and data set. It can be further decomposed into a bias component and a variance component.

The model bias is defined as the systematic error of the model, i.e. the difference between the expected prediction of the model and the true target value:

$$\mathrm{Bias} = E[\hat{X}_{tst}] - M(X_{tst})$$

The total uncertainty is the combination of the model bias, the model prediction variance and the irreducible error:

$$\sigma_{tot}^2 = \mathrm{Bias}^2 + \mathrm{Var}(\hat{X}_{tst}) + \sigma_\varepsilon^2$$

The total uncertainty is quantified by the MSE and takes its value, i.e.

$$MSE = \mathrm{Bias}^2 + \mathrm{Var}(\hat{X}_{tst}) + \sigma_\varepsilon^2$$

The model bias is assumed to be constant for each variable and is approximated by its expected value. If the expected squared bias is negative (i.e. the sum of the model prediction variance and the estimated noise variance exceeds the MSE), it is set to zero.

The model bias is finally obtained as:

$$\mathrm{Bias} = \sqrt{\max\left(0,\; MSE - \mathrm{Var}(\hat{X}_{tst}) - \sigma_\varepsilon^2\right)}$$

Step 6), estimating from the model bias and the model prediction variance, the Monte Carlo uncertainty estimate is obtained as

$$U = 2\sqrt{\mathrm{Bias}^2 + \mathrm{Var}(\hat{X}_{tst})}$$

Its convergence as the number of prototype models varies is shown in FIG. 2.
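Steps 5) and 6) reduce to a few lines of arithmetic per variable. The sketch below assumes the single-point MSE, prediction variance and noise variance are already available; the function names and numeric inputs are made up for illustration.

```python
import math

def model_bias(mse, pred_var, noise_var):
    """Bias = sqrt(MSE - Var - noise variance), with a negative squared bias set to zero."""
    return math.sqrt(max(mse - pred_var - noise_var, 0.0))

def uncertainty(mse, pred_var, noise_var):
    """Uncertainty estimate: twice the square root of (Bias^2 + Var)."""
    bias = model_bias(mse, pred_var, noise_var)
    return 2.0 * math.sqrt(bias ** 2 + pred_var)

# MSE exceeds variance plus noise: the remainder is attributed to bias.
b1 = model_bias(0.30, 0.05, 0.09)        # sqrt(0.16), about 0.4
u1 = uncertainty(0.30, 0.05, 0.09)       # 2 * sqrt(0.21)

# Variance plus noise exceeds MSE: the squared bias is clipped to zero.
b2 = model_bias(0.10, 0.08, 0.05)        # 0.0
u2 = uncertainty(0.10, 0.08, 0.05)       # 2 * sqrt(0.08)
```

The clipping in `model_bias` is the rule stated in the text for the case where the estimated variance and noise together exceed the MSE.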

Example:

From the Monte Carlo uncertainty estimate, the confidence interval and the prediction interval corresponding to the 95% confidence level are calculated. The Jackknife bias estimation method is used to correct the bias of the confidence interval (CI) of the AAKR model prediction and to calculate the prediction interval (PI). The confidence interval of the AAKR model before bias correction is shown in FIG. 3.

The general equation of the confidence interval is:

$$CI = \hat{\theta} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2}$$

The model prediction $\hat{X}_{tst}$ is an $n_{tst} \times p$ time state sequence. $\hat{\theta}_{(i)}$ denotes the estimator obtained after removing the i-th (i = 1, 2, ..., N) prediction, and averaging these gives $\bar{\hat{\theta}}_{(\cdot)} = \frac{1}{N}\sum_{i=1}^{N}\hat{\theta}_{(i)}$. The Jackknife bias estimate is then:

$$\widehat{\mathrm{Bias}}_{jack} = (N-1)\left(\bar{\hat{\theta}}_{(\cdot)} - \hat{\theta}\right)$$

This gives the bias-corrected estimator of $\theta$:

$$\hat{\theta}_{jack} = \hat{\theta} - \widehat{\mathrm{Bias}}_{jack} = N\hat{\theta} - (N-1)\,\bar{\hat{\theta}}_{(\cdot)}$$

The corrected CI is therefore:

$$CI_{jack} = \hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2}$$

The CI does not include the noise term $\sigma_\varepsilon^2$; it only estimates the uncertainty in the expected prediction of the model and does not account for the natural variation of the modeled values. The confidence interval of the AAKR model after bias correction is shown in FIG. 4.

The PI at the 95% confidence level is expressed as:

$$PI = \hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2 + \sigma_\varepsilon^2}$$

Because the PI includes the noise variance term, it contains the CI by definition and is therefore a more conservative estimate of model uncertainty. The prediction interval of the AAKR model is shown in FIG. 5.
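The Jackknife correction and the two intervals can be sketched generically as below. The estimator passed to `jackknife` is a free choice; the plug-in variance is used here only because its Jackknife-corrected value is known to equal the unbiased sample variance, which gives something to check against. The numeric inputs to `intervals` are made up, and centering both intervals on the same value is an assumption of this sketch.

```python
import math
import statistics

def jackknife(values, estimator):
    """Return (bias-corrected estimate, Jackknife bias estimate) for estimator(values)."""
    n = len(values)
    theta_hat = estimator(values)
    # Leave-one-out estimators theta_(i), i = 1..n.
    loo = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    theta_dot = sum(loo) / n
    bias_jack = (n - 1) * (theta_dot - theta_hat)
    return theta_hat - bias_jack, bias_jack

def intervals(center, bias, pred_var, noise_var):
    """95%-level CI and PI around center; the PI adds the noise variance term."""
    half_ci = 2.0 * math.sqrt(pred_var + bias ** 2)
    half_pi = 2.0 * math.sqrt(pred_var + bias ** 2 + noise_var)
    return (center - half_ci, center + half_ci), (center - half_pi, center + half_pi)

values = [1.0, 2.0, 3.0, 4.0]
corrected, bias_jack = jackknife(values, statistics.pvariance)
ci, pi = intervals(center=20.0, bias=0.4, pred_var=0.05, noise_var=0.09)
```

Because the PI half-width adds a non-negative noise term under the square root, the PI always encloses the CI when both share a center.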

The present invention is an uncertainty calculation method for the auto-associative kernel regression (AAKR) model based on Bootstrap resampling. The method draws the sensor historical state data at random with replacement to obtain Bootstrap samples, so as to optimize the AAKR model architecture and determine the uncertainty of the current prediction. The method comprises: loading the historical data, detecting and correcting outliers, and dividing the data into training and test data sets; resampling the training data by Bootstrap, developing and testing multiple prototype models, obtaining predictions from the prototype models and the test data, and calculating the mean square error (MSE) between the predictions and the test observations; denoising the training data by the wavelet denoising method to estimate the noise variance; calculating the model bias in combination with the prototype model variance to form an uncertainty estimate at the 95% level; and calculating the confidence interval and prediction interval at the 95% confidence level, with the bias of the confidence interval distribution corrected by the Jackknife bias estimation method. The invention simplifies the analysis process, reduces the bias of the confidence interval, and improves estimation efficiency while maintaining convergence performance.

Claims

1. A resampling-based AAKR model uncertainty calculation method, comprising the steps of:

Step 1), dividing a sensor historical state data set into a training data set and a test data set;

loading sensor historical data, detecting and correcting abnormal values in the sensor historical data, and dividing the corrected data set into the training data set and the test data set;

Step 2), denoising the training data set by a wavelet denoising method and calculating a noise variance;

the training data set is denoised using the wavelet denoising method,

$$\varepsilon_i = X_i - \hat{X}_i$$

wherein $\varepsilon_i$ is the noise estimate of the i-th training observation $X_i$ in the training data set; $\hat{X}_i$ is the estimate of the true value of the i-th training observation $X_i$ in the training data set; the noise variance of each variable in the training data set is:

$$\sigma_\varepsilon^2 = \frac{1}{n_{trn}-1}\sum_{i=1}^{n_{trn}}\left(\varepsilon_i - \bar{\varepsilon}\right)^2$$

wherein $n_{trn}$ is the number of training observations; $\bar{\varepsilon}$ is the expected value of the noise estimate; and $\sigma_\varepsilon^2$ is the noise variance of the training data set;

Step 3), resampling the training data set a plurality of times by a Bootstrap method, obtaining one new training data set after each resampling, building a plurality of new models from the respective new training data sets, obtaining a plurality of model predictions from the plurality of new models, and calculating the variation among the plurality of model predictions to obtain the model prediction variance;

the model prediction variance, which is the variation among the plurality of model predictions, is calculated using the following equation:

$$\mathrm{Var}(\hat{X}_{ij}) = \frac{1}{N-1}\sum_{k=1}^{N}\left(\hat{X}^{k}_{ij} - \bar{\hat{X}}_{ij}\right)^2$$

wherein $\mathrm{Var}(\hat{X}_{ij})$ is the variance of the i-th observation of the j-th variable; an $n_{tst} \times p$ variance estimate is obtained from the model prediction variances, the variances of each of the p variables are arranged in ascending order, and the 95th-percentile value is selected as a conservative estimate of the single-point variance; $\hat{X}^{k}_{ij}$ denotes the i-th observation of the j-th variable in $\hat{X}^{k}_{tst}$; $\hat{X}^{k}_{tst}$ is the prediction of the k-th prototype model for the test observations $X_{tst}$, the test observations $X_{tst}$ containing $n_{tst}$ observations; and $\bar{\hat{X}}_{ij}$ denotes the expected prediction of the i-th observation of the j-th variable:

$$\bar{\hat{X}}_{ij} = \frac{1}{N}\sum_{k=1}^{N}\hat{X}^{k}_{ij};$$

Step 4), calculating the mean square error between the model predictions and the test observations;

Step 5), calculating a model deviation from the noise variance, the model prediction variance and the mean square error;

each new model built from a resampled training data set gives a model prediction $\hat{X}_{tst}$; the mean square error MSE between the predictions of the k-th new model and the test observations is calculated as:

$$MSE_k = \frac{1}{n_{tst}}\sum_{i=1}^{n_{tst}}\left(X_{tst,i} - \hat{X}^{k}_{tst,i}\right)^2$$

wherein $X_{tst,i}$ is the i-th test observation and $\hat{X}^{k}_{tst,i}$ is the corresponding prediction of the k-th new model; the dimension of each MSE is 1×p, and N sets of predictions generate N MSEs of dimension 1×p;

the model deviation is:

$$\mathrm{Bias} = \sqrt{\max\left(0,\; MSE - \mathrm{Var}(\hat{X}_{tst}) - \sigma_\varepsilon^2\right)}$$

Step 6), estimating from the model deviation and the model prediction variance to obtain a Monte Carlo uncertainty estimate equal to twice the square root of the sum of the square of the model deviation and the model prediction variance;

calculating the confidence interval and the prediction interval corresponding to the 95% confidence level from the Monte Carlo uncertainty estimate, correcting the confidence interval of the AAKR model prediction using a Jackknife deviation estimation method, and calculating the prediction interval;

the general equation of the confidence interval is:

$$CI = \hat{\theta} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2}$$

wherein $\hat{\theta}$ is an estimate of the expectation $\theta$ of the model prediction, and its deviation is:

$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

the model prediction $\hat{X}_{tst}$ is an $n_{tst} \times p$ time state sequence; $\hat{\theta}_{(i)}$ denotes the estimator after the i-th prediction is removed, i = 1, 2, ..., N, and averaging these gives $\bar{\hat{\theta}}_{(\cdot)}$; the Jackknife deviation estimate is then:

$$\widehat{\mathrm{Bias}}_{jack} = (N-1)\left(\bar{\hat{\theta}}_{(\cdot)} - \hat{\theta}\right)$$

thereby obtaining the corrected estimator of $\theta$:

$$\hat{\theta}_{jack} = \hat{\theta} - \widehat{\mathrm{Bias}}_{jack} = N\hat{\theta} - (N-1)\,\bar{\hat{\theta}}_{(\cdot)}$$

the confidence interval after correction is:

$$CI_{jack} = \hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2}$$

the prediction interval at the 95% confidence level is:

$$PI = \hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{X}_{tst}) + \mathrm{Bias}^2 + \sigma_\varepsilon^2}$$

2. A resampling-based AAKR model uncertainty calculation system based on the method of claim 1, comprising a data acquisition module, a data denoising module and a data processing module;

the data acquisition module is used for acquiring a sensor historical state data set, dividing the acquired data set into a training data set and a test data set, and transmitting the training data set to the data denoising module;

the data denoising module denoises the received training data set, calculates the noise variance and transmits the noise variance to the data processing module, and also transmits the denoised training data to the data processing module;

the data processing module resamples the training data set a plurality of times through the data acquisition module, one new training data set being obtained after each resampling; a plurality of new models are built from the respective new training data sets, a plurality of model predictions are obtained from the plurality of new models, and the model prediction variance is obtained by calculating the variation among the plurality of model predictions; the mean square error between the model predictions and the test observations is also calculated; finally, the model deviation is calculated from the noise variance, the model prediction variance and the mean square error; and, estimating from the model deviation and the model prediction variance, twice the square root of the sum of the square of the model deviation and the model prediction variance is obtained and output as the Monte Carlo uncertainty estimate.