CN112100574B - AAKR model uncertainty calculation method and system based on resampling - Google Patents

AAKR model uncertainty calculation method and system based on resampling

Info

Publication number
CN112100574B
Authority
CN
China
Prior art keywords
model
value
data set
variance
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010852271.5A
Other languages
Chinese (zh)
Other versions
CN112100574A (en
Inventor
成玮
张乐
陈雪峰
李芸
周光辉
高琳
邢继
堵树宏
孙涛
徐钊
于方小稚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202010852271.5A
Publication of CN112100574A
Application granted
Publication of CN112100574B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention discloses a resampling-based method and system for calculating the uncertainty of an AAKR model. A sensor historical state data set is divided into a training data set and a test data set. The training data set is denoised by a wavelet denoising method and the noise variance is calculated, which improves data accuracy. The sensor historical state data are then randomly sampled with replacement to obtain new training data set samples, which are used to optimize the AAKR model architecture; the variation among the predictions of the resulting models gives the model prediction variance. With the Bootstrap-resampled training data, the mean square error between the predicted values and the test values is calculated. The model bias is then calculated in combination with the prototype model variance, yielding a 95% uncertainty estimate. Because the noise estimate does not have to be modeled with an empirical distribution, the resampling process is simplified and computational efficiency is improved. In addition, the Jackknife method is used to reduce the bias of the confidence interval and ensure its reliability, so estimation efficiency improves while convergence performance is maintained.

Description

A resampling-based AAKR model uncertainty calculation method and system

Technical Field

The present invention relates to methods for quantifying the uncertainty of an auto-associative kernel regression (AAKR) model, and in particular to a resampling-based method and system for calculating AAKR model uncertainty.

Background Art

Online condition monitoring systems for key equipment in nuclear power plants help reduce the risk of catastrophic failures and the excess cost of unnecessary periodic maintenance. Condition monitoring methods based on empirical models do not depend on a deep understanding of fault mechanism models. They determine whether equipment is abnormal from its historical operating data and operating experience, and they have been widely adopted with the rapid development of Internet of Things and big data technologies. However, when an empirical model is used to monitor key nuclear power instruments and equipment, it involves ill-posed problems that affect model stability, so its predictions must be accompanied by an estimate of their uncertainty. Accurate estimation of the uncertainty interval also reduces the false alarm and missed alarm rates, and therefore the economic losses caused by equipment downtime. At present there is little research on the uncertainty of model regression values. The traditional Monte Carlo uncertainty method simulates noise with a probability distribution to obtain sampled data, which requires prior knowledge of the population distribution and a sufficiently large sample. It is inefficient and costly, and it cannot effectively ensure the prediction accuracy of sensor states for key equipment.

Summary of the invention

The purpose of the present invention is to provide a resampling-based AAKR model uncertainty calculation method and system that overcome the deficiencies of the prior art.

To achieve this purpose, the present invention adopts the following technical solution:

A resampling-based AAKR model uncertainty calculation method comprises the following steps:

Step 1): Divide the sensor historical state data set into a training data set and a test data set.

Step 2): Denoise the training data set by a wavelet denoising method and calculate the noise variance.

Step 3): Resample the training data set multiple times by the Bootstrap method, obtaining one new training data set per resampling. Build one new model from each new training data set, obtain one model prediction from each new model, and calculate the variation among these model predictions to obtain the model prediction variance.

Step 4): Calculate the mean square error between the model predictions and the test observations.

Step 5): Calculate the model bias from the noise variance, the model prediction variance, and the mean square error.

Step 6): From the model bias and the model variance, obtain the Monte Carlo uncertainty estimate as twice the square root of the sum of the squared model bias and the model variance.

Further, the sensor historical data are loaded, abnormal values in the sensor historical data are detected and corrected, and the sensor historical state data set is divided into a training data set and a test data set.

Further, the training data set is denoised using the wavelet denoising method:

$$\varepsilon_i = X_i - \hat{X}_i$$

where $\varepsilon_i$ is the noise estimate of the i-th training observation $X_i$ in the training data set, and $\hat{X}_i$ is the estimate of the true value of $X_i$. The variance of the noise of a variable in the training data set is:

$$\hat{\sigma}_{\varepsilon}^2 = \frac{1}{n_{trn}-1}\sum_{i=1}^{n_{trn}}\left(\varepsilon_i - \bar{\varepsilon}\right)^2$$

where $n_{trn}$ is the number of training observations, $\bar{\varepsilon}$ is the expected value of the noise estimate, and $\hat{\sigma}_{\varepsilon}^2$ is the noise variance of the training data set.

Further, the variation among the model predictions, that is, the model prediction variance, is calculated as:

$$\mathrm{Var}(\hat{X}_{ij}) = \frac{1}{N-1}\sum_{k=1}^{N}\left(\hat{X}_{ij}^{k} - E[\hat{X}_{ij}]\right)^2$$

where $\mathrm{Var}(\hat{X}_{ij})$ is the variance of the i-th observation of the j-th variable, $\hat{X}_{ij}^{k}$ is the corresponding prediction of the k-th model, and $E[\hat{X}_{ij}]$ is the mean of that prediction over the N models. This gives an $n_{tst} \times p$ variance estimate. For each of the p variables, the variances are sorted in ascending order and the 95th percentile is selected as a conservative single-point estimate of the variance.

Further, each new model built from a resampled training data set gives a model prediction $\hat{X}_i$. The mean square error (MSE) between the new model's prediction and the test observations is calculated as:

$$\mathrm{MSE}_i = \frac{1}{n_{tst}}\sum\left(X_{tst,i} - \hat{X}_i\right)^2$$

where the sum runs over the $n_{tst}$ test observations, and $X_{tst,i}$ and $\hat{X}_i$ are the test observations and the model predictions for the i-th new model. Each MSE has dimension $1 \times p$, so N predictions produce N MSEs of dimension $1 \times p$.

Further, the model bias is:

$$\mathrm{Bias} = \sqrt{\mathrm{MSE} - \mathrm{Var}(\hat{X}_{tst}) - \hat{\sigma}_{\varepsilon}^2}$$

Further, from the Monte Carlo uncertainty estimate, the confidence interval and the prediction interval corresponding to the 95% confidence level are calculated. The Jackknife bias estimation method is used to correct the bias of the confidence interval (CI) of the AAKR model prediction and to calculate the prediction interval.

Further, the general equation of the confidence interval is:

$$\hat{\theta} \pm 2\sqrt{\mathrm{Var}(\hat{\theta}) + \mathrm{Bias}^2}$$

where $\hat{\theta}$ is an estimate of the expectation $\theta$ of the model prediction, whose bias is:

$$b(\hat{\theta}) = E[\hat{\theta}] - \theta$$

The model prediction $\hat{X}_{tst}$ is an $n_{tst} \times p$ time state sequence. $\hat{\theta}_{(i)}$ denotes the estimator obtained after removing the i-th prediction (i = 1, 2, ..., N), and averaging these gives $\hat{\theta}_{(\cdot)} = \frac{1}{N}\sum_{i=1}^{N}\hat{\theta}_{(i)}$. The Jackknife bias estimate is then:

$$\hat{b}_{jack} = (N-1)\left(\hat{\theta}_{(\cdot)} - \hat{\theta}\right)$$

This gives the bias-corrected estimator of $\theta$:

$$\hat{\theta}_{jack} = \hat{\theta} - \hat{b}_{jack} = N\hat{\theta} - (N-1)\hat{\theta}_{(\cdot)}$$

So the bias-corrected confidence interval is:

$$\hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{\theta}) + \mathrm{Bias}^2}$$

The prediction interval at the 95% confidence level is:

$$\hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{\theta}) + \mathrm{Bias}^2 + \hat{\sigma}_{\varepsilon}^2}$$

A resampling-based AAKR model uncertainty calculation system comprises a data acquisition module, a data denoising module, and a data processing module.

The data acquisition module acquires the sensor historical state data set, divides it into a training data set and a test data set, and transmits the training data set to the data denoising module.

The data denoising module denoises the received training data set, calculates the noise variance, and transmits both the noise variance and the denoised training data to the data processing module.

The data processing module resamples the training data set obtained through the data acquisition module multiple times, obtaining one new training data set per resampling. It builds one new model from each new training data set, obtains one model prediction from each new model, and calculates the variation among these predictions to obtain the model prediction variance. It also calculates the mean square error between the model predictions and the test observations. Finally, it calculates the model bias from the noise variance, the model prediction variance, and the mean square error, and outputs the Monte Carlo uncertainty estimate, which is twice the square root of the sum of the squared model bias and the model variance.

Compared with the prior art, the present invention has the following beneficial technical effects:

In the resampling-based AAKR model uncertainty calculation method of the present invention, the sensor historical state data set is divided into a training data set and a test data set. The training data set is denoised by a wavelet denoising method and the noise variance is calculated, which improves data accuracy. The sensor historical state data are then randomly sampled with replacement to obtain new training data set samples, which are used to optimize the AAKR model architecture; the variation among the model predictions gives the model prediction variance. The training data are resampled by Bootstrap, multiple prototype models are developed and tested, predictions are obtained from the prototype models and the test data, and the mean square error between the predictions and the test values is calculated. The model bias is calculated in combination with the prototype model variance, yielding a 95% uncertainty estimate. The noise estimate does not have to be modeled with an empirical distribution, which simplifies the resampling process and improves computational efficiency. Combined with the Jackknife method, the bias of the confidence interval is reduced and its reliability is ensured, so estimation efficiency improves while convergence performance is maintained. The method provides a reliable, efficient, and complete workflow for estimating the uncertainty of empirical models of key equipment in nuclear power plants, and it has significant engineering value for improving the accuracy of sensor state prediction for key equipment.

Further, by calculating the confidence interval and prediction interval at the 95% confidence level and correcting the bias of the confidence interval distribution with the Jackknife bias estimation method, the analysis process is simplified, the confidence interval bias is reduced, and estimation efficiency improves while convergence performance is maintained.

The resampling-based AAKR model uncertainty calculation system of the present invention has a simple structure. The system is trained on a data set of normal historical sensor states. Its empirical model uncertainty calculation, based on Bootstrap resampling, simplifies the analysis process, and combining it with the Jackknife method reduces the confidence interval bias and ensures reliability.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flow chart of the AAKR model uncertainty calculation in an embodiment of the present invention.

FIG. 2 is a convergence analysis curve of the uncertainty in an embodiment of the present invention.

FIG. 3 shows the confidence interval of the AAKR model before bias correction in an embodiment of the present invention.

FIG. 4 shows the confidence interval of the AAKR model after bias correction in an embodiment of the present invention.

FIG. 5 shows the prediction interval of the AAKR model in an embodiment of the present invention.

DETAILED DESCRIPTION

The present invention is described in further detail below with reference to the accompanying drawings.

As shown in FIG. 1, a resampling-based AAKR model uncertainty calculation method comprises the following steps:

Step 1): Divide the sensor historical state data set into a training data set and a test data set.

Specifically, the sensor historical data are loaded, abnormal values in them are detected and corrected, and the data set is divided into a training data set and a test data set.

Step 2): Denoise the training data set by a wavelet denoising method and calculate the noise variance.

Specifically, the training data set is denoised using the wavelet denoising method:

$$\varepsilon_i = X_i - \hat{X}_i$$

where $\varepsilon_i$ is the noise estimate of the i-th training observation $X_i$ in the training data set, and $\hat{X}_i$ is the estimate of the true value of $X_i$, obtained by wavelet denoising of the training observation. The variance of the noise of a variable in the training data set is:

$$\hat{\sigma}_{\varepsilon}^2 = \frac{1}{n_{trn}-1}\sum_{i=1}^{n_{trn}}\left(\varepsilon_i - \bar{\varepsilon}\right)^2$$

where $n_{trn}$ is the number of training observations; $\bar{\varepsilon}$ is the expected value of the noise estimate, which is zero or close to zero for drift-free data; and $\hat{\sigma}_{\varepsilon}^2$ is the noise variance of the training data set.

The variance of the variable noise measures the random error of the model prediction.
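Step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not name the wavelet, the decomposition depth, or the threshold rule, so a three-level Haar transform with a soft universal threshold is used here purely for brevity, and the function names `haar_denoise` and `noise_variance` are hypothetical.

```python
import math
import random
import statistics


def haar_denoise(signal, levels=3):
    """Multi-level Haar wavelet denoising with a soft universal threshold.

    Assumption: Haar wavelet, `levels` and the threshold rule are
    illustrative choices; the source text does not specify them.
    """
    n = len(signal)
    if n % (2 ** levels):
        raise ValueError("length must be a multiple of 2**levels")
    s = math.sqrt(2.0)
    approx, details = list(signal), []
    for _ in range(levels):  # forward transform, finest level first
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / s for a, b in pairs])
        approx = [(a + b) / s for a, b in pairs]

    # Robust noise scale from the finest detail coefficients (MAD / 0.6745).
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(n))

    for detail in reversed(details):  # inverse transform, coarsest level first
        rebuilt = []
        for a, d in zip(approx, detail):
            d = math.copysign(max(abs(d) - thr, 0.0), d)  # soft threshold
            rebuilt.extend([(a + d) / s, (a - d) / s])
        approx = rebuilt
    return approx


def noise_variance(observed, denoised):
    """Sample variance of the residual eps_i = X_i - Xhat_i."""
    eps = [x - xh for x, xh in zip(observed, denoised)]
    return statistics.variance(eps)  # divides by n_trn - 1


# Synthetic training signal: slow sine plus Gaussian noise (variance 0.01).
random.seed(0)
truth = [math.sin(2 * math.pi * i / 256) for i in range(512)]
x_trn = [t + random.gauss(0.0, 0.1) for t in truth]
x_hat = haar_denoise(x_trn)
var_eps = noise_variance(x_trn, x_hat)
print(f"estimated noise variance: {var_eps:.4f}")
```

With only a few decomposition levels, part of the noise stays in the approximation coefficients, so the estimate tends to sit slightly below the true variance; a production implementation would likely use a dedicated wavelet library and a smoother wavelet.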

Step 3): Resample the training data set N times by the Bootstrap method (that is, each position in the training data set is filled with an observation drawn at random from the training data), obtaining N resampled training data sets. Build N new models from the N resampled training data sets, obtain N model predictions from the N new models, and calculate the variation among the N model predictions to obtain the model prediction variance.

This specifically comprises the following steps:

Bootstrap resampling is performed N times on the training data set, and a new model is built from each resampled data set, giving N new models. The variation among the model predictions is estimated from these N models. The training data set is:

$$X = \begin{bmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & \ddots & \vdots \\ X_{n_{trn}1} & \cdots & X_{n_{trn}p} \end{bmatrix}$$

where a row $X_i$ represents one state of the system, and a column $X_j$ represents the time state sequence of one monitored variable.

Bootstrap resampling: for the time state sequence of a monitored variable $X_j$, $n_{trn}$ random samples are drawn with replacement, which gives the Bootstrap resample $X^*$ of $X$.
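The Bootstrap draw described above can be sketched as follows. One assumption is made explicit here: whole rows (system states) are drawn together so that the p variables of a state stay aligned, which the text does not state either way. The name `bootstrap_resample` and the sample values are hypothetical.

```python
import random


def bootstrap_resample(rows, rng):
    """Draw len(rows) rows with replacement: one Bootstrap sample X*.

    Assumption: complete rows (states) are resampled, keeping the
    variables of each state together.
    """
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


rng = random.Random(42)
# Toy training set: n_trn = 5 states of p = 2 monitored variables.
x_trn = [[20.1, 3.4], [20.3, 3.5], [19.8, 3.3], [20.0, 3.6], [20.2, 3.4]]
samples = [bootstrap_resample(x_trn, rng) for _ in range(3)]  # N = 3 here
for k, sample in enumerate(samples, start=1):
    print(f"resample {k}: {sample}")
```

Each resample has the same size as the original training set, and some states typically appear more than once while others are left out.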

By contrast, the Latin hypercube sampling (LHS) resampling technique works as follows. The wavelet denoising method is applied to the training data to obtain their "true" values, and these are subtracted from the original training data to obtain an estimate of the noise. The noise probability distribution is modeled as a normal distribution and divided into $n_{trn}$ non-overlapping intervals (also called "bins"), each with probability $1/n_{trn}$. A random value is selected from each bin with equal frequency, so that the noise distribution is sampled uniformly, and the sampled noise is used to construct the prototype training set.
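For comparison, the LHS draw of normally distributed noise can be sketched as below. This is an illustration only, assuming a zero-mean normal noise model with a known standard deviation; `lhs_normal_noise` is a hypothetical name, and the clamp on the probabilities is an implementation detail added here to keep the inverse CDF defined.

```python
import random
import statistics


def lhs_normal_noise(n, sigma, rng):
    """Latin hypercube sample of n values from N(0, sigma^2).

    The unit interval is cut into n equal-probability bins, one uniform
    draw is taken inside each bin, and the draws are mapped through the
    inverse normal CDF.
    """
    dist = statistics.NormalDist(0.0, sigma)
    draws = []
    for k in range(n):
        p = (k + rng.random()) / n
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < p < 1
        draws.append(dist.inv_cdf(p))
    rng.shuffle(draws)  # remove the ordering introduced by the bins
    return draws


noise = lhs_normal_noise(200, 0.1, random.Random(1))
print(f"mean = {sum(noise) / len(noise):+.5f}, "
      f"stdev = {statistics.stdev(noise):.4f}")
```

The need to fit a noise distribution before sampling is the extra modeling step that the Bootstrap approach avoids.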

Each AAKR model built from a resampled training data set gives a model prediction $\hat{X}_{tst}$ for the test observations $X_{tst}$. The model prediction $\hat{X}_{tst}$ is an estimate of $X_{tst}$, which contains $n_{tst}$ observations.
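The text does not spell out the AAKR predictor itself. For orientation, a common formulation predicts each query state as a kernel-weighted average of the stored training (memory) states. The sketch below assumes a Gaussian kernel on Euclidean distance; the `bandwidth` parameter and the name `aakr_predict` are hypothetical and not taken from the source.

```python
import math


def aakr_predict(memory, query, bandwidth=1.0):
    """Predict one state as a kernel-weighted average of memory states.

    Assumption: Gaussian kernel exp(-d^2 / (2 h^2)) on Euclidean distance.
    """
    weights = []
    for row in memory:
        d2 = sum((m - q) ** 2 for m, q in zip(row, query))
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every memory state")
    return [sum(w * row[j] for w, row in zip(weights, memory)) / total
            for j in range(len(query))]


memory = [[20.1, 3.4], [20.3, 3.5], [19.8, 3.3], [20.0, 3.6]]
x_tst = [[20.2, 3.45], [19.9, 3.35]]
x_hat_tst = [aakr_predict(memory, q, bandwidth=0.2) for q in x_tst]
print(x_hat_tst)
```

Building one such model per Bootstrap resample (using the resample as `memory`) yields the N prototype predictions used in the variance calculation.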

The N new models give N predictions of the test observations:

$$\hat{X}_{tst}^{k}, \quad k = 1, 2, \ldots, N$$

where $\hat{X}_{tst}^{k}$ is the prediction of the test observations $X_{tst}$ by the k-th prototype model, and $\hat{X}_{ij}^{k}$ denotes the i-th observation of the j-th variable of $\hat{X}_{tst}^{k}$. The expected prediction of the i-th observation of the j-th variable is the average of the N new model predictions:

$$E[\hat{X}_{ij}] = \frac{1}{N}\sum_{k=1}^{N}\hat{X}_{ij}^{k}$$

That is, the prediction of $X_{tst}$ is expressed as:

$$\hat{X}_{tst} = \left[\,E[\hat{X}_{ij}]\,\right]_{n_{tst} \times p}$$

Similarly, the variance of the i-th observation of the j-th variable is:

$$\mathrm{Var}(\hat{X}_{ij}) = \frac{1}{N-1}\sum_{k=1}^{N}\left(\hat{X}_{ij}^{k} - E[\hat{X}_{ij}]\right)^2$$

That is, the model prediction variance can be written as:

$$\mathrm{Var}(\hat{X}_{tst}) = \left[\,\mathrm{Var}(\hat{X}_{ij})\,\right]_{n_{tst} \times p}$$

In simplified form, the variation among the N model predictions, that is, the model prediction variance, is calculated as:

$$\mathrm{Var}(\hat{X}_{tst}) = \frac{1}{N-1}\sum_{k=1}^{N}\left(\hat{X}_{tst}^{k} - E[\hat{X}_{tst}]\right)^2$$

This gives an $n_{tst} \times p$ variance estimate. For each of the p variables, the variances are sorted in ascending order and the 95th percentile is selected as a conservative single-point estimate of the variance.

The model prediction variance is defined as the expectation of the squared difference between a parameter and its expected value, so it can also be expressed as:

$$\mathrm{Var}(\hat{X}_{tst}) = E\left[\left(\hat{X}_{tst} - E[\hat{X}_{tst}]\right)^2\right]$$
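The variance calculation and the percentile selection can be sketched as follows. Two assumptions are made: the sample variance divides by N - 1, and the "95th percentile" uses the nearest-rank definition. The source does not state either convention, and the function names are hypothetical.

```python
import math
import statistics


def prediction_variance(preds):
    """Element-wise variance across N predictions of shape n_tst x p.

    preds[k][i][j] is the k-th model's prediction of observation i,
    variable j. Assumption: sample variance (divides by N - 1).
    """
    n_models, n_tst, p = len(preds), len(preds[0]), len(preds[0][0])
    return [[statistics.variance([preds[k][i][j] for k in range(n_models)])
             for j in range(p)] for i in range(n_tst)]


def percentile_95(values):
    """95th percentile by the nearest-rank rule (an assumed convention)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def single_point_variance(var_matrix):
    """Per-variable 95th percentile over the n_tst variance values."""
    p = len(var_matrix[0])
    return [percentile_95([row[j] for row in var_matrix]) for j in range(p)]


# N = 3 prototype models, n_tst = 2 observations, p = 1 variable.
preds = [[[1.0], [2.0]],
         [[2.0], [2.0]],
         [[3.0], [2.0]]]
var_matrix = prediction_variance(preds)
var_point = single_point_variance(var_matrix)
print(var_matrix, var_point)
```

Taking a high percentile rather than the mean makes the single-point variance deliberately pessimistic, which matches the conservative intent stated in the text.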

Step 4): Calculate the mean square error (MSE) between the N model predictions and the test observations.

Each new model built from a resampled training data set gives a model prediction $\hat{X}_i$. The MSE between the new model's prediction and the test observations is calculated as:

$$\mathrm{MSE}_i = \frac{1}{n_{tst}}\sum\left(X_{tst,i} - \hat{X}_i\right)^2$$

where the sum runs over the $n_{tst}$ test observations, and $X_{tst,i}$ and $\hat{X}_i$ are the test observations and the model predictions for the i-th new model. Each MSE has dimension $1 \times p$, so N predictions produce N MSEs of dimension $1 \times p$. For each of the p variables, the 95th percentile of these N values is taken as the single-point estimate of the MSE.
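Step 4 can be sketched as below, again assuming the nearest-rank percentile convention. The helper names `mse_per_model` and `single_point_mse` are hypothetical.

```python
import math


def mse_per_model(x_tst, preds):
    """One 1 x p MSE vector per model: mean squared error over n_tst rows."""
    n_tst, p = len(x_tst), len(x_tst[0])
    return [[sum((x_tst[i][j] - pred[i][j]) ** 2 for i in range(n_tst)) / n_tst
             for j in range(p)] for pred in preds]


def single_point_mse(mses):
    """Per-variable 95th percentile (nearest rank, assumed) over N models."""
    p = len(mses[0])
    point = []
    for j in range(p):
        ordered = sorted(m[j] for m in mses)
        point.append(ordered[math.ceil(0.95 * len(ordered)) - 1])
    return point


x_tst = [[1.0], [2.0]]            # n_tst = 2 observations, p = 1 variable
preds = [[[1.0], [2.0]],          # N = 3 model predictions
         [[2.0], [2.0]],
         [[3.0], [2.0]]]
mses = mse_per_model(x_tst, preds)
mse_point = single_point_mse(mses)
print(mses, mse_point)
```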

Step 5): Calculate the model bias from the noise variance, the model prediction variance, and the mean square error.

Bias measures any systematic error.

The predictive performance of the model is quantified by the mean square error (MSE), which is calculated from the model predictions:

$$\mathrm{MSE} = E\left[\left(X_{tst} - \hat{X}_{tst}\right)^2\right] = \varepsilon_{red} + E\left[\varepsilon_{irr}^2\right]$$

where $\varepsilon_{red}$ is the reducible error and $\varepsilon_{irr}$ is the irreducible error, with

$$E[\varepsilon_{irr}] = 0, \qquad \mathrm{Var}(\varepsilon_{irr}) = \sigma_{\varepsilon}^2$$

The reducible error is the expectation of the squared distance between the model prediction $\hat{X}_{tst}$ and the true model $M(X_{tst})$ of the test observations $X_{tst}$, that is, $\varepsilon_{red} = E[(M(X_{tst}) - \hat{X}_{tst})^2]$. The irreducible error is the difference between the true parameter value and the measured parameter value. It is caused by random processes and measurement noise, and it is called irreducible because it cannot be modeled deterministically.

The reducible error describes how well the model represents the true model $M(X_{tst})$. It depends only on the chosen model architecture, the training process, and the data set. The reducible error can be further decomposed into a bias component and a variance component.

The model bias is defined as the systematic error of the model, that is, the difference between the expected prediction of the model and the true target value:

$$\mathrm{Bias} = E[\hat{X}_{tst}] - M(X_{tst})$$

The total uncertainty is a combination of the model bias, the model prediction variance, and the irreducible error:

$$\sigma_{total}^2 = \mathrm{Bias}^2 + \mathrm{Var}(\hat{X}_{tst}) + \sigma_{\varepsilon}^2$$

The total uncertainty $\sigma_{total}^2$ is quantified by the MSE and takes its value, that is:

$$\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}(\hat{X}_{tst}) + \sigma_{\varepsilon}^2$$

The model bias is assumed to be constant for each variable, so that it can be approximated by its expected value. If the expected squared bias is negative (that is, the sum of the model prediction variance and the estimated noise variance exceeds the MSE), it is set to zero.

The model bias is therefore:

$$\mathrm{Bias} = \begin{cases} \sqrt{\mathrm{MSE} - \mathrm{Var}(\hat{X}_{tst}) - \hat{\sigma}_{\varepsilon}^2}, & \mathrm{MSE} > \mathrm{Var}(\hat{X}_{tst}) + \hat{\sigma}_{\varepsilon}^2 \\ 0, & \text{otherwise} \end{cases}$$

Step 6): From the model bias and the model variance, the Monte Carlo uncertainty estimate is obtained as:

$$U = 2\sqrt{\mathrm{Bias}^2 + \mathrm{Var}(\hat{X}_{tst})}$$

Its convergence as the number of prototype models varies is shown in FIG. 2.
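Steps 5 and 6 reduce to a few lines of arithmetic once the single-point MSE, prediction variance, and noise variance of a variable are known. The sketch below uses made-up input values and hypothetical function names; it follows the rule in the text that a negative squared bias is set to zero.

```python
import math


def model_bias(mse, var_pred, var_noise):
    """Bias = sqrt(MSE - Var - noise variance), clipped at zero."""
    bias_sq = mse - var_pred - var_noise
    return math.sqrt(max(bias_sq, 0.0))


def uncertainty_95(bias, var_pred):
    """Monte Carlo uncertainty: twice sqrt(bias^2 + prediction variance)."""
    return 2.0 * math.sqrt(bias ** 2 + var_pred)


# Illustrative values for one variable (not taken from the patent).
bias = model_bias(mse=0.05, var_pred=0.01, var_noise=0.015)
u = uncertainty_95(bias, var_pred=0.01)
print(f"bias = {bias:.4f}, uncertainty = {u:.4f}")

# When variance plus noise exceeds the MSE, the bias is set to zero.
bias_clipped = model_bias(mse=0.01, var_pred=0.02, var_noise=0.01)
u_clipped = uncertainty_95(bias_clipped, var_pred=0.02)
print(f"bias = {bias_clipped:.4f}, uncertainty = {u_clipped:.4f}")
```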

Example:

From the Monte Carlo uncertainty estimate, the confidence interval and the prediction interval corresponding to the 95% confidence level are calculated. The Jackknife bias estimation method is used to correct the bias of the confidence interval (CI) of the AAKR model prediction and to calculate the prediction interval. The confidence interval of the AAKR model before bias correction is shown in FIG. 3.

The general equation of the confidence interval is:

$$\hat{\theta} \pm 2\sqrt{\mathrm{Var}(\hat{\theta}) + \mathrm{Bias}^2}$$

where $\hat{\theta}$ is an estimate of the expectation $\theta$ of the model prediction, whose bias is:

$$b(\hat{\theta}) = E[\hat{\theta}] - \theta$$

The model prediction $\hat{X}_{tst}$ is an $n_{tst} \times p$ time state sequence. $\hat{\theta}_{(i)}$ denotes the estimator obtained after removing the i-th prediction (i = 1, 2, ..., N), and averaging these gives $\hat{\theta}_{(\cdot)} = \frac{1}{N}\sum_{i=1}^{N}\hat{\theta}_{(i)}$. The Jackknife bias estimate is then:

$$\hat{b}_{jack} = (N-1)\left(\hat{\theta}_{(\cdot)} - \hat{\theta}\right)$$

This gives the bias-corrected estimator of $\theta$:

$$\hat{\theta}_{jack} = \hat{\theta} - \hat{b}_{jack} = N\hat{\theta} - (N-1)\hat{\theta}_{(\cdot)}$$
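The Jackknife correction applies to any estimator computed from the N predictions. The sketch below is generic: `jackknife_correct` is a hypothetical helper, and the demonstration uses the plug-in (divide-by-N) variance as the estimator, because for that case the Jackknife-corrected value is known to equal the unbiased sample variance, which gives a convenient check.

```python
import statistics


def jackknife_correct(values, estimator):
    """Return (bias-corrected estimate, Jackknife bias estimate).

    theta_(i) is the estimate with the i-th value left out; the bias
    estimate is (N - 1) * (mean of theta_(i) - theta).
    """
    n = len(values)
    theta = estimator(values)
    leave_one_out = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    theta_dot = sum(leave_one_out) / n
    bias = (n - 1) * (theta_dot - theta)
    return theta - bias, bias


values = [1.0, 2.0, 4.0, 7.0]
corrected, bias = jackknife_correct(values, statistics.pvariance)
print(f"plug-in variance  = {statistics.pvariance(values):.4f}")
print(f"jackknife bias    = {bias:.4f}")
print(f"corrected estimate = {corrected:.4f}")
```

For an already unbiased estimator such as the sample mean, the Jackknife bias estimate is zero and the correction leaves the estimate unchanged.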

So the bias-corrected CI is:

$$\hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{\theta}) + \mathrm{Bias}^2}$$

The CI does not include the noise term $\hat{\sigma}_{\varepsilon}^2$. It estimates only the uncertainty in the model's expected prediction and does not account for the natural variation of the modeled values. The bias-corrected confidence interval of the AAKR model is shown in FIG. 4.

The prediction interval (PI) at the 95% confidence level is expressed as:

$$\hat{\theta}_{jack} \pm 2\sqrt{\mathrm{Var}(\hat{\theta}) + \mathrm{Bias}^2 + \hat{\sigma}_{\varepsilon}^2}$$

Because the PI includes the noise variance term, it contains the CI by definition and is therefore a more conservative estimate of the model uncertainty. The prediction interval of the AAKR model is shown in FIG. 5.
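The relationship between the two intervals can be sketched as follows. The half-width formulas are assumptions inferred from the description, namely that the CI excludes the noise variance while the PI adds it, with a coverage factor of 2 for the 95% level; the function name and input values are hypothetical.

```python
import math


def intervals_95(pred, var_pred, bias, var_noise):
    """Return (CI, PI) as (low, high) tuples around a prediction.

    Assumed forms: CI half-width 2*sqrt(var + bias^2);
    PI half-width 2*sqrt(var + bias^2 + noise variance).
    """
    half_ci = 2.0 * math.sqrt(var_pred + bias ** 2)
    half_pi = 2.0 * math.sqrt(var_pred + bias ** 2 + var_noise)
    return (pred - half_ci, pred + half_ci), (pred - half_pi, pred + half_pi)


# Illustrative values for one bias-corrected prediction.
ci, pi = intervals_95(pred=20.0, var_pred=0.01, bias=0.1, var_noise=0.015)
print(f"CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"PI: [{pi[0]:.3f}, {pi[1]:.3f}]")
```

Since the noise variance is non-negative, the PI is never narrower than the CI, which is the containment property stated in the text.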

The present invention provides an uncertainty calculation method for the auto-associative kernel regression (AAKR) model based on Bootstrap resampling. The method randomly samples the sensor historical state data with replacement to obtain Bootstrap samples, which are used to optimize the AAKR model architecture and to determine the uncertainty of the current prediction. The method comprises: loading historical data, detecting and correcting abnormal values, and dividing the data into training and test data sets; resampling the training data by Bootstrap, developing and testing multiple prototype models, obtaining predictions from the prototype models and the test data, and calculating the mean square error (MSE) between the predictions and the test observations; denoising the training data with a wavelet denoising method to estimate the noise variance; calculating the model bias in combination with the prototype model variance to form a 95% uncertainty estimate; and calculating the confidence interval and prediction interval at the 95% confidence level, with the Jackknife bias estimation method used to correct the bias of the confidence interval distribution. The invention simplifies the analysis process, reduces the confidence interval bias, and improves estimation efficiency while maintaining convergence performance.

Claims (2)

1. A resampling-based AAKR model uncertainty calculation method, comprising the steps of:
Step 1), dividing a sensor history state data set into a training data set and a test data set;
loading sensor historical data, detecting and correcting abnormal values of the sensor historical data, and dividing a corrected data set into a training data set and a test data set;
step 2), denoising the training data set by a wavelet denoising method and calculating noise variance;
The training data set is denoised using a wavelet denoising method,
Where ε i is the noise estimate for the ith training observation X i in the training dataset; Is the estimated value of the true value of the ith training observed value X i in the training data set; the variance of the ith variable noise in the training dataset is:
n trn is the number of training observations; Is the expected value of the noise estimate; is the training dataset noise variance;
Step 3), resampling the training data set multiple times by the Bootstrap method, each resampling yielding a new training data set; building a new model from each resampled training data set; obtaining multiple model predictions from the new models; and calculating the variation among the model predictions to obtain the model prediction variance;
The model prediction variance, i.e. the variation among the model predictions, is calculated using the following equation:
$$\mathrm{Var}\left(\hat{X}_{i,j}\right) = \frac{1}{N-1}\sum_{k=1}^{N}\left(\hat{X}_{i,j}^{k} - \bar{\hat{X}}_{i,j}\right)^{2}$$
where $\mathrm{Var}(\hat{X}_{i,j})$ is the variance of the $i$-th observation of the $j$-th variable and $N$ is the number of resampled models; the model prediction variances form an $n_{tst} \times p$ variance estimate, the variances of each of the $p$ variables are sorted in ascending order, and the value at the 95th percentile is selected as a conservative estimate of the single-point variance; $\hat{X}_{i,j}^{k}$ denotes the $i$-th observation of the $j$-th variable of $\hat{X}_{tst}^{k}$, where $\hat{X}_{tst}^{k}$ is the prediction of the $k$-th prototype model for the test observations $X_{tst}$, which contain $n_{tst}$ observations; $\bar{\hat{X}}_{i,j}$ is the expected value of the predictions for the $i$-th observation of the $j$-th variable;
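A minimal sketch of the resampling step, under stated assumptions: the AAKR predictor below uses a Gaussian kernel on Euclidean distance with a fixed bandwidth, which is one common formulation and not necessarily the one used in the patent. The two-sensor synthetic data, the bandwidth `h`, the number of prototype models `N`, and all function names are hypothetical.

```python
import math
import random
import statistics


def aakr_predict(memory, query, bandwidth):
    """Kernel-weighted average of memory vectors (assumed Gaussian kernel)."""
    weights = []
    for row in memory:
        d2 = sum((a - b) ** 2 for a, b in zip(row, query))
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every memory vector")
    return [sum(w * row[j] for w, row in zip(weights, memory)) / total
            for j in range(len(query))]


def percentile_95(values):
    """Value at the 95th percentile of the ascending-sorted list."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def make_observation():
    # Two correlated sensor channels with small measurement noise.
    t = random.uniform(0.0, 1.0)
    return [t + random.gauss(0.0, 0.02), 2.0 * t + random.gauss(0.0, 0.02)]


random.seed(1)
train = [make_observation() for _ in range(60)]
test = [make_observation() for _ in range(20)]
N, h = 25, 0.1  # number of prototype models and kernel bandwidth (assumed)

# predictions[k][i][j]: model k, test observation i, variable j
predictions = []
for _ in range(N):
    boot = random.choices(train, k=len(train))  # sample with replacement
    predictions.append([aakr_predict(boot, x, h) for x in test])

p = len(test[0])
pred_var = [[statistics.variance([predictions[k][i][j] for k in range(N)])
             for j in range(p)]
            for i in range(len(test))]

# One conservative single-point variance per variable.
var_95 = [percentile_95([row[j] for row in pred_var]) for j in range(p)]
print(var_95)
```

Here `pred_var` plays the role of the $n_{tst} \times p$ variance estimate, and `var_95` holds one conservative value per variable.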
Step 4), calculating the mean squared error between the model predictions and the test observations;
Step 5), calculating the model bias from the noise variance, the model prediction variance and the mean squared error;
The new model built from each resampled training data set gives a model prediction $\hat{X}_{tst}^{k}$; the mean squared error (MSE) between the new model prediction and the test observations is calculated as:
$$\mathrm{MSE} = \frac{1}{n_{tst}}\sum_{i=1}^{n_{tst}}\left(X_{tst,i} - \hat{X}_{tst,i}\right)^{2}$$
where $X_{tst,i}$ and $\hat{X}_{tst,i}$ are the $i$-th test observation and the corresponding prediction of the new model, respectively; the MSE has dimension $1 \times p$, and $N$ predictions generate $N$ MSEs of dimension $1 \times p$;
The model bias is:
$$\mathrm{Bias}^{2} = \mathrm{MSE} - \mathrm{Var}\left(\hat{X}\right) - \hat{\sigma}_{\varepsilon}^{2}$$
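The bias step can be illustrated with a small worked example. It assumes the usual decomposition of the MSE into squared bias, prediction variance, and noise variance, averages the per-model MSEs, and clamps negative results to zero; the averaging, the clamping, and every number below are assumptions made for illustration.

```python
import math


def mse_per_variable(observed, predicted):
    """1 x p mean squared error between test observations and predictions."""
    n, p = len(observed), len(observed[0])
    return [sum((observed[i][j] - predicted[i][j]) ** 2 for i in range(n)) / n
            for j in range(p)]


def squared_bias(mean_mse, pred_var, noise_var):
    """Assumed decomposition: Bias^2 = MSE - Var - noise variance, floored at 0."""
    return [max(m - v - s, 0.0) for m, v, s in zip(mean_mse, pred_var, noise_var)]


# Two test observations of p = 2 variables, and two prototype models.
x_tst = [[1.0, 2.0], [2.0, 4.0]]
model_preds = [
    [[1.1, 2.0], [1.9, 4.2]],
    [[0.9, 2.2], [2.1, 4.0]],
]

mse_list = [mse_per_variable(x_tst, pred) for pred in model_preds]  # N MSEs
mean_mse = [sum(m[j] for m in mse_list) / len(mse_list) for j in range(2)]

pred_var = [0.004, 0.005]   # hypothetical model prediction variances
noise_var = [0.002, 0.020]  # hypothetical noise variances
bias_sq = squared_bias(mean_mse, pred_var, noise_var)
print(mean_mse, bias_sq)
```

The second variable shows why the floor is useful: when the noise and variance estimates exceed the MSE, the raw difference is negative and cannot be a squared bias.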
Step 6), estimating the uncertainty from the model bias and the model prediction variance, the Monte Carlo uncertainty estimate being 2 times the square root of the sum of the squared model bias and the model prediction variance;
calculating the confidence interval and the prediction interval corresponding to the 95% confidence level from the Monte Carlo uncertainty estimate, and correcting the confidence interval predicted by the AAKR model with the Jackknife bias estimation method;
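As a sketch of the uncertainty estimate: the factor of 2 approximates the 1.96 normal quantile for a 95% level. The confidence interval below is taken as the prediction plus or minus that estimate. The prediction interval is assumed to add the noise variance under the square root, a common construction that the text here does not spell out. All values and function names are hypothetical.

```python
import math


def monte_carlo_uncertainty(bias_sq, pred_var):
    """2 * sqrt(Bias^2 + Var), as described in step 6."""
    return 2.0 * math.sqrt(bias_sq + pred_var)


def interval(center, half_width):
    return (center - half_width, center + half_width)


prediction = 10.0
bias_sq, pred_var, noise_var = 0.0009, 0.0016, 0.0024  # hypothetical values

u_conf = monte_carlo_uncertainty(bias_sq, pred_var)
# Assumption: the prediction interval also accounts for measurement noise.
u_pred = 2.0 * math.sqrt(bias_sq + pred_var + noise_var)

conf_interval = interval(prediction, u_conf)
pred_interval = interval(prediction, u_pred)
print(conf_interval, pred_interval)
```

Under this assumption the prediction interval is never narrower than the confidence interval, since it covers a new noisy observation and not only the model estimate.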
The general equation for the confidence interval is:
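where $\hat{\theta}$ is an estimate of the expected model prediction $\theta$, whose bias is:
$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$
The model prediction $\hat{X}$ is an $n_{tst} \times p$ time state sequence; $\hat{\theta}_{(i)}$ denotes the estimate obtained after the $i$-th prediction is removed, $i = 1, 2, \ldots, n_{tst}$; the Jackknife bias estimate is then:
$$\widehat{\mathrm{Bias}}_{jack} = (n_{tst}-1)\left(\bar{\theta}_{(\cdot)} - \hat{\theta}\right), \qquad \bar{\theta}_{(\cdot)} = \frac{1}{n_{tst}}\sum_{i=1}^{n_{tst}}\hat{\theta}_{(i)}$$
which gives the bias-corrected estimate of $\hat{\theta}$:
$$\hat{\theta}_{jack} = \hat{\theta} - \widehat{\mathrm{Bias}}_{jack}$$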
The confidence interval after correction is:
the prediction interval for the 95% confidence level is:
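The Jackknife correction referred to in the claim can be sketched generically. This is the textbook leave-one-out bias estimate applied to an arbitrary estimator; how the patent applies it to the interval bounds is not recoverable from this text, so the example uses the plug-in variance estimator, whose bias is well known. The function name and data are hypothetical.

```python
import statistics


def jackknife_bias_correct(data, estimator):
    """Return (corrected estimate, jackknife bias estimate)."""
    n = len(data)
    theta_hat = estimator(data)
    # Re-estimate with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    bias = (n - 1) * (theta_bar - theta_hat)
    return theta_hat - bias, bias


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
# statistics.pvariance divides by n and is biased low for a sample.
corrected, bias = jackknife_bias_correct(data, statistics.pvariance)
print(corrected, statistics.variance(data))
```

For this estimator the corrected value coincides with the unbiased sample variance, which makes it a convenient sanity check before applying the same routine to other statistics.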
2. A resampling-based AAKR model uncertainty calculation system based on the method of claim 1, comprising a data acquisition module, a data denoising module, and a data processing module;
the data acquisition module is used for acquiring a sensor historical state data set, dividing the acquired data set into a training data set and a test data set, and transmitting the training data set to the data denoising module;
the data denoising module denoises the received training data set, calculates the noise variance and transmits it to the data processing module, and also transmits the denoised training data to the data processing module;
the data processing module resamples the training data set multiple times through the data acquisition module, each resampling yielding a new training data set; builds a new model from each resampled training data set; obtains multiple model predictions from the new models; and calculates the variation among the model predictions to obtain the model prediction variance; it also calculates the mean squared error between the model predictions and the test observations; it then calculates the model bias from the noise variance, the model prediction variance and the mean squared error; and it finally outputs, as the Monte Carlo uncertainty estimate, 2 times the square root of the sum of the squared model bias and the model prediction variance.
CN202010852271.5A 2020-08-21 2020-08-21 AAKR model uncertainty calculation method and system based on resampling Active CN112100574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010852271.5A CN112100574B (en) 2020-08-21 2020-08-21 AAKR model uncertainty calculation method and system based on resampling

Publications (2)

Publication Number Publication Date
CN112100574A CN112100574A (en) 2020-12-18
CN112100574B true CN112100574B (en) 2024-10-29

Family

ID=73753164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010852271.5A Active CN112100574B (en) 2020-08-21 2020-08-21 AAKR model uncertainty calculation method and system based on resampling

Country Status (1)

Country Link
CN (1) CN112100574B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926656A (en) * 2021-02-25 2021-06-08 西安交通大学 Method, system and equipment for predicting state of circulating water pump of nuclear power plant
CN113486473A (en) * 2021-07-27 2021-10-08 上海电气风电集团股份有限公司 State monitoring method and system of wind generating set and computer readable storage medium
CN114996830B (en) * 2022-08-03 2022-11-18 华中科技大学 Visual safety assessment method and equipment for shield tunnel to pass through existing tunnel
CN115542237B (en) * 2022-11-29 2023-04-07 北京志翔科技股份有限公司 Uncertainty determination method and device and electronic equipment
CN118051851A (en) * 2023-02-24 2024-05-17 江苏相实网络科技有限公司 Enterprise data monitoring method and system based on big data architecture
CN120180047A (en) * 2025-05-21 2025-06-20 中国铁道科学研究院集团有限公司电子计算技术研究所 Railway roadbed deformation monitoring data interval prediction method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110531054A (en) * 2019-09-29 2019-12-03 河南省农业科学院农业经济与信息研究所 Soil organic matter uncertainty in traffic estimating and measuring method based on Bootstrap sampling

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7401041B2 (en) * 2000-12-15 2008-07-15 The Trustees Of Columbia University Systems and methods for providing robust investment portfolios
US8170841B2 (en) * 2004-04-16 2012-05-01 Knowledgebase Marketing, Inc. Predictive model validation
KR101104893B1 (en) * 2009-09-28 2012-01-12 한국수력원자력 주식회사 Method for predicting radial creep pressure pipe
KR101998553B1 (en) * 2012-08-01 2019-07-10 한국전력공사 Prediction Method of Short-Term Wind Speed and Wind Power and Power Supply Line Voltage Prediction Method Therefore
CN104166787B (en) * 2014-07-17 2017-06-13 南京航空航天大学 A kind of aero-engine method for predicting residual useful life based on multistage information fusion
CN104408317A (en) * 2014-12-02 2015-03-11 大连理工大学 Metallurgy enterprise gas flow interval predicting method based on Bootstrap echo state network integration
CN104915518B (en) * 2015-06-30 2017-12-12 中南大学 A kind of construction method of blast furnace molten iron silicon content two dimension forecasting model and application
US10223331B2 (en) * 2016-04-08 2019-03-05 Goodrich Corporation Warp models for registering multi-spectral imagery
CN105975444A (en) * 2016-05-24 2016-09-28 南京大学 Method for quantitatively analyzing underground water numerical simulation uncertainty based on information entropy
CN106126944B (en) * 2016-06-28 2018-05-25 山东大学 A kind of power transformer top-oil temperature interval prediction method and system
WO2020000248A1 (en) * 2018-06-27 2020-01-02 大连理工大学 Space reconstruction based method for predicting key performance parameters of transition state acceleration process of aircraft engine
CN110096805B (en) * 2019-04-30 2022-09-20 福建农林大学 Bridge structure parameter uncertainty quantification and transfer method based on improved self-service method
CN110866314B (en) * 2019-10-22 2022-11-15 东南大学 Rotating Machinery Remaining Lifetime Prediction Method Based on Multilayer Bidirectionally Gated Recurrent Unit Networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Uncertainty Quantification Techniques for Sensor Calibration Monitoring in Nuclear Power Plants; P. Ramuhalli et al.; National Technical Information Service, U.S. Department of Commerce; 2014-04-30; pp. 1-53 *

Also Published As

Publication number Publication date
CN112100574A (en) 2020-12-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant