
CN111220565B - CPLS-based infrared spectrum measuring instrument calibration migration method - Google Patents


Info

Publication number
CN111220565B
Authority
CN
China
Prior art keywords
center
matrix
spectrum
data set
cpls
Legal status
Active
Application number
CN202010045812.3A
Other languages
Chinese (zh)
Other versions
CN111220565A (en)
Inventor
赵煜辉
刘晓东
李雪晶
芦鹏程
Current Assignee
Northeastern University China
Original Assignee
Northeastern University China
Application filed by Northeastern University China
Priority to CN202010045812.3A
Publication of CN111220565A
Application granted
Publication of CN111220565B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2201/00 Features of devices classified in G01N21/00
    • G01N2201/12 Circuits of general importance; Signal processing
    • G01N2201/127 Calibration; base line adjustment; drift compensation

Landscapes

  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention relates to the field of transfer learning in machine learning and provides a CPLS-based calibration transfer method for infrared spectroscopy measuring instruments. First, a source-domain data set $\{X_m, Y\}$ and a target-domain data set $\{X_s, Y\}$ are collected and centered, giving the centered source-domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target-domain data set $\{X_{s\_center}, Y_{center}\}$. Next, principal component analysis based on the CPLS algorithm is performed on the matrices $X_{m\_center}$ and $Y_{center}$, and principal component analysis is also performed on the matrix $X_{s\_center}$. The transfer matrices $M_{trans\_pre}$ and $M_{trans}$ are then calculated. Finally, the substance concentration variables of the measured object are predicted. The invention can remove random noise in the measurements of the master instrument, improve data utilization and modelling accuracy, and reduce time complexity.

Description

A CPLS-based calibration transfer method for infrared spectroscopy measuring instruments

Technical field

The invention relates to the field of transfer learning in machine learning, and in particular to a CPLS-based calibration transfer method for infrared spectroscopy measuring instruments.

Background

Near-infrared spectroscopy (NIRS) analysis offers simple instrument operation, fast data analysis, low cost and no contamination of samples, and has been widely applied in many fields. When NIRS is used for modelling in production processes, however, the measurement conditions and the hardware performance of the instruments are often unstable, which can cause an existing calibration model to fail.

The main purpose of transfer learning is to extract classification or regression knowledge from one or more tasks in a source domain and apply that knowledge to a task in a target domain. If the knowledge from one task is successfully transferred to another, a model for the new task can be obtained without many new samples. Using knowledge learned in one or more source domains to improve learning performance in the target domain addresses problems such as missing target-domain labels, high labelling cost and time-consuming learning, and thereby improves learning performance.

Calibration transfer refers to transferring multivariate calibration models between different measuring instruments or measurement states. It exploits the linear relationship between spectral data from different sources to convert spectra measured on a new instrument or in a new state, so that the original model can be used directly to predict the new samples. Transfer can be applied between related domains rather than only within a single domain, carrying over the information that is useful for inter-domain conversion. This preserves the validity of the original model, or lets the original information speed up modelling, and avoids re-sampling or re-modelling the target domain with a large number of target-domain samples, which greatly reduces cost and speeds up modelling.

Existing calibration transfer methods suffer from problems such as low prediction accuracy and limited applicability. Take PLS-based calibration transfer as an example. Partial least squares (PLS) regression is one of the algorithms commonly used for data information extraction and process monitoring: by extracting the feature information with the greatest correlation between the process variables and the quality variables, it partitions the process variables and quality variables into a principal subspace and a residual subspace, thereby compressing and extracting the data. However, the PLS algorithm first extracts the principal components of the process variables and of the quality variables separately, and the two sets of principal components are not linked. It assumes by default that all process variables act on the quality variables and ignores the state information of internal variables. In many cases, because the process data lack excitation, there are large unmeasured process and quality disturbances; when the residual information of the quality variables changes, alarms fail and the PLS prediction output is poor. In practice, monitoring changes in quality-variable information is more important than monitoring the process variables.
On the other hand, the optimization objective used to build a PLS model maximizes the correlation between the principal components of the process variables and the quality variables without any constraint on the residuals, so the residual variance of the process and quality variables is not guaranteed to be minimal, and a large amount of information may remain in the residuals of both. Furthermore, near-infrared spectral modelling currently involves large volumes of data, and the serial partial least squares algorithm has high time complexity and long training and testing times.

SUMMARY OF THE INVENTION

To address these problems in the prior art, the invention provides a CPLS-based calibration transfer method for infrared spectroscopy measuring instruments that removes random noise in the measurements of the master instrument, improves data utilization and modelling accuracy, and reduces time complexity.

The technical solution of the invention is as follows:

A CPLS-based calibration transfer method for infrared spectroscopy measuring instruments, characterized by comprising the following steps:

Step 1: Associate the master infrared spectroscopy instrument with the source domain and the slave infrared spectroscopy instrument with the target domain. Use the master and slave instruments to acquire the spectrum of each sample, giving the master spectrum and the slave spectrum, respectively. From each master and slave spectrum, extract spectral data at intervals of a nm across the wavelength range, and record the substance concentration variable values of each sample, giving the source-domain data set $\{X_m, Y\}$ and the target-domain data set $\{X_s, Y\}$;

where $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the $j$-th master and slave spectral data points of the $i$-th sample, respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, $I$ is the total number of samples, and $J$ is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of substance concentration variables;

Step 2: Center the source-domain and target-domain data sets, giving the centered source-domain data set $\{X_{m\_center}, Y_{center}\}$ and target-domain data set $\{X_{s\_center}, Y_{center}\}$;

Step 3: Perform principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:

Step 3.1: Using the PLS algorithm, build the calibration model $Y_{center} = X_{m\_center} B$ on the data set $\{X_{m\_center}, Y_{center}\}$ and compute the coefficient matrix $B$, the score matrix $T$ and loading matrix $P$ of $X_{m\_center}$, and the score matrix $U$ and loading matrix $Q$ of $Y_{center}$; introduce the matrix $R$ such that $T = X_{m\_center} R$, and determine the number of latent variables $l$;

Step 3.2: Compute the predictable substance concentration variables as

$$\hat{Y} = T Q^T = X_{m\_center} R Q^T \tag{1}$$

Perform singular value decomposition on the predictable substance concentration variables:

$$\hat{Y} = U_c D_c V_c^T = U_c Q_c^T \tag{2}$$

where $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, and $V_c$ is the right singular matrix, which is orthogonal; $Q_c = V_c D_c^T$ contains the $l_c$ nonzero singular values in descending order and the corresponding right singular vectors.

From equation (2),

$$U_c = \hat{Y} V_c D_c^{-1} = X_{m\_center} R Q^T V_c D_c^{-1} = X_{m\_center} R_c \tag{3}$$

which gives

$$R_c = R Q^T V_c D_c^{-1} \tag{4}$$

Step 3.3: Compute the unpredictable substance concentration variables as

$$\tilde{Y}_c = Y_{center} - U_c Q_c^T \tag{5}$$

Extract principal components from the unpredictable substance concentration variables, obtaining $l_y$ principal components:

$$\tilde{Y}_c = T_y P_y^T + \tilde{Y} \tag{6}$$

where $\tilde{Y}$ is the output residual matrix of $\tilde{Y}_c$.

The output residual matrix $\tilde{Y} = \tilde{Y}_c - T_y P_y^T$ is then obtained from equation (6);

Step 3.4: Projecting onto the orthogonal complement of the space spanned by $R_c$ gives the input variables unrelated to the substance concentration variables:

$$\tilde{X}_c = X_{m\_center} - U_c R_c^{*} \tag{7}$$

where $R_c^{*} = (R_c^T R_c)^{-1} R_c^T$.

Extract principal components from the input variables unrelated to the substance concentration variables, obtaining $l_x$ principal components:

$$\tilde{X}_c = T_x P_x^T + \tilde{X} \tag{8}$$

where $\tilde{X}$ is the input residual matrix of $\tilde{X}_c$.

The input residual matrix $\tilde{X} = \tilde{X}_c - T_x P_x^T$ is then obtained from equation (8);

Step 3.5: From steps 3.1 to 3.4, the principal components of $X_{m\_center}$ and $Y_{center}$ extracted by the PLS algorithm are $X_{m\_pre} = T P^T$ and $Y_{pre} = U Q^T$, respectively, and the residuals of $X_{m\_center}$ and $Y_{center}$ are $X_{m\_res\_c} = X_{m\_center} - X_{m\_pre}$ and $Y_{res\_c} = Y_{center} - Y_{pre}$, respectively, that is:

[Two equation images not reproduced in the source]

Step 4: Perform principal component analysis on the matrix $X_{s\_center}$ by the same method as in step 3, giving the residual of $X_{s\_center}$ as $X_{s\_res\_c}$;

Step 5: Compute the score of the source-domain data set after PLS principal component extraction from the master spectra, $T_{m\_pre} = X_{m\_center} R$, and the score of the target-domain data set after PLS principal component extraction from the slave spectra, $T_{s\_pre} = X_{s\_center} R$; from $T_{m\_pre}$ and $T_{s\_pre}$, compute the transfer matrix $M_{trans\_pre}$ by least squares. Compute the score of the source-domain data set after principal component extraction from the master-spectrum residuals, $T_m = X_{m\_res\_c} P$, and the score of the target-domain data set after principal component extraction from the slave-spectrum residuals, $T_s = X_{s\_res\_c} P$; from $T_m$ and $T_s$, compute the transfer matrix $M_{trans}$ by least squares;
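Step 5 states only that $M_{trans\_pre}$ and $M_{trans}$ are computed "by least squares". The sketch below assumes the direction implied by step 6.3, where slave-side scores are multiplied by the transfer matrix, so that $M$ minimizes $\|T_s M - T_m\|_F$; that direction, the helper name `transfer_matrix` and the toy sizes are assumptions, not taken from the patent.

```python
import numpy as np

def transfer_matrix(T_slave, T_master):
    """Least-squares M minimising ||T_slave @ M - T_master||_F.

    The slave -> master direction is an assumption inferred from step 6.3.
    """
    M, *_ = np.linalg.lstsq(T_slave, T_master, rcond=None)
    return M

# Toy check with illustrative sizes: 13 standard samples, l = 5 latent variables
rng = np.random.default_rng(1)
T_s_pre = rng.normal(size=(13, 5))                # slave-side scores
M_true = rng.normal(size=(5, 5))
T_m_pre = T_s_pre @ M_true                        # noise-free master-side scores
M_trans_pre = transfer_matrix(T_s_pre, T_m_pre)   # should recover M_true
```

The same helper would be applied to the residual scores $T_s$ and $T_m$ to obtain $M_{trans}$.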

Step 6: Predict the substance concentration variables of the measured object:

Step 6.1: Acquire the spectrum of the measured object with the slave infrared spectroscopy instrument and extract spectral data by the same method as in step 1, giving the matrix $X_{s\_test}$ formed by the $J$ slave spectral data points of the measured object;

Step 6.2: Perform principal component analysis on $X_{s\_test}$ based on the CPLS algorithm, giving the residual of $X_{s\_test}$ as $X_{s\_res\_c\_test}$;

Step 6.3: The matrix of predicted substance concentration variables of the measured object is $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
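The step 6.3 formula can be evaluated directly in NumPy. The function below is a hypothetical helper that only applies the formula as written; it does not compute $X_{s\_res\_c\_test}$ itself. A useful sanity check, assuming $R = W(P^T W)^{-1}$ and $B = R Q^T$ as in standard PLS (so that $P^T R = I$): with identity transfer matrices and a zero residual term, the prediction collapses to $X_{s\_test} B$.

```python
import numpy as np

def predict_slave(Xs_test, Xs_res_c_test, R, P, B, M_trans_pre, M_trans):
    """Evaluate Y = (Xs_test R M_trans_pre P^T + Xs_res_c_test R M_trans P^T) B."""
    mapped = Xs_test @ R @ M_trans_pre @ P.T      # transferred PLS part
    mapped += Xs_res_c_test @ R @ M_trans @ P.T   # transferred residual part
    return mapped @ B

# Hypothetical PLS-style matrices with P^T R = I
rng = np.random.default_rng(2)
J, l, K = 30, 4, 3
W = rng.normal(size=(J, l))
P = rng.normal(size=(J, l))
R = W @ np.linalg.inv(P.T @ W)
B = R @ rng.normal(size=(K, l)).T                 # B = R Q^T
Xs_test = rng.normal(size=(5, J))

# Identity transfer and zero residual term should reduce to Xs_test @ B
Y_pred = predict_slave(Xs_test, np.zeros_like(Xs_test), R, P, B,
                       np.eye(l), np.eye(l))
```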

Further, in step 1 the samples are grain, the spectral data are absorbance, and the substance concentration variables include the moisture, oil, protein and starch content of the grain.

The beneficial effects of the invention are as follows:

Based on the CPLS algorithm, the invention performs one round of principal component extraction on the source-domain and target-domain data sets, followed by a second round on the residuals, and computes the transfer matrices from both extractions. This removes random noise in the measurements of the master instrument, improves data utilization and modelling accuracy, reduces time complexity, and speeds up training and testing.

Description of drawings

FIG. 1 is a flow chart of the CPLS-based calibration transfer method for infrared spectroscopy measuring instruments of the invention.

FIG. 2 is a flow chart of the CPLS-based principal component analysis of the source-domain data set in the method of the invention.

FIG. 3 is a flow chart of computing the transfer matrices in the method of the invention.

FIG. 4 is a flow chart of predicting the substance concentration variables of the measured object in the method of the invention.

FIG. 5 is a schematic diagram of the cross-validation error for oil content on the corn data set as a function of the number of principal components in the specific embodiment.

FIG. 6 shows the fitting results for mp6spec-mp5spec in the specific embodiment.

FIG. 7 shows the fitting results for m5spec-mp5spec in the specific embodiment.

Detailed description

The invention is further described below with reference to the drawings and specific embodiments.

The invention proposes a CPLS-based calibration transfer method for infrared spectroscopy measuring instruments. In data processing, PLS performs only a single round of principal component extraction on X and Y, yet the residuals of X and Y usually still contain useful information; because this extraction is incomplete, the resulting model has a relatively large error. The concurrent partial least squares (CPLS) algorithm is therefore used: building on PLS, it extracts principal components from the residuals a second time, so that the resulting model has a smaller error and a linear relationship closer to the true one. In practice, however, collecting samples is expensive and time-consuming, so transfer learning is introduced on top of CPLS: a mapping is established between the standard sets of the source and target domains, and this mapping is used to predict the target-domain test set.

The CPLS algorithm used in the invention further improves on PLS. It performs principal component analysis on the process-variable information unrelated to the quality variables and on the quality-variable information that cannot be predicted, and partitions the data into five subspaces: the subspace of information shared by the process and quality variables (the correlated principal subspace), the process-variable principal subspace, the process-variable residual subspace, the quality-variable principal subspace, and the quality-variable residual subspace.

The CPLS model achieves three goals: (1) it extracts from the standard PLS projection the scores directly related to the predictable variation of the output, and these score vectors form the covariation subspace (CVS); (2) it further projects the unpredicted output variation onto the output principal subspace (OPS) and the output residual subspace (ORS), so that abnormal changes in these subspaces can be monitored; (3) it further projects the input variation unrelated to the predicted output onto the input principal subspace (IPS) and the input residual subspace (IRS), so that abnormal changes in these subspaces can be monitored.

CPLS divides the process-variable data into two main parts: information related to the quality variables and information unrelated to them. The quality-variable data are likewise divided into two main parts: information that can be predicted from the process variables and information that cannot. The CPLS-based monitoring method therefore provides a complete monitoring framework covering the process variables, the quality variables and the remaining parts of the information.

As shown in FIG. 1, the CPLS-based calibration transfer method for infrared spectroscopy measuring instruments of the invention comprises the following steps:

Step 1: Associate the master infrared spectroscopy instrument with the source domain and the slave infrared spectroscopy instrument with the target domain. Use the master and slave instruments to acquire the spectrum of each sample, giving the master spectrum and the slave spectrum, respectively. From each master and slave spectrum, extract spectral data at intervals of a nm across the wavelength range, and record the substance concentration variable values of each sample, giving the source-domain data set $\{X_m, Y\}$ and the target-domain data set $\{X_s, Y\}$;

where $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the $j$-th master and slave spectral data points of the $i$-th sample, respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, $I$ is the total number of samples, and $J$ is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of substance concentration variables.

In this embodiment, the samples are corn (a cereal), the spectral data are absorbance, and the substance concentration variables are the moisture, oil, protein and starch content of the corn. The data measured by three spectrometers on the same I = 80 samples form the corn data set. The infrared spectra were measured with the instruments m5, mp5 and mp6 every a = 2 nm over the wavelength range 1100 to 2498 nm, giving J = 700 attributes in total. In the first experiment the master-slave spectrum pair is m5spec-mp6spec: the spectra measured on m5 serve as the master spectra, and the corresponding spectral data set is the initial source-domain data set; the spectra measured on mp6, which differ more from those of m5, are chosen as the slave spectra, and the corresponding spectral data set is the initial target-domain data set. Five further experiments were then performed in turn on mp5spec-mp6spec, mp6spec-mp5spec, m5spec-mp5spec, mp5spec-m5spec and mp6spec-m5spec.
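As a quick arithmetic check of the embodiment's figures: sampling 1100 to 2498 nm every a = 2 nm gives (2498 - 1100) / 2 + 1 = 700 points, which matches J = 700. The grid below is only an illustration of that count; the variable names are not from the patent.

```python
import numpy as np

# Wavelength grid of the embodiment: 1100, 1102, ..., 2498 nm (endpoint included)
wavelengths = np.arange(1100, 2498 + 1, 2)
J = wavelengths.size
print(J)  # 700
```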

In this embodiment, the Kennard-Stone (KS) algorithm is used to split the corn data set. First, 20% of the data in the initial source-domain and target-domain data sets, i.e. 16 samples each, are extracted as test samples; the target-domain test samples are used to test the calibration transfer model. The remaining 80%, i.e. 64 samples each, are used as training samples. The source-domain training samples are used to build a reference model that predicts the transferred target-domain samples, and the target-domain training samples are used to build a standard target-domain model for comparison with the performance of other transfer models. Next, the KS algorithm is used to extract 20% of the source-domain training samples and 20% of the target-domain training samples to form the source-domain and target-domain standard sample sets, which serve as the source-domain data set $\{X_m, Y\}$ and the target-domain data set $\{X_s, Y\}$ used in the method of the invention to establish the transfer relationship between source-domain and target-domain samples.
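Kennard-Stone selection can be sketched as follows. This is the textbook form of the algorithm on Euclidean distances: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected sample is farthest away. The patent does not state the distance metric or exactly how the selected and remaining samples are assigned to the test, training and standard sets, so those choices, the function name and the random data are assumptions.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) row indices chosen by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    n_select = min(n_select, n)
    # Pairwise Euclidean distances between all samples
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two most distant samples
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample
        d_min = D[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(d_min))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Illustrative use on random data shaped like the corn set (80 samples x 700 points)
rng = np.random.default_rng(0)
spectra = rng.normal(size=(80, 700))
sel_idx, rest_idx = kennard_stone(spectra, 16)   # 16 selected, 64 remaining
```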

Step 2: Center the source-domain and target-domain data sets, i.e. compute the mean of each column and subtract it from every value in that column, giving the centered source-domain data set $\{X_{m\_center}, Y_{center}\}$ and target-domain data set $\{X_{s\_center}, Y_{center}\}$. This avoids bias caused by large differences between the raw values.
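A minimal NumPy sketch of the column centering in step 2. The function name `center_columns` is illustrative, and returning the column means is an added convenience (an assumption, not stated in the patent) so that the same offsets could be reused later.

```python
import numpy as np

def center_columns(M):
    """Subtract each column's mean; also return the means."""
    M = np.asarray(M, dtype=float)
    mean = M.mean(axis=0)
    return M - mean, mean

# Toy 3-sample x 2-variable matrix (illustrative values only)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
X_center, x_mean = center_columns(X)   # column means are 2 and 20
```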

Step 3: As shown in FIG. 2, perform principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:

Step 3.1: Using the PLS algorithm, build the calibration model $Y_{center} = X_{m\_center} B$ on the data set $\{X_{m\_center}, Y_{center}\}$ and compute the coefficient matrix $B$, the score matrix $T$ and loading matrix $P$ of $X_{m\_center}$, and the score matrix $U$ and loading matrix $Q$ of $Y_{center}$; introduce the matrix $R$ such that $T = X_{m\_center} R$, and determine the number of latent variables $l$;
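The patent does not say which PLS variant is used in step 3.1. The sketch below assumes the classic NIPALS PLS2 algorithm and shows how the matrices named in the step relate: the weights $W$ give $R = W(P^T W)^{-1}$, so that $T = X_{m\_center} R$, and $B = R Q^T$. The function name, the dictionary return and the toy data are hypothetical; in the embodiment, $l$ appears to be chosen by cross-validation (compare FIG. 5).

```python
import numpy as np

def pls_nipals(X, Y, n_comp, tol=1e-10, max_iter=500):
    """NIPALS PLS2 on centered X (n x m) and 2-D centered Y (n x p)."""
    Xk = np.array(X, dtype=float)
    Yk = np.array(Y, dtype=float)
    n, m = Xk.shape
    p = Yk.shape[1]
    T, U = np.zeros((n, n_comp)), np.zeros((n, n_comp))
    W, P = np.zeros((m, n_comp)), np.zeros((m, n_comp))
    Q = np.zeros((p, n_comp))
    for a in range(n_comp):
        # Start from the most variable Y column
        u = Yk[:, [int(np.argmax(Yk.var(axis=0)))]]
        for _ in range(max_iter):
            w = Xk.T @ u
            w /= np.linalg.norm(w)             # X weights
            t = Xk @ w                         # X scores
            q = Yk.T @ t / (t.T @ t)           # Y loadings
            u_new = Yk @ q / (q.T @ q)         # Y scores
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p_a = Xk.T @ t / (t.T @ t)             # X loadings
        Xk -= t @ p_a.T                        # deflate X
        Yk -= t @ q.T                          # deflate Y
        T[:, a], U[:, a], W[:, a] = t[:, 0], u[:, 0], w[:, 0]
        P[:, a], Q[:, a] = p_a[:, 0], q[:, 0]
    R = W @ np.linalg.inv(P.T @ W)             # so that T = X R
    B = R @ Q.T                                # so that Y ~ X B
    return {"T": T, "U": U, "W": W, "P": P, "Q": Q, "R": R, "B": B}

# Toy centered data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 50))
X -= X.mean(axis=0)
Y = X[:, :4] @ rng.normal(size=(4, 4)) + 0.01 * rng.normal(size=(16, 4))
Y -= Y.mean(axis=0)
model = pls_nipals(X, Y, n_comp=5)
```

The identity $T = X R$ holds by construction for any NIPALS weights, because $P^T W$ is upper triangular with a unit diagonal.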

步骤3.2:计算可预测的物质浓度变量为Step 3.2: Calculate the predictable substance concentration variable as

Figure BDA0002369344620000071
Figure BDA0002369344620000071

对可预测的物质浓度变量进行奇异值(SVD,Singular Value Decomposition)分解,得到The Singular Value Decomposition (SVD, Singular Value Decomposition) decomposition of the predictable substance concentration variable yields

Figure BDA0002369344620000072
Figure BDA0002369344620000072

其中,Uc为左奇异矩阵,Dc为奇异值对角矩阵,Vc为右奇异矩阵,Vc是正交矩阵;Qc=VcDc T,包括降序的lc个非零奇异值和相应的右奇异向量;Among them, U c is a left singular matrix, D c is a singular value diagonal matrix, V c is a right singular matrix, and V c is an orthogonal matrix; Q c =V c D c T , including lc non-zero singularities in descending order value and the corresponding right singular vector;

From Eq. (2), we obtain

$$U_c = X_{m\_center}RQ^T V_c D_c^{-1} = X_{m\_center}R_c \quad (3)$$

and therefore

$$R_c = RQ^T V_c D_c^{-1} \quad (4)$$
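Step 3.2 can be checked numerically. The sketch assumes the standard CPLS definition of the predictable part, $\hat{Y} = X_{m\_center}RQ^T$. It keeps only the singular values above a relative tolerance, which is an assumed way of deciding which ones count as "nonzero". It then verifies that $R_c = RQ^TV_cD_c^{-1}$ reproduces $U_c$ from the spectra. `predictable_subspace` is a hypothetical name.

```python
import numpy as np


def predictable_subspace(X, R, Q, tol=1e-10):
    """SVD of the predictable part Y_hat = X R Q^T, keeping the l_c nonzero singular values.

    Returns U_c, D_c, V_c, Q_c, R_c.
    """
    Y_hat = X @ R @ Q.T                           # assumed predictable part
    U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)  # s is sorted descending
    l_c = int(np.sum(s > tol * s[0]))             # number of nonzero singular values
    U_c, V_c = U[:, :l_c], Vt[:l_c].T
    D_c = np.diag(s[:l_c])
    Q_c = V_c @ D_c.T                             # Q_c = V_c D_c^T
    R_c = R @ Q.T @ V_c @ np.linalg.inv(D_c)      # Eq. (4)
    return U_c, D_c, V_c, Q_c, R_c
```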

Step 3.3: Calculate the unpredictable substance concentration variables as

$$\tilde{Y}_c = Y_{center} - U_c Q_c^T \quad (5)$$

Perform principal component analysis (PCA) on the unpredictable substance concentration variables, retaining $l_y$ principal components:

$$\tilde{Y}_c = T_y P_y^T + \tilde{Y} \quad (6)$$

where $\tilde{Y}$ is the output residual matrix of $\tilde{Y}_c$;

The score matrix $T_y = \tilde{Y}_c P_y$ is obtained from Eq. (6).
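A sketch of Step 3.3. It assumes the standard CPLS form of the unpredictable part, $\tilde{Y}_c = Y_{center} - U_cQ_c^T$, and performs PCA through an SVD without re-centering, because the inputs are already centered. The helper names `pca_split` and `unpredictable_output` are hypothetical.

```python
import numpy as np


def pca_split(M, n_components):
    """PCA via SVD: M = T_pc @ P_pc.T + residual, with orthonormal loadings P_pc."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    P_pc = Vt[:n_components].T
    T_pc = M @ P_pc                      # scores
    return T_pc, P_pc, M - T_pc @ P_pc.T


def unpredictable_output(Y_center, U_c, Q_c, l_y):
    """Remove the predictable part of Y, then extract l_y principal components."""
    Y_tilde_c = Y_center - U_c @ Q_c.T   # assumed form of the unpredictable part
    T_y, P_y, Y_tilde = pca_split(Y_tilde_c, l_y)
    return Y_tilde_c, T_y, P_y, Y_tilde
```

The PCA residual `Y_tilde` is orthogonal to the scores `T_y`. This is why it can be treated as the part of the concentrations that remains unexplained.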

Step 3.4: By projecting onto the space spanned by $R_c$, obtain the input variables unrelated to the substance concentration variables as

$$\tilde{X}_c = X_{m\_center} - U_c R_c^{*} \quad (7)$$

where $R_c^{*} = (R_c^T R_c)^{-1} R_c^T$;

Perform principal component extraction on the input variables unrelated to the substance concentration variables, retaining $l_x$ principal components:

$$\tilde{X}_c = T_x P_x^T + \tilde{X} \quad (8)$$

where $\tilde{X}$ is the input residual matrix of $\tilde{X}_c$;

The score matrix $T_x = \tilde{X}_c P_x$ is obtained from Eq. (8).
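A sketch of Step 3.4. It assumes the standard CPLS form $\tilde{X}_c = X_{m\_center} - U_cR_c^{*}$ with $U_c = X_{m\_center}R_c$. A useful check follows from that form: $\tilde{X}_cR_c = U_c - U_c(R_c^TR_c)^{-1}R_c^TR_c = 0$. In other words, the remaining input variation carries nothing along the directions used to predict concentrations. The helper names are hypothetical.

```python
import numpy as np


def pca_split(M, n_components):
    """PCA via SVD: M = T_pc @ P_pc.T + residual, with orthonormal loadings P_pc."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    P_pc = Vt[:n_components].T
    T_pc = M @ P_pc
    return T_pc, P_pc, M - T_pc @ P_pc.T


def unpredictable_input(X_center, U_c, R_c, l_x):
    """Strip the part of X explained through R_c, then extract l_x principal components."""
    R_c_star = np.linalg.solve(R_c.T @ R_c, R_c.T)  # (R_c^T R_c)^{-1} R_c^T
    X_tilde_c = X_center - U_c @ R_c_star          # assumed form of Eq. (7)
    T_x, P_x, X_tilde = pca_split(X_tilde_c, l_x)
    return R_c_star, X_tilde_c, T_x, P_x, X_tilde
```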

Step 3.5: From Steps 3.1 to 3.4, the principal components of $X_{m\_center}$ and $Y_{center}$ extracted by the PLS algorithm are $X_{m\_pre} = TP^T$ and $Y_{pre} = UQ^T$, respectively. Their residuals are $X_{m\_res\_c} = X_{m\_center} - X_{m\_pre}$ and $Y_{res\_c} = Y_{center} - Y_{pre}$. That is,

$$X_{m\_center} = U_c R_c^{*} + T_x P_x^T + \tilde{X}$$

$$Y_{center} = U_c Q_c^T + T_y P_y^T + \tilde{Y}$$

The CPLS procedure therefore divides each of $X_{m\_center}$ and $Y_{center}$ into three parts: the principal components extracted by the PLS algorithm, the principal components extracted from the residuals, and the unpredictable errors. Compared with PLS, CPLS adds a principal component extraction step on the residuals, which improves data utilization.
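The three-part split can be sketched by combining Steps 3.2 to 3.4 in one function. The sketch assumes the standard CPLS equations. `R` and `Q` are taken from any fitted PLS model of the centered data, and `cpls_decompose` is a hypothetical name. The function returns each part separately, so the split can be checked to add back up to the original matrices.

```python
import numpy as np


def pca_split(M, k):
    """PCA via SVD: M = T @ P.T + residual."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    P = Vt[:k].T
    T = M @ P
    return T, P, M - T @ P.T


def cpls_decompose(X, Y, R, Q, l_y, l_x, tol=1e-10):
    """Split centered X, Y into predictable, residual-PC and unpredictable parts."""
    U, s, Vt = np.linalg.svd(X @ R @ Q.T, full_matrices=False)
    l_c = int(np.sum(s > tol * s[0]))                 # nonzero singular values
    U_c, V_c, D_c = U[:, :l_c], Vt[:l_c].T, np.diag(s[:l_c])
    Q_c = V_c @ D_c
    R_c = R @ Q.T @ V_c @ np.linalg.inv(D_c)
    R_c_star = np.linalg.solve(R_c.T @ R_c, R_c.T)
    T_y, P_y, Y_tilde = pca_split(Y - U_c @ Q_c.T, l_y)   # output side
    T_x, P_x, X_tilde = pca_split(X - U_c @ R_c_star, l_x)  # input side
    return {
        "X_parts": (U_c @ R_c_star, T_x @ P_x.T, X_tilde),
        "Y_parts": (U_c @ Q_c.T, T_y @ P_y.T, Y_tilde),
    }
```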

Step 4: Apply the same method as in Step 3 to perform principal component analysis on the matrix $X_{s\_center}$, obtaining the residual of $X_{s\_center}$ as $X_{s\_res\_c}$.

In this embodiment, the optimal number of principal components for the PLS algorithm is selected by 10-fold cross-validation. Taking oil as an example, Figure 5 shows how the cross-validation error of the oil-content model on the target-domain training set of the corn data set changes with the number of principal components. The cross-validation error for oil on the corn data set reaches its global minimum at 12 principal components, so the optimal number of principal components for oil is set to 12. The optimal numbers for the other three constituents are chosen in the same way.
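A minimal sketch of choosing the number of PLS components by 10-fold cross-validation. It assumes NIPALS PLS and RMSE as the cross-validation error; the patent names neither. Centering is redone inside each fold so held-out samples do not influence the means. Because NIPALS components are sequential, one fit per fold with `max_comp` components provides the coefficients for every smaller count. The function names are hypothetical.

```python
import numpy as np


def nipals_pls(X, Y, n_comp, n_iter=100):
    """Compact NIPALS PLS2 on centered X, Y; returns weights W and loadings P, Q."""
    X = np.asarray(X, dtype=float).copy()
    Y = np.asarray(Y, dtype=float).copy()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u = Y[:, [0]]
        for _ in range(n_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w
            q = Y.T @ t / (t.T @ t)
            u = Y @ q / (q.T @ q)
        p = X.T @ t / (t.T @ t)
        X -= t @ p.T                     # deflate X and Y by this component
        Y -= t @ q.T
        W.append(w); P.append(p); Q.append(q)
    return np.hstack(W), np.hstack(P), np.hstack(Q)


def cv_rmse_per_ncomp(X, Y, max_comp, n_folds=10, seed=0):
    """k-fold cross-validation RMSE for 1..max_comp PLS components."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(X))
    sq_err = np.zeros(max_comp)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        x_mu, y_mu = X[train].mean(axis=0), Y[train].mean(axis=0)
        W, P, Q = nipals_pls(X[train] - x_mu, Y[train] - y_mu, max_comp)
        for a in range(1, max_comp + 1):
            # B_a = W_a (P_a^T W_a)^{-1} Q_a^T uses only the first a components.
            B = W[:, :a] @ np.linalg.solve(P[:, :a].T @ W[:, :a], Q[:, :a].T)
            pred = (X[test] - x_mu) @ B + y_mu
            sq_err[a - 1] += np.sum((Y[test] - pred) ** 2)
    return np.sqrt(sq_err / Y.size)
```

For a single constituent such as oil, `Y` would be a one-column array of its reference values. The chosen component count is then `int(np.argmin(rmse)) + 1`.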

Step 5: As shown in Figure 3, use the least squares algorithm to build transfer matrices that map the latent structure of the target domain onto that of the source domain. First, compute the score of the source domain data set after PLS principal component extraction from the master spectra, $T_{m\_pre} = X_{m\_center}R$. Then compute the score of the target domain data set after PLS principal component extraction from the slave spectra, $T_{s\_pre} = X_{s\_center}R$. From $T_{m\_pre}$ and $T_{s\_pre}$, compute the transfer matrix $M_{trans\_pre}$ by least squares. Next, compute the score of the source domain data set after principal component extraction from the master-spectrum residuals, $T_m = X_{m\_res\_c}P$, and the score of the target domain data set after principal component extraction from the slave-spectrum residuals, $T_s = X_{s\_res\_c}P$. From $T_m$ and $T_s$, compute the transfer matrix $M_{trans}$ by least squares.
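A sketch of Step 5 using `numpy.linalg.lstsq`. The direction of the fit, slave scores times $M$ approximating master scores, is inferred from Step 6.3, where slave-instrument data are multiplied by the transfer matrices. The function names are hypothetical.

```python
import numpy as np


def transfer_matrix(T_target, T_source):
    """Least-squares M minimizing ||T_target @ M - T_source||_F."""
    M, *_ = np.linalg.lstsq(T_target, T_source, rcond=None)
    return M


def step5_transfer_matrices(X_m_center, X_s_center, X_m_res_c, X_s_res_c, R, P):
    """M_trans_pre from the PLS scores, M_trans from the residual scores."""
    M_trans_pre = transfer_matrix(X_s_center @ R, X_m_center @ R)  # T_s_pre -> T_m_pre
    M_trans = transfer_matrix(X_s_res_c @ P, X_m_res_c @ P)        # T_s -> T_m
    return M_trans_pre, M_trans
```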

Step 6: As shown in Figure 4, predict the substance concentration variables of the measured object:

Step 6.1: Collect the spectrum of the measured object with the infrared spectrum measurement slave instrument. Extract the spectral data with the same method as in Step 1 to obtain the matrix $X_{s\_test}$, formed by the $J$ slave-spectrum data points of the measured object;

Step 6.2: Perform principal component analysis on $X_{s\_test}$ based on the CPLS algorithm, obtaining the residual of $X_{s\_test}$ as $X_{s\_res\_c\_test}$;

Step 6.3: The matrix of predicted substance concentration variables of the measured object is $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
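A direct transcription of the Step 6.3 formula, where `*` in the patent denotes matrix multiplication. The patent does not state whether $X_{s\_test}$ is centered or whether the mean of $Y$ is added back afterwards. The optional `Y_mean` argument is therefore an assumption: it returns concentrations on the original scale only when the calibration means are supplied.

```python
import numpy as np


def predict_concentrations(X_s_test, X_s_res_c_test, R, P, B,
                           M_trans_pre, M_trans, Y_mean=None):
    """Y = (X_s_test R M_trans_pre P^T + X_s_res_c_test R M_trans P^T) B."""
    X_mapped = (X_s_test @ R @ M_trans_pre @ P.T
                + X_s_res_c_test @ R @ M_trans @ P.T)   # slave data mapped to master space
    Y_pred = X_mapped @ B
    # Assumption: adding back the calibration mean of Y is not stated in the patent.
    return Y_pred if Y_mean is None else Y_pred + Y_mean
```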

In this embodiment, the model is used to predict the data. Table 1 gives the prediction errors (RMSEP) for the different master/slave instrument combinations in the corn data set:

Table 1: RMSEP for each master/slave instrument combination in the corn data set (table image not reproduced)
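Table 1 reports RMSEP, but the patent does not spell out the formula. The sketch below assumes the usual definition: the square root of the mean squared difference between reference and predicted values over the test samples, computed separately for each constituent (one column per constituent).

```python
import numpy as np


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction, one value per column (constituent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
```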

Table 1 shows that, overall, the method of the present invention performs better between spectra mp5spec and mp6spec than in the other two groups. mp5spec and mp6spec are relatively similar to each other, while both differ considerably from m5spec. Transfer learning between mp5spec and mp6spec is therefore more meaningful and yields smaller errors. In particular, with mp6spec as the master spectrum and mp5spec as the slave spectrum, the measurement errors for moisture, oil, protein and starch are essentially the smallest of the six groups of experiments. Transfers between m5spec and either mp5spec or mp6spec give the largest errors of the six groups.

Figures 6 and 7 show the fitting results for mp6spec-mp5spec and m5spec-mp5spec in this embodiment, respectively, and the difference in fitting quality between them is clear. Spectra mp6spec and mp5spec are highly similar and fit well: most points fall on or near the fitted line. For transfer between m5spec and mp5spec, all points fall below the fitted line. Transfer learning therefore works markedly better for the former pair. For the latter pair, transfer is effectively not worthwhile because the predictions are poor.

Since transfer between mp6spec and mp5spec works best, this pair of spectra is selected for comparison with other algorithms: Multiplicative Scatter/Signal Correction (MSC), Canonical Correlation Analysis (CCA), Slope and Bias Correction (SBC), and Piecewise Direct Standardization (PDS). Table 2 gives the RMSEP comparison for mp6spec-m5spec in the corn data set under each algorithm. It shows that, overall, the CPLS-based infrared spectrum measuring instrument calibration migration method of the present invention transfers well. Compared with MSC, CCA and PDS, its predictions for all four constituents are far better. Compared with SBC, it predicts moisture and oil better, while for protein and starch the two methods perform similarly.

Table 2: RMSEP of mp6spec-m5spec in the corn data set under each algorithm (table image not reproduced)

In summary, the six groups of experiments on the corn data set and the comparisons with MSC, CCA, SBC and PDS show that the CPLS algorithm combined with transfer learning predicts about as well as SBC and far better than MSC, CCA and PDS. The present invention thus removes random noise from the master instrument's measurements and improves data utilization and modeling accuracy.

Obviously, the above embodiments are only some, not all, of the embodiments of the present invention. They serve only to explain the invention and do not limit its scope of protection. All other embodiments obtained by those skilled in the art from the above embodiments without creative effort fall within the scope of protection claimed by the present invention. This includes all modifications, equivalent replacements and improvements made within the spirit and principles of the present application.

Claims (2)

1. A CPLS-based infrared spectrum measuring instrument calibration migration method, characterized by comprising the following steps:
Step 1: the infrared spectrum measurement master instrument corresponds to the source domain, and the infrared spectrum measurement slave instrument corresponds to the target domain; the spectrum of each sample is collected with both the master instrument and the slave instrument to obtain a master spectrum and a slave spectrum, respectively; the spectral data of the master spectrum and the slave spectrum are extracted at intervals of a nm within the wavelength range; the substance concentration variable values of each sample are collected; and a source domain data set $\{X_m, Y\}$ and a target domain data set $\{X_s, Y\}$ are obtained;
where $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the $j$-th spectral data points of the $i$-th sample in the master spectrum and the slave spectrum, respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$; $I$ is the total number of samples and $J$ is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$; $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of substance concentration variables;
Step 2: the source domain data set and the target domain data set are centered to obtain the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$;
Step 3: principal component analysis is performed on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:
Step 3.1: based on the PLS algorithm, a calibration model $Y_{center} = X_{m\_center}B$ is established for the data set $\{X_{m\_center}, Y_{center}\}$; the coefficient matrix $B$, the score matrix $T$ and loading matrix $P$ of $X_{m\_center}$, and the score matrix $U$ and loading matrix $Q$ of $Y_{center}$ are calculated; a matrix $R$ is introduced such that $T = X_{m\_center}R$; and the number of latent variables $l$ is determined;
Step 3.2: the predictable substance concentration variables are calculated as
$$\hat{Y} = TQ^T = X_{m\_center}RQ^T \quad (1)$$
singular value decomposition is performed on the predictable substance concentration variables to obtain
$$\hat{Y} = U_c D_c V_c^T = U_c Q_c^T \quad (2)$$
where $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, and $V_c$ is the right singular matrix, which is orthogonal; $Q_c = V_c D_c^T$ contains the $l_c$ nonzero singular values in descending order and the corresponding right singular vectors;
from Eq. (2),
$$U_c = X_{m\_center}RQ^T V_c D_c^{-1} = X_{m\_center}R_c \quad (3)$$
so that
$$R_c = RQ^T V_c D_c^{-1} \quad (4)$$
Step 3.3: the unpredictable substance concentration variables are calculated as
$$\tilde{Y}_c = Y_{center} - U_c Q_c^T \quad (5)$$
principal components are extracted from the unpredictable substance concentration variables, retaining $l_y$ principal components:
$$\tilde{Y}_c = T_y P_y^T + \tilde{Y} \quad (6)$$
where $\tilde{Y}$ is the output residual matrix of $\tilde{Y}_c$;
the score matrix $T_y = \tilde{Y}_c P_y$ is obtained from Eq. (6);
Step 3.4: by projecting onto the space spanned by $R_c$, the input variables unrelated to the substance concentration variables are obtained as
$$\tilde{X}_c = X_{m\_center} - U_c R_c^{*} \quad (7)$$
where $R_c^{*} = (R_c^T R_c)^{-1} R_c^T$;
principal components are extracted from the input variables unrelated to the substance concentration variables, retaining $l_x$ principal components:
$$\tilde{X}_c = T_x P_x^T + \tilde{X} \quad (8)$$
where $\tilde{X}$ is the input residual matrix of $\tilde{X}_c$;
the score matrix $T_x = \tilde{X}_c P_x$ is obtained from Eq. (8);
Step 3.5: from Steps 3.1 to 3.4, the principal components of $X_{m\_center}$ and $Y_{center}$ extracted by the PLS algorithm are $X_{m\_pre} = TP^T$ and $Y_{pre} = UQ^T$, respectively, and their residuals are $X_{m\_res\_c} = X_{m\_center} - X_{m\_pre}$ and $Y_{res\_c} = Y_{center} - Y_{pre}$, that is,
$$X_{m\_center} = U_c R_c^{*} + T_x P_x^T + \tilde{X}$$
$$Y_{center} = U_c Q_c^T + T_y P_y^T + \tilde{Y}$$
Step 4: principal component analysis is performed on the matrix $X_{s\_center}$ by the same method as in Step 3, and the residual of $X_{s\_center}$ is obtained as $X_{s\_res\_c}$;
Step 5: the score of the source domain data set after PLS principal component extraction from the master spectra, $T_{m\_pre} = X_{m\_center}R$, is calculated; the score of the target domain data set after PLS principal component extraction from the slave spectra, $T_{s\_pre} = X_{s\_center}R$, is calculated; the transfer matrix $M_{trans\_pre}$ is calculated from $T_{m\_pre}$ and $T_{s\_pre}$ by the least squares method; the score of the source domain data set after principal component extraction from the master-spectrum residuals, $T_m = X_{m\_res\_c}P$, is calculated; the score of the target domain data set after principal component extraction from the slave-spectrum residuals, $T_s = X_{s\_res\_c}P$, is calculated; and the transfer matrix $M_{trans}$ is calculated from $T_m$ and $T_s$ by the least squares method;
Step 6: the substance concentration variables of the measured object are predicted:
Step 6.1: the spectrum of the measured object is collected with the infrared spectrum measurement slave instrument, and the spectral data are extracted by the same method as in Step 1 to obtain the matrix $X_{s\_test}$ formed by the $J$ slave-spectrum data points of the measured object;
Step 6.2: principal component analysis is performed on $X_{s\_test}$ based on the CPLS algorithm, and the residual of $X_{s\_test}$ is obtained as $X_{s\_res\_c\_test}$;
Step 6.3: the matrix of predicted substance concentration variables of the measured object is $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
2. The CPLS-based infrared spectrum measuring instrument calibration migration method according to claim 1, wherein in Step 1 the samples are grain, the spectral data are absorbance values, and the substance concentration variables comprise the moisture, oil, protein and starch contents of the grain.
CN202010045812.3A 2020-01-16 2020-01-16 CPLS-based infrared spectrum measuring instrument calibration migration method Active CN111220565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010045812.3A CN111220565B (en) 2020-01-16 2020-01-16 CPLS-based infrared spectrum measuring instrument calibration migration method


Publications (2)

Publication Number Publication Date
CN111220565A (en) 2020-06-02
CN111220565B (en) 2022-07-29

Family

ID=70827000


Country Status (1)

Country Link
CN (1) CN111220565B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113959979B (en) * 2021-10-29 2022-07-29 Yanshan University Near-infrared spectrum model migration method based on a deep Bi-LSTM network

Citations (6)

Publication number Priority date Publication date Assignee Title
US5606164A (en) * 1996-01-16 1997-02-25 Boehringer Mannheim Corporation Method and apparatus for biological fluid analyte concentration measurement using generalized distance outlier detection
CN106596450A (en) * 2017-01-06 2017-04-26 Northeastern University at Qinhuangdao Incremental method for analyzing material component content based on infrared spectroscopy
CN106680238A (en) * 2017-01-06 2017-05-17 Northeastern University at Qinhuangdao Method for analyzing material component content based on infrared spectroscopy
CN107064054A (en) * 2017-02-28 2017-08-18 Zhejiang University Near-infrared spectral analysis method based on a CC-PLS-RBFNN optimized model
CN108152239A (en) * 2017-12-13 2018-06-12 Northeastern University at Qinhuangdao Sample composition content assay method based on feature migration
CN108960329A (en) * 2018-07-06 2018-12-07 Zhejiang University of Science and Technology Chemical process fault detection method for data containing missing values

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7098037B2 (en) * 1998-10-13 2006-08-29 Inlight Solutions, Inc. Accommodating subject and instrument variations in spectroscopic determinations
IL146786A (en) * 2000-03-31 2005-03-20 Japan Government Method and apparatus for detecting mastitis by using visible light and/or near infrared light


Non-Patent Citations (6)

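Title
A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring; Zimmerman N; Atmospheric Measurement Techniques; 2018-12-31; Vol. 11 (No. 1); full text *
Qualitative analysis of maize haploid kernels based on calibration transfer by near-infrared spectroscopy; Li J; Analytical Letters; 2019-12-31; Vol. 52 (No. 2); full text *
Research on near-infrared model optimization for wheat seed germination rate based on Si-cPLS; Wu Jingzhu et al.; Spectroscopy and Spectral Analysis; 2017-04-15 (No. 04); full text *
Research on a calibration migration method based on correcting distribution differences; Zhao Yuhui; Journal of Northeastern University (Natural Science); 2021-03-31; Vol. 42 (No. 3); full text *
Research on an NIR calibration migration method minimizing the mean distribution discrepancy; Zhao Yuhui; Spectroscopy and Spectral Analysis; 2021-10-31; Vol. 41 (No. 10); full text *
Application of transfer learning in spectral model transfer for edible oil; Liu Cuiling; Journal of Food Science and Technology; 2019-07-31; Vol. 37 (No. 4); full text *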

Also Published As

Publication number Publication date
CN111220565A (en) 2020-06-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant