
CN116166973A - Portable near infrared spectrum quantitative model and unknown sample matching degree judging method - Google Patents

Info

Publication number
CN116166973A
Authority
CN
China
Prior art keywords
principal component
unknown sample
matching degree
quantitative model
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310185204.6A
Other languages
Chinese (zh)
Inventor
李光尧
张国宏
刘浩
贾利红
王毅
闫晓剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Cric Technology Co ltd
Original Assignee
Sichuan Cric Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Cric Technology Co ltd
Priority to CN202310185204.6A
Publication of CN116166973A

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Chemical & Material Sciences (AREA)
  • Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample. A quantitative model is built by partial least squares (PLS) to obtain the loading matrix and score matrix of the PLS principal components. Score thresholds are calculated for principal component 1 and principal component 2, and the two thresholds are used to construct an ellipse polar coordinate expression. The acquired spectrum of an unknown sample is mapped onto the principal components to obtain its score values. The score value of principal component 1 is substituted into the ellipse polar coordinate expression to obtain two predicted score values of principal component 2, equal in magnitude and opposite in sign, which form a judgment interval. The matching degree between the unknown sample and the quantitative model is judged by checking whether the sample's principal component 2 score lies within that interval. The invention judges the matching degree with the quantitative model during prediction of unknown samples, identifies abnormal samples, prompts timely model updates, and helps maintain the stability and accuracy of the spectrometer over long-term use.

Description

Method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample

Technical field

The invention relates to the technical field of identifying abnormal near-infrared spectral samples, and in particular to a method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample.

Background

Near-infrared spectroscopy is widely used to detect the composition of materials; it is fast, non-destructive, efficient, and low in cost. A portable near-infrared spectrometer is a near-infrared analysis instrument that can be held in the hand or carried. Such spectrometers are light, easy to carry, low in power consumption, fast in detection, and inexpensive, and are widely used in agriculture, medicine, geology, environmental monitoring, and other fields.

However, portable near-infrared spectrometers are easily affected by the light source, detector, detection method, and environmental conditions, so the collected spectral data are unstable and imprecise, which degrades spectral prediction and analysis. In practice, because of the spectrometer itself and changes in the samples, the spectral data collected by portable near-infrared equipment often match the model poorly, which reduces the spectrometer's prediction accuracy for unknown samples. In addition, when spectra are collected with a portable spectrometer, manual operation, sample state, and instrument state can produce abnormal spectra for unknown samples, and predicting from such abnormal spectra yields abnormal predicted values.

Summary of the invention

The purpose of the invention is to provide a method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample, which is used to judge how well the quantitative model matches the unknown sample and thereby to improve the accuracy of predicting unknown samples with the quantitative model.

The invention solves the above problem through the following technical solution:

A method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample, comprising the following steps:

Step a. Based on the spectral data of the training set, build a quantitative model by partial least squares (PLS), and obtain the loading matrix X_loadings of every variable on PLS principal component 1 and principal component 2 and the score matrix Xt_scores of every training-set sample;

Step b. Use the multivariate t-test method to calculate the score thresholds of principal component 1 and principal component 2 separately, and substitute the two thresholds into the ellipse polar coordinate formula as the semi-major and semi-minor axes to construct the ellipse polar coordinate expression;

Step c. Using the loading matrix X_loadings obtained in step a, map the acquired spectrum of the unknown sample onto principal component 1 and principal component 2 to obtain the unknown sample's score values on principal component 1 and principal component 2;

Step d. Substitute the score value of principal component 1 into the ellipse polar coordinate expression constructed in step b to obtain two predicted score values of principal component 2 that are equal in magnitude and opposite in sign; these two predicted values form the judgment interval;

Step e. Check, sample by sample, whether the principal component 2 score of each unknown sample lies within the judgment interval, and thereby judge the matching degree between the unknown sample and the quantitative model.

As a further improvement of the invention, in step e, judging the matching degree between the unknown sample and the quantitative model includes:

(1) When the principal component 2 score of the unknown sample lies within the judgment interval, the unknown sample is judged to match the quantitative model well, and the unknown sample is predicted normally;

(2) When the principal component 2 score of the unknown sample lies outside the judgment interval, the unknown sample is judged to match the quantitative model poorly, and the spectrum of the unknown sample is acquired again.

As a further improvement of the invention, the matching degree judging method further includes:

Step f. If, after the second spectrum acquisition, the unknown sample is again judged to match the quantitative model poorly, the unknown sample is treated as an edge sample and predicted normally;

Step g. After unknown samples have been predicted for a period of time, examine the matching degree between the quantitative model and the unknown samples in the set of unknown samples collected during that period, and set a threshold to judge the overall matching degree.

As a further improvement of the invention, in step f, an edge sample is a sample whose own calibration value lies outside the range of calibration values of the training-set samples.

As a further improvement of the invention, step g includes:

1) When the overall matching degree of the unknown sample set for the period is greater than the threshold, the original quantitative model continues to be used in the next period to judge and normally predict unknown samples;

2) When the overall matching degree of the unknown sample set for the period is less than the threshold, the quantitative model is updated.

As a further improvement of the invention, step b includes:

b1) According to the multivariate t-test method, the score values of each sample point on principal component 1 and principal component 2 follow an F distribution after transformation;

b2) Set the significance level and obtain the score thresholds on principal component 1 and principal component 2, calculated as:

$$\mathrm{threshold} = \left( s_h \cdot F_{2,\,n-2,\,\alpha} \cdot \frac{2(n^2-1)}{n(n-2)} \right)^{0.5}$$

where threshold is the score threshold; $s_h$ is the variance of the h-th principal component, with h equal to 1 or 2; $F_{2,\,n-2,\,\alpha}$ is the boundary of the rejection region of the F distribution, which can be obtained from a table; $\alpha$ is the significance level; and n is the number of samples.

As a further improvement of the invention, in step b, the multivariate t-test method gives that the score values of each sample point on principal component 1 and principal component 2 follow an F distribution after transformation, according to the formula:

[Formula image BDA0004103471820000031; not reproduced in the extracted text]

where $t_{hi}$ is the score value of the i-th training-set sample on the h-th principal component, h is 1 or 2, $s_h$ is the variance of the h-th principal component, n is the number of samples, and m is the number of variables.
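The formula image referred to here is not available in this text. As a hedged reconstruction only, and not the patent's own wording, the threshold formula of step b2 implies the relation below when the two thresholds are used as the semi-axes of one ellipse. The notation threshold(1) and threshold(2) for the two per-component thresholds, the sum over exactly two components, and $s_h$ entering unsquared (because it is defined as a variance) are assumptions read off the surrounding definitions.

```latex
\frac{t_{1i}^{2}}{\mathrm{threshold}(1)^{2}} + \frac{t_{2i}^{2}}{\mathrm{threshold}(2)^{2}} \le 1
\quad\Longleftrightarrow\quad
\frac{n(n-2)}{2(n^{2}-1)} \sum_{h=1}^{2} \frac{t_{hi}^{2}}{s_h} \le F_{2,\,n-2,\,\alpha},
\qquad
\mathrm{threshold}(h)^{2} = s_h \, F_{2,\,n-2,\,\alpha} \, \frac{2(n^{2}-1)}{n(n-2)} .
```

Read this way, a sample lies inside the ellipse exactly when the scaled sum of its squared, variance-normalised scores does not exceed the F critical value.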

Compared with the prior art, the invention has the following advantages and beneficial effects:

The invention provides a method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample. By judging the matching degree with the quantitative model during prediction of unknown samples, the method can identify abnormal samples and prompt re-acquisition of the spectrum, which avoids errors introduced by manual acquisition. When the sample state, the internal or external environment, or the spectrometer state changes, a large proportion of the samples in the unknown sample set show a low matching degree; the method can then promptly indicate that the model should be updated, which helps maintain the stability and accuracy of the spectrometer over long-term use.

Description of the drawings

Fig. 1 is a schematic diagram of the method of the invention for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample.

Fig. 2 is a schematic diagram of judging the matching degree between unknown samples and the quantitative model according to the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment

Fig. 1 shows a method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample, comprising the following steps:

S101. Based on the spectral data of the training set, build a quantitative model by partial least squares (PLS), and obtain the loading matrix X_loadings of every variable on PLS principal component 1 and principal component 2 and the score matrix Xt_scores of every training-set sample.

Specifically, PLS converts a set of possibly correlated variables into a set of linearly uncorrelated variables, which are the principal components. The training set contains the spectrum and the calibration value (true value) of every sample, whereas an unknown sample has only a spectrum. The loading matrix and score matrix of the principal components are obtained by eigendecomposition of the correlation coefficient matrix between the training-set spectra and the calibration values. Retaining the low-order principal components and discarding the high-order ones reduces the dimensionality while keeping the features with high correlation. In this embodiment, only principal component 1 and principal component 2 are retained.

Building the quantitative model requires processing many samples at once, so the data are handled as matrices. The spectra of the training-set samples together form the spectral matrix, which after PLS can be written as:

$$X = Xt_{scores}\, X_{loadings}^{T} + E$$

where X is the n×m training-set spectral matrix, n is the number of samples, m is the number of variables, $Xt_{scores}$ is the n×p score matrix, p is the number of principal components (p = 2 here, because only principal component 1 and principal component 2 are retained), $X_{loadings}$ is the m×p loading matrix, E is the residual matrix, and T denotes the transpose.
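The decomposition above can be illustrated with a small pure-Python sketch. It uses the NIPALS form of PLS with a single response (PLS1), which is a swapped-in, commonly used algorithm and not necessarily the procedure the patent has in mind; mean-centring of the spectra and calibration values, the function name `pls1_nipals`, and the toy data are all assumptions made for illustration.

```python
import math

def pls1_nipals(X, y, n_components=2):
    """Two-component PLS1 via NIPALS on mean-centred data (pure Python).

    Returns (scores, loadings, residual): scores is n x p, loadings is m x p,
    residual is the n x m matrix E left after removing the p components.
    """
    n, m = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - col_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                   # centred y
    scores = [[0.0] * n_components for _ in range(n)]
    loadings = [[0.0] * n_components for _ in range(m)]
    for h in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]       # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        for i in range(n):
            scores[i][h] = t[i]
        for j in range(m):
            loadings[j][h] = p[j]
    return scores, loadings, E

# Toy data (made up): 5 "spectra" with 3 variables each, plus calibration values.
X = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.5], [3.0, 4.0, 1.0], [4.0, 3.5, 2.0], [5.0, 6.0, 2.5]]
y = [1.1, 1.9, 3.2, 3.9, 5.3]
scores, loadings, E = pls1_nipals(X, y)
```

Here `scores` plays the role of Xt_scores (n×2), `loadings` the role of X_loadings (m×2), and `E` the residual matrix, so the centred spectra equal scores × loadingsᵀ + E by construction.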

S102. Use the multivariate t-test method to calculate the score thresholds of principal component 1 and principal component 2 separately, and substitute the two thresholds into the ellipse polar coordinate formula as the semi-major and semi-minor axes to construct the ellipse polar coordinate expression.

The specific steps are:

b1) According to the multivariate t-test method, the score values of each sample point on principal component 1 and principal component 2 follow an F distribution after transformation, that is:

[Formula image BDA0004103471820000052; not reproduced in the extracted text]

where $t_{hi}$ is the score value of the i-th training-set sample on the h-th principal component, h is 1 or 2, and $s_h$ is the variance of the h-th principal component.

b2) With the significance level set to 0.05, the score threshold of principal component 1 and of principal component 2 is obtained as:

$$\mathrm{threshold} = \left( s_h \cdot F_{2,\,n-2,\,0.05} \cdot \frac{2(n^2-1)}{n(n-2)} \right)^{0.5}$$

where threshold is the score threshold; $s_h$ is the variance of the h-th principal component, with h equal to 1 or 2; $F_{2,\,n-2,\,\alpha}$ is the boundary of the rejection region of the F distribution, which can be obtained from a table; $\alpha$ is the significance level; and n is the number of samples.

b3) The resulting threshold is a matrix containing the score thresholds of principal component 1 and principal component 2; it is substituted into the ellipse polar coordinate matrix to construct the ellipse polar coordinate expression.
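Steps b2 and b3 can be sketched as follows. Instead of a table lookup, the sketch uses the closed form of the F quantile that exists when the numerator degrees of freedom equal 2, since the survival function is then (1 + 2x/d)^(-d/2). The function names and the use of the sample variance with divisor n-1 are assumptions; the patent does not state which variance estimator is meant.

```python
import math

def f_critical_2(d2, alpha):
    """Upper-alpha critical value of the F distribution with (2, d2) degrees of freedom.

    For numerator df = 2 the survival function is (1 + 2x/d2) ** (-d2 / 2),
    so solving it for x gives the quantile directly.
    """
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)

def score_threshold(s_h, n, alpha=0.05):
    """threshold = (s_h * F_{2,n-2,alpha} * 2(n^2-1) / (n(n-2))) ** 0.5"""
    f_crit = f_critical_2(n - 2, alpha)
    return math.sqrt(s_h * f_crit * 2.0 * (n * n - 1) / (n * (n - 2)))

def variance(values):
    """Sample variance with divisor n-1 (assumption: divisor not given in the text)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def thresholds_from_scores(xt_scores, alpha=0.05):
    """Return [threshold(1), threshold(2)] from an n x 2 training score matrix."""
    n = len(xt_scores)
    return [score_threshold(variance([row[h] for row in xt_scores]), n, alpha)
            for h in range(2)]
```

With n = 12 and alpha = 0.05, `f_critical_2(10, 0.05)` agrees with the tabulated value of about 4.10 for F(2, 10).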

Substituting it into the ellipse polar coordinate matrix gives:

$$x = \mathrm{threshold}(1)\cos\theta$$

$$y = \mathrm{threshold}(2)\sin\theta$$

where θ is the angle, ranging over [0°, 360°]; threshold(1) is the score threshold of principal component 1; threshold(2) is the score threshold of principal component 2; and x and y are the independent and dependent variables of the formula, namely the score value on principal component 1 and the score value on principal component 2.

Substituting the principal component 1 score of an unknown sample into the formula as x yields two y values of equal magnitude and opposite sign, which serve as a range threshold. When the actual principal component 2 score of the unknown sample falls outside this range, the sample is judged to be abnormal.
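The substitution described here can be sketched by eliminating θ: from x = threshold(1)·cos θ it follows that y = ±threshold(2)·sqrt(1 − (x/threshold(1))²). The function names are illustrative, and returning `None` when the principal component 1 score itself exceeds threshold(1) is an assumption, since the text does not describe that case.

```python
import math

def judgment_interval(t1, thr1, thr2):
    """Return (low, high) for the principal component 2 score, or None.

    On the ellipse x = thr1*cos(theta), y = thr2*sin(theta), a given x = t1
    corresponds to y = +/- thr2 * sqrt(1 - (t1/thr1)**2).
    None is returned when |t1| > thr1 (assumption: no point of the ellipse
    has that x, so the sample is treated as outside).
    """
    ratio = t1 / thr1
    if abs(ratio) > 1.0:
        return None
    half_width = thr2 * math.sqrt(1.0 - ratio * ratio)
    return (-half_width, half_width)

def matches_model(t1, t2, thr1, thr2):
    """True when the actual principal component 2 score lies in the interval."""
    interval = judgment_interval(t1, thr1, thr2)
    return interval is not None and interval[0] <= t2 <= interval[1]
```

For example, with thresholds 3.0 and 2.0 and a principal component 1 score of 1.8, the interval is roughly [-1.6, 1.6], so a principal component 2 score of 1.5 passes and 1.7 does not.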

S103. Using the previously obtained loading matrix X_loadings of the principal components, map the acquired spectrum Xp of the unknown sample onto principal component 1 and principal component 2 to obtain the sample's principal component 1 score Xp_scores1 and principal component 2 score Xp_scores2.

The mapping is:

$$Xp_{scores} = Xp \times X_{loadings}$$
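A minimal sketch of this mapping for a single spectrum follows, written as a plain row-vector by matrix product. It follows the formula as stated; whether the spectrum should first be centred with the training-set means, and whether an implementation would project with PLS weights rather than loadings, is not specified in the text, so the function name and the unprocessed input are assumptions.

```python
def project_spectrum(xp, x_loadings):
    """Xp_scores = Xp x X_loadings for one spectrum.

    xp is a length-m list of spectral values; x_loadings is an m x p matrix
    given as a list of rows. Returns the p score values.
    """
    m = len(xp)
    if len(x_loadings) != m:
        raise ValueError("spectrum length must equal the number of loading rows")
    p = len(x_loadings[0])
    return [sum(xp[j] * x_loadings[j][h] for j in range(m)) for h in range(p)]

# Made-up numbers: a 3-variable spectrum and a 3 x 2 loading matrix.
xp_scores1, xp_scores2 = project_spectrum([1.0, 2.0, 3.0],
                                          [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The two returned values correspond to Xp_scores1 and Xp_scores2 in the text.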

S104. Substitute the principal component 1 score Xp_scores1 into the ellipse polar coordinate expression to obtain two predicted scores of principal component 2 of equal magnitude and opposite sign; these two values form the judgment interval, which can be written as [-Xk_scores, Xk_scores].

S105. Check, sample by sample, whether the actual principal component 2 score of each unknown sample lies within the judgment interval, and thereby judge the matching degree between the unknown sample and the quantitative model.

Specifically:

(1) When Xp_scores2 ∈ [-Xk_scores, Xk_scores] for a single unknown sample, the sample matches the quantitative model well and can be predicted directly.

(2) When Xp_scores2 ∉ [-Xk_scores, Xk_scores] for a single unknown sample, the sample matches the quantitative model poorly; to rule out interference from operator error, the spectrum is acquired again.

S106. If, after the second spectrum acquisition, the sample is again judged to match the quantitative model poorly, external factors such as manual operation and the experimental environment can be ruled out, and the unknown sample is treated as an edge sample and predicted normally.

In this embodiment, an edge sample is a sample whose own calibration value lies outside the range of calibration values of the training-set samples.
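The per-sample flow of S105 and S106 can be sketched as below. The callback `acquire_scores`, the result dictionary, and the labels are hypothetical names introduced for illustration; the ellipse test is written in its implicit form, which is equivalent to checking the judgment interval.

```python
def inside_ellipse(t1, t2, thr1, thr2):
    """Implicit form of the ellipse test: (t1/thr1)^2 + (t2/thr2)^2 <= 1."""
    return (t1 / thr1) ** 2 + (t2 / thr2) ** 2 <= 1.0

def assess_sample(acquire_scores, thr1, thr2, max_acquisitions=2):
    """Apply S105-S106 to one unknown sample.

    acquire_scores is a hypothetical callback that measures a fresh spectrum
    and returns its (principal component 1, principal component 2) scores.
    """
    scores = None
    for attempt in range(1, max_acquisitions + 1):
        scores = acquire_scores()
        if inside_ellipse(scores[0], scores[1], thr1, thr2):
            return {"label": "match", "acquisitions": attempt, "scores": scores}
    # Still outside after re-acquisition: treat as an edge sample (S106).
    return {"label": "edge", "acquisitions": max_acquisitions, "scores": scores}

# Illustrative readings: the first acquisition is outside, the repeat is inside.
readings = iter([(2.9, 1.5), (1.0, 0.5)])
result = assess_sample(lambda: next(readings), thr1=3.0, thr2=2.0)
```

In both outcomes the sample goes on to normal prediction; the label only records whether it matched the model or was kept as an edge sample, which is the information needed for the period-level check.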

S107. After unknown samples have been predicted for a period of time, examine the matching degree between the quantitative model and the samples in the unknown sample set for that period, and set a threshold to judge the overall matching degree. The period can be set to one month, depending on how frequently samples are predicted, and the overall matching degree threshold is set to 80%.

S108. Use the threshold to judge the overall matching degree of the unknown sample set. When the overall matching degree is greater than the threshold, the original model continues to be used for judgment and prediction in the next period; when it is less than the threshold, the model is updated to improve its fit to unknown samples.
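The period-level check of S107 and S108 can be sketched as follows. The text does not define how the overall matching degree is computed, so taking it as the fraction of samples in the period that matched the model is an assumption, as is keeping the model when the value equals the threshold exactly; the function names are illustrative.

```python
def overall_matching_degree(match_flags):
    """Fraction of unknown samples in the period that matched the model."""
    if not match_flags:
        raise ValueError("no samples were predicted in this period")
    return sum(1 for flag in match_flags if flag) / len(match_flags)

def needs_model_update(match_flags, threshold=0.80):
    """True when the overall matching degree falls below the threshold (S108)."""
    return overall_matching_degree(match_flags) < threshold

# Made-up monthly results: 17 of 20 samples matched, then 7 of 10.
good_month = [True] * 17 + [False] * 3
poor_month = [True] * 7 + [False] * 3
update_after_good = needs_model_update(good_month)
update_after_poor = needs_model_update(poor_month)
```

With the 80% threshold from S107, the first month (85%) keeps the original model and the second (70%) triggers a model update.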

Fig. 2 is a schematic diagram of judging the matching degree between unknown samples and the quantitative model. The abscissa and ordinate of each dot are the principal component 1 and principal component 2 scores of one unknown sample, and the ellipse is drawn from the polar coordinate expression formed by the score thresholds. Dots inside the ellipse indicate unknown samples that match the model well; dots outside the ellipse indicate unknown samples that match the model poorly.

By judging the matching degree with the quantitative model during prediction of unknown samples, the method of the invention can identify abnormal samples and prompt re-acquisition of the spectrum, which avoids errors introduced by manual acquisition. When the sample state, the internal or external environment, or the spectrometer state changes, a large proportion of the samples in the unknown sample set show a low matching degree; the method can then promptly indicate that the model should be updated, which helps maintain the stability and accuracy of the spectrometer over long-term use.

In addition, the invention discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample.

Although the invention has been described here with reference to illustrative embodiments, these embodiments are only preferred implementations of the invention, and the invention is not limited to them. It should be understood that those skilled in the art can devise many other modifications and implementations that fall within the scope and spirit of the principles disclosed in this application.

Claims (7)

1. A method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample, characterized by comprising the following steps:

Step a. Based on the spectral data of the training set, build a quantitative model by partial least squares (PLS), and obtain the loading matrix X_loadings of every variable on PLS principal component 1 and principal component 2 and the score matrix Xt_scores of every training-set sample;

Step b. Use the multivariate t-test method to calculate the score thresholds of principal component 1 and principal component 2 separately, and substitute the two thresholds into the ellipse polar coordinate formula as the semi-major and semi-minor axes to construct the ellipse polar coordinate expression;

Step c. Using the loading matrix X_loadings obtained in step a, map the acquired spectrum of the unknown sample onto principal component 1 and principal component 2 to obtain the unknown sample's score values on principal component 1 and principal component 2;

Step d. Substitute the score value of principal component 1 into the ellipse polar coordinate expression constructed in step b to obtain two predicted score values of principal component 2 that are equal in magnitude and opposite in sign; these two predicted values form the judgment interval;

Step e. Check, sample by sample, whether the principal component 2 score of each unknown sample lies within the judgment interval, and thereby judge the matching degree between the unknown sample and the quantitative model.

2. The method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample according to claim 1, characterized in that, in step e, judging the matching degree between the unknown sample and the quantitative model includes:

(1) When the principal component 2 score of the unknown sample lies within the judgment interval, the unknown sample is judged to match the quantitative model well, and the unknown sample is predicted normally;

(2) When the principal component 2 score of the unknown sample lies outside the judgment interval, the unknown sample is judged to match the quantitative model poorly, and the spectrum of the unknown sample is acquired again.

3. The method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample according to claim 2, characterized in that the matching degree judging method further includes:

Step f. If, after the second spectrum acquisition, the unknown sample is again judged to match the quantitative model poorly, the unknown sample is treated as an edge sample and predicted normally;

Step g. After unknown samples have been predicted for a period of time, examine the matching degree between the quantitative model and the unknown samples in the set of unknown samples collected during that period, and set a threshold to judge the overall matching degree.

4. The method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample according to claim 3, characterized in that, in step f, an edge sample is a sample whose own calibration value lies outside the range of calibration values of the training-set samples.

5. The method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample according to claim 3, characterized in that step g includes:

1) When the overall matching degree of the unknown sample set for the period is greater than the threshold, the original quantitative model continues to be used in the next period to judge and normally predict unknown samples;

2) When the overall matching degree of the unknown sample set for the period is less than the threshold, the quantitative model is updated.

6. The method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample according to any one of claims 1-5, characterized in that step b includes:

b1) According to the multivariate t-test method, the score values of each sample point on principal component 1 and principal component 2 follow an F distribution after transformation;

b2) Set the significance level and obtain the score thresholds on principal component 1 and principal component 2, calculated as:

$$\mathrm{threshold} = \left( s_h \cdot F_{2,\,n-2,\,\alpha} \cdot \frac{2(n^2-1)}{n(n-2)} \right)^{0.5}$$

where threshold is the score threshold; $s_h$ is the variance of the h-th principal component, with h equal to 1 or 2; $F_{2,\,n-2,\,\alpha}$ is the boundary of the rejection region of the F distribution, which can be obtained from a table; $\alpha$ is the significance level; and n is the number of samples.

7. The method for judging the matching degree between a portable near-infrared spectroscopy quantitative model and an unknown sample according to claim 6, characterized in that, in step b, the multivariate t-test method gives that the score values of each sample point on principal component 1 and principal component 2 follow an F distribution after transformation, according to the formula:
Figure FDA0004103471810000031
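The formula image referenced on the line above is not reproduced in this text. As a hedged reconstruction only, assuming it is the standard Hotelling T² result for component scores (an assumption, chosen because it reproduces the threshold of claim 6 when m = 2), it would plausibly read as follows. The symbol $T_i^2$ is introduced here for illustration and does not appear in the source, and m is read as the number of retained components.

```latex
T_i^2 = \sum_{h=1}^{m} \frac{t_{hi}^2}{s_h},
\qquad
\frac{n\,(n-m)}{m\,(n^2-1)}\; T_i^2 \;\sim\; F(m,\; n-m)
```

Under this reading, setting m = 2 and bounding the statistic by $F_{2,\,n-2,\,\alpha}$ gives $t_{1i}^2/s_1 + t_{2i}^2/s_2 \le F_{2,\,n-2,\,\alpha} \cdot 2(n^2-1)/(n(n-2))$, an ellipse whose semi-axes equal the two score thresholds of claim 6.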
where $t_{hi}$ is the score of the i-th training-set sample on the h-th principal component, h is 1 or 2, $s_h$ is the variance of the h-th principal component, n is the number of samples, and m is the number of variables.
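The threshold of claim 6 and the interval check of steps d, e and g can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the PLS fit and the projection of the unknown spectrum (steps a and c) are assumed to have been done elsewhere, the ellipse is written in its Cartesian form rather than the polar form named in the claims, a principal component 1 score beyond the ellipse is treated as a non-match, and all function names and numeric inputs are hypothetical.

```python
import math


def score_threshold(variance, f_crit, n):
    """Score threshold for one component, following the formula in claim 6.

    variance: variance s_h of the training-set scores on component h
    f_crit:   F(2, n-2) critical value at significance level alpha (from a table)
    n:        number of training-set samples (must exceed 2)
    """
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return math.sqrt(variance * f_crit * 2 * (n ** 2 - 1) / (n * (n - 2)))


def judgment_interval(t1, a, b):
    """Step d: range of PC2 scores allowed by the ellipse at a given PC1 score.

    Assumption: the ellipse is (t1/a)^2 + (t2/b)^2 = 1, with a and b the
    PC1 and PC2 thresholds. Returns (low, high), or None when |t1| > a,
    in which case no point of the ellipse has that PC1 score.
    """
    ratio = (t1 / a) ** 2
    if ratio > 1.0:
        return None
    half_width = b * math.sqrt(1.0 - ratio)
    return (-half_width, half_width)


def matches_model(t1, t2, a, b):
    """Step e: True when the PC2 score lies inside the judgment interval."""
    interval = judgment_interval(t1, a, b)
    if interval is None:
        # Assumption: a PC1 score outside the ellipse counts as a low match.
        return False
    low, high = interval
    return low <= t2 <= high


def overall_match_rate(flags):
    """Step g: fraction of samples in a period judged to match the model."""
    flags = list(flags)
    if not flags:
        raise ValueError("no samples in this period")
    return sum(flags) / len(flags)


if __name__ == "__main__":
    n = 30
    f_crit = 3.34  # illustrative table value for F(2, 28) at alpha = 0.05
    a = score_threshold(variance=4.0, f_crit=f_crit, n=n)  # PC1 threshold
    b = score_threshold(variance=1.0, f_crit=f_crit, n=n)  # PC2 threshold
    samples = [(0.5, 0.3), (4.0, 2.5), (6.0, 0.0)]  # made-up (PC1, PC2) scores
    flags = [matches_model(t1, t2, a, b) for t1, t2 in samples]
    print(flags, overall_match_rate(flags))
```

The rate returned by `overall_match_rate` would then be compared with the threshold of claim 5 to decide whether to keep or update the model; the claims do not fix a value for that threshold, so none is assumed here.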
CN202310185204.6A 2023-03-01 2023-03-01 Portable near infrared spectrum quantitative model and unknown sample matching degree judging method Pending CN116166973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310185204.6A CN116166973A (en) 2023-03-01 2023-03-01 Portable near infrared spectrum quantitative model and unknown sample matching degree judging method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310185204.6A CN116166973A (en) 2023-03-01 2023-03-01 Portable near infrared spectrum quantitative model and unknown sample matching degree judging method

Publications (1)

Publication Number Publication Date
CN116166973A true CN116166973A (en) 2023-05-26

Family

ID=86416222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310185204.6A Pending CN116166973A (en) 2023-03-01 2023-03-01 Portable near infrared spectrum quantitative model and unknown sample matching degree judging method

Country Status (1)

Country Link
CN (1) CN116166973A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050037515A1 (en) * 2001-04-23 2005-02-17 Nicholson Jeremy Kirk Methods for analysis of spectral data and their applications osteoporosis
WO2008061296A1 (en) * 2006-11-20 2008-05-29 Queensland University Of Technology Testing device and method for use on soft tissue
CN110987866A (en) * 2019-12-19 2020-04-10 汉谷云智(武汉)科技有限公司 Gasoline property evaluation method and device
US20200193597A1 (en) * 2018-12-14 2020-06-18 Spectral Md, Inc. Machine learning systems and methods for assessment, healing prediction, and treatment of wounds

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050037515A1 (en) * 2001-04-23 2005-02-17 Nicholson Jeremy Kirk Methods for analysis of spectral data and their applications osteoporosis
WO2008061296A1 (en) * 2006-11-20 2008-05-29 Queensland University Of Technology Testing device and method for use on soft tissue
US20200193597A1 (en) * 2018-12-14 2020-06-18 Spectral Md, Inc. Machine learning systems and methods for assessment, healing prediction, and treatment of wounds
CN110987866A (en) * 2019-12-19 2020-04-10 汉谷云智(武汉)科技有限公司 Gasoline property evaluation method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
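DANIEL PELLICCIA: "Outliers detection with PLS regression for NIR spectroscopy in Python", NIRPY RESEARCH, 22 September 2018 (2018-09-22) *
LIU XIAOYE (刘晓晔): "Research on rapid online detection technology for boar taint" (公猪异味快速在线检测技术研究), China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 July 2019 (2019-07-15) *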

Similar Documents

Publication Publication Date Title
CN105486655B (en) The soil organism rapid detection method of model is intelligently identified based on infrared spectroscopy
CN103278616B (en) A kind of multiple-factor method of soil corrosivity Fast Evaluation
CN105445218B (en) The method for building up of middle infrared spectrum rapeseed protein content detection adaptive model
CN114113471A (en) Method and system for detecting food freshness of artificial nose refrigerator based on machine learning
Wang et al. Big data driven outlier detection for soybean straw near infrared spectroscopy
CN105224961A (en) A kind of diffuse reflectance infrared spectroscopy of high resolution extracts and matching process
CN101915767A (en) A systematic identification method for egg cracks
CN114970675A (en) System and method for food freshness detection in artificial nose refrigerator based on feature selection
CN118334650B (en) Modeling method, system and equipment for grain moisture detection
CN115330664A (en) Image recognition-based surrounding rock weathering degree full-automatic recognition method and device
CN113514410A (en) A real-time quantitative monitoring method based on canopy hyperspectral technology for vertical distribution of nitrogen use efficiency in summer maize throughout the growth period
CN110874576B (en) Pedestrian re-identification method based on typical correlation analysis fusion characteristics
WO2023051275A1 (en) Svm-based cold flow test detection method and system during diesel engine assembly
CN119851977B (en) A multi-dimensional student physical health monitoring method and system
CN111896497A (en) Spectral data correction method based on predicted value
CN116166973A (en) Portable near infrared spectrum quantitative model and unknown sample matching degree judging method
CN115565028A (en) Transformer oil aging detection method, detection system, electronic equipment and readable storage medium
CN112329791B (en) Automatic extraction method for hyperspectral image water area
CN115015131B (en) Sample screening method for infrared spectroscopy training set
CN113762759A (en) A multi-index system evaluation method suitable for food testing
CN111220565B (en) CPLS-based infrared spectrum measuring instrument calibration migration method
CN111121943A (en) Zero point fault detection method and device, computer equipment and readable storage medium
CN119534370A (en) Coal quality detection method, electronic equipment and storage medium
CN108982390A (en) A kind of water body pesticide residue detection method based on atomic absorption light spectrum information
CN110118749B (en) A method for detecting pesticide residues in fruits and vegetables based on near-infrared spectroscopy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination