HK1232956B - Identification using spectroscopy - Google Patents
- Publication number: HK1232956B (also listed: HK17106719.8A)
- Authority: HK (Hong Kong)
- Prior art keywords: classification model, classification, sample, control device, categories
Description
Background
Raw material identification can be used for quality control of pharmaceutical products. For example, raw material identification can be performed on a medicinal compound to determine whether the components of the medicinal compound correspond to the packaging label associated with the medicinal compound. Spectroscopy can facilitate non-destructive raw material identification with less sample preparation and shorter data acquisition time than other chemical techniques.
Overview
According to some possible implementations, a device may include one or more processors. The one or more processors may receive information identifying a result of a spectral measurement of an unknown sample. The one or more processors may perform a first classification of the unknown sample based on the result of the spectral measurement and a global classification model. The global classification model may utilize a support vector machine (SVM) classifier technique and may include a global set of categories. The one or more processors may generate a local classification model based on the first classification. The local classification model may utilize the SVM classifier technique and may include a subset of the global set of categories. The one or more processors may perform a second classification of the unknown sample based on the result of the spectral measurement and the local classification model. Based on performing the second classification, the one or more processors may provide information identifying the category, of the subset of categories, that is associated with the unknown sample.
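The global-then-local flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: to stay dependency-free, the SVM is replaced by a hypothetical stand-in classifier (autoscaled nearest centroid with softmax scores), and the names `fit_stand_in_model`, `class_probabilities`, `two_stage_classify` and the top-`k` subset rule are all illustrative. In practice an SVM library would fill the training and scoring slots.

```python
import math
from typing import Dict, List, Sequence, Tuple

Spectrum = Sequence[float]


def fit_stand_in_model(spectra: List[Spectrum], labels: List[str]) -> dict:
    """Hypothetical stand-in for SVM training: autoscale, then keep one centroid per category."""
    n = len(spectra)
    columns = list(zip(*spectra))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        stds.append(std if std > 0 else 1.0)  # constant channel: center only
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in spectra]
    centroids = {}
    for category in sorted(set(labels)):
        rows = [r for r, label in zip(scaled, labels) if label == category]
        centroids[category] = [sum(col) / len(rows) for col in zip(*rows)]
    return {"means": means, "stds": stds, "centroids": centroids}


def class_probabilities(model: dict, spectrum: Spectrum) -> Dict[str, float]:
    """Softmax over negative squared distances to the category centroids."""
    x = [(v - m) / s for v, m, s in zip(spectrum, model["means"], model["stds"])]
    scores = {
        c: -sum((a - b) ** 2 for a, b in zip(x, centroid))
        for c, centroid in model["centroids"].items()
    }
    top = max(scores.values())
    weights = {c: math.exp(score - top) for c, score in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def two_stage_classify(train_spectra: List[Spectrum], train_labels: List[str],
                       spectrum: Spectrum, k: int = 2) -> Tuple[str, List[str]]:
    # First classification: global model over every category.
    global_model = fit_stand_in_model(train_spectra, train_labels)
    global_probs = class_probabilities(global_model, spectrum)
    subset = sorted(global_probs, key=global_probs.get, reverse=True)[:k]
    # Local model: refit on the training spectra of the k most probable categories only.
    keep = [i for i, label in enumerate(train_labels) if label in subset]
    local_model = fit_stand_in_model([train_spectra[i] for i in keep],
                                     [train_labels[i] for i in keep])
    # Second classification: assign the sample to one category of the subset.
    local_probs = class_probabilities(local_model, spectrum)
    return max(local_probs, key=local_probs.get), subset
```

Because the stand-in refits its scaling on the subset's training spectra, the local model weights the spectral channels differently from the global one, so the second stage genuinely re-fits on fewer categories rather than repeating the first; with an SVM, the decision boundaries themselves are retrained.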
According to some possible implementations, a computer-readable medium may store one or more instructions that, when executed by one or more processors, cause the one or more processors to receive information identifying results of a set of spectral measurements of an unknown group, the unknown group including a set of unknown samples. The one or more instructions may cause the one or more processors to perform a first classification of the set of unknown samples based on the results of the set of spectral measurements and a global classification model that utilizes a support vector machine (SVM) linear classifier technique. The one or more instructions may cause the one or more processors to generate, based on the first classification, a set of local classification models for the set of unknown samples, the set of local classification models utilizing the SVM linear classifier technique. The one or more instructions may cause the one or more processors to perform a second classification of the set of unknown samples based on the results of the set of spectral measurements and the set of local classification models, and to provide, based on performing the second classification, information identifying the classification of the set of unknown samples.
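For a group of unknown samples, the same flow runs once per sample. One possible organization is sketched below under an assumption the text does not state: samples whose first classification selects the same candidate categories can share one local model, so local models are cached by category subset. The classifier is passed in as callables, and `classify_batch` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Spectrum = Sequence[float]


def classify_batch(
    spectra: Sequence[Spectrum],
    select_subset: Callable[[Spectrum], FrozenSet[str]],
    fit_local: Callable[[FrozenSet[str]], object],
    classify: Callable[[object, Spectrum], str],
) -> Tuple[List[str], int]:
    """Classify every unknown sample of a group with the global -> local scheme.

    Returns the per-sample labels and how many local models were generated.
    """
    local_models: Dict[FrozenSet[str], object] = {}
    labels: List[str] = []
    for spectrum in spectra:
        subset = select_subset(spectrum)                  # first (global) classification
        if subset not in local_models:                    # assumed cache: one model per subset
            local_models[subset] = fit_local(subset)
        labels.append(classify(local_models[subset], spectrum))  # second classification
    return labels, len(local_models)
```

Caching only pays off when many samples in a batch fall into the same few candidate subsets; otherwise one local model per sample, as the text describes, is generated either way.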
According to some possible implementations, a method may include receiving, by a device, information identifying a result of a spectral measurement of an unknown sample performed by a first spectrometer. The method may include performing, by the device, a first classification of the unknown sample based on the result of the spectral measurement and a global classification model. The global classification model may be generated using a support vector machine (SVM) classifier technique and a set of spectral measurements performed by a second spectrometer. The method may include generating, by the device, a local classification model based on the first classification. The local classification model may utilize the SVM classifier technique and may include a subset of the set of categories of the global classification model. The method may include performing, by the device, a second classification of the unknown sample based on the result of the spectral measurement and the local classification model. The method may include providing, by the device and based on performing the second classification, information identifying a category, of the subset of categories, associated with the unknown sample.
(1) This application relates to a device, comprising:
one or more processors configured to:
receive information identifying a result of a spectral measurement of an unknown sample;
perform a first classification of the unknown sample based on the result of the spectral measurement and a global classification model;
generate a local classification model based on the first classification;
perform a second classification of the unknown sample based on the result of the spectral measurement and the local classification model; and
provide, based on performing the second classification, information identifying a category associated with the unknown sample.
(2) The device of (1), wherein the one or more processors are further configured to:
determine a set of probabilities corresponding to a set of categories of the global classification model,
a particular probability, of the set of probabilities, indicating a likelihood that the unknown sample is associated with a particular category of the set of categories; and
select a subset of the set of categories based on the set of probabilities; and
wherein the one or more processors, when generating the local classification model, are configured to:
generate the local classification model based on the subset of the set of categories.
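Claim (2) leaves the selection rule open. One plausible rule, shown here as a hypothetical sketch, keeps the most probable categories until their combined probability reaches a target mass, within a minimum and maximum count. `select_candidate_categories`, `mass`, `min_categories` and `max_categories` are illustrative names and values, not taken from the text; the minimum of two reflects that a local classifier needs at least two categories to discriminate between.

```python
from typing import Dict, List


def select_candidate_categories(probs: Dict[str, float], mass: float = 0.95,
                                min_categories: int = 2, max_categories: int = 5) -> List[str]:
    """Keep the most probable categories until their summed probability reaches `mass`."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    total = 0.0
    for category, p in ranked:
        if len(chosen) >= max_categories:
            break
        if total >= mass and len(chosen) >= min_categories:
            break
        chosen.append(category)
        total += p
    return chosen
```

A fixed top-k rule is simpler, but a mass threshold lets clear-cut samples get a small local model while ambiguous ones keep more candidates.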
(3) The device of (1), wherein the one or more processors are further configured to:
perform an autoscaling preprocessing procedure; and
perform at least one of the first classification or the second classification based on performing the autoscaling preprocessing procedure.
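Assuming "autoscaling" here means the usual chemometric step of mean-centering each spectral channel and dividing it by its standard deviation, a minimal sketch follows. The function names are hypothetical; the statistics are fitted on training spectra and then applied unchanged to unknown spectra.

```python
import math
from typing import List, Sequence, Tuple


def fit_autoscaler(spectra: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-channel mean and sample standard deviation of the training spectra (needs >= 2 spectra)."""
    n = len(spectra)
    columns = list(zip(*spectra))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        stds.append(std if std > 0 else 1.0)  # constant channel: center it but avoid dividing by zero
    return means, stds


def autoscale(spectrum: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    """Mean-center each channel and divide by its standard deviation."""
    return [(v - m) / s for v, m, s in zip(spectrum, means, stds)]
```

Autoscaling gives every channel unit variance, so weak but informative bands are not swamped by high-intensity ones when the classifier compares spectra.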
(4) The device of (1), wherein the one or more processors are further configured to:
receive the global classification model from a control device associated with a first spectrometer,
the global classification model being generated by the control device using one or more spectral measurements performed by the first spectrometer;
cause the spectral measurement to be performed by a second spectrometer,
the second spectrometer being different from the first spectrometer; and
wherein the one or more processors, when performing the first classification of the unknown sample, are configured to:
perform the first classification of the unknown sample based on the global classification model generated using the one or more spectral measurements performed by the first spectrometer and based on the result of the spectral measurement performed by the second spectrometer.
(5) The device of (1), wherein a set of categories of the global classification model corresponds to a set of compounds, and the category is included in the set of categories; and
wherein the one or more processors, when providing the information identifying the category, are configured to:
provide information identifying a compound, of the set of compounds, that corresponds to the category.
(6) The device of (1), wherein the one or more processors, when performing the second classification, are configured to:
determine that a spectrum associated with the unknown sample is associated with the category,
the spectrum being identified by the result of performing the spectral measurement; and
wherein the one or more processors, when providing the information identifying the category, are configured to:
provide the information identifying the category based on determining that the spectrum associated with the unknown sample is associated with the category.
(7) The device of (1), wherein a support vector machine (SVM) classifier technique is used to generate at least one of the global classification model or the local classification model; and
wherein the SVM classifier technique is associated with at least one of:
a radial basis function type of kernel function,
a linear function type of kernel function,
a sigmoid function type of kernel function,
a polynomial function type of kernel function, or
an exponential function type of kernel function.
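The five kernel types listed above can be written out as plain functions. The forms below are common textbook definitions, and the parameter names (`gamma`, `coef0`, `degree`, `sigma`) and their defaults are assumptions in the style of common SVM libraries, not values from the text. The exponential kernel in particular has several variants; this sketch uses exp(-||x - y|| / (2 sigma^2)).

```python
import math
from typing import Sequence


def _dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def _sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y) -> float:
    return _dot(x, y)


def polynomial_kernel(x, y, gamma: float = 1.0, coef0: float = 1.0, degree: int = 3) -> float:
    return (gamma * _dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma: float = 1.0) -> float:
    # Radial basis function: decays with squared Euclidean distance.
    return math.exp(-gamma * _sq_dist(x, y))


def sigmoid_kernel(x, y, gamma: float = 1.0, coef0: float = 0.0) -> float:
    # Not positive semidefinite for every parameter choice.
    return math.tanh(gamma * _dot(x, y) + coef0)


def exponential_kernel(x, y, sigma: float = 1.0) -> float:
    # Like the RBF kernel but uses the (unsquared) Euclidean distance.
    return math.exp(-math.sqrt(_sq_dist(x, y)) / (2 * sigma ** 2))
```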
(8) The device of (1), wherein the one or more processors, when performing the second classification, are configured to:
assign the unknown sample to the category based on at least one of:
a probability value, or
a decision value.
(9) This application provides a computer-readable medium storing instructions, the instructions comprising:
one or more instructions that, when executed by one or more processors, cause the one or more processors to:
receive information identifying results of a set of spectral measurements of an unknown group,
the unknown group including a plurality of unknown samples;
perform a first classification of the plurality of unknown samples based on the results of the set of spectral measurements and a global classification model,
the global classification model utilizing a support vector machine (SVM) linear classifier technique;
generate a set of local classification models for the plurality of unknown samples based on the first classification,
the set of local classification models utilizing the SVM linear classifier technique;
perform a second classification of the plurality of unknown samples based on the results of the set of spectral measurements and the set of local classification models; and
provide, based on performing the second classification, information identifying classifications of the plurality of unknown samples.
(10) The computer-readable medium of (9), wherein the global classification model is generated based on one or more spectral measurements performed by a first spectrometer; and
wherein the one or more instructions that cause the one or more processors to receive the information identifying the results of the set of spectral measurements cause the one or more processors to:
receive, from a second spectrometer, the information identifying the results of the set of spectral measurements,
the second spectrometer being different from the first spectrometer; and
wherein the one or more instructions that cause the one or more processors to perform the first classification cause the one or more processors to:
perform the first classification using the results of the set of spectral measurements received from the second spectrometer and the global classification model generated based on the one or more spectral measurements performed by the first spectrometer.
(11) The computer-readable medium of (9), wherein the one or more instructions that cause the one or more processors to receive the information identifying the results of the set of spectral measurements cause the one or more processors to:
receive a plurality of spectra corresponding to the plurality of unknown samples; and
wherein the one or more instructions that cause the one or more processors to perform the first classification cause the one or more processors to:
assign the plurality of spectra to one or more categories of the global classification model,
the one or more categories of the global classification model corresponding to one or more compounds.
(12) The computer-readable medium of (11), wherein the one or more instructions, when executed by the one or more processors, further cause the one or more processors to:
determine one or more confidence measures for the one or more categories and a particular spectrum of the plurality of spectra,
a confidence measure, of the one or more confidence measures, indicating a likelihood that the particular spectrum is associated with a particular category, of the one or more categories, corresponding to the confidence measure; and
assign the particular spectrum to a particular category based on the one or more confidence measures; and
wherein the one or more instructions that cause the one or more processors to generate the set of local classification models cause the one or more processors to:
select a subset of the one or more categories based on the one or more confidence measures; and
generate a particular local classification model, of the set of local classification models, based on the subset of the one or more categories.
(13) The computer-readable medium of (9), wherein the one or more instructions that cause the one or more processors to receive the information identifying the results of the set of spectral measurements cause the one or more processors to:
receive a plurality of spectra corresponding to the plurality of unknown samples; and
wherein the one or more instructions that cause the one or more processors to perform the second classification cause the one or more processors to:
assign the plurality of spectra to one or more categories of the set of local classification models,
the one or more categories of the set of local classification models corresponding to one or more compounds; and
wherein the one or more instructions that cause the one or more processors to provide the information identifying the classifications of the plurality of unknown samples cause the one or more processors to:
provide, based on assigning the plurality of spectra to the one or more categories, information indicating a category, of the one or more categories, to which a spectrum, of the plurality of spectra, associated with an unknown sample of the plurality of unknown samples is assigned.
(14) The computer-readable medium of (9), wherein the one or more instructions that cause the one or more processors to perform the second classification cause the one or more processors to:
determine a set of decision values associated with a particular local classification model of the set of local classification models and a particular unknown sample of the plurality of unknown samples,
a decision value, of the set of decision values, corresponding to a category of a set of categories of the particular local classification model; and
assign the particular unknown sample to the category, of the set of categories, based on the set of decision values.
(15) This application relates to a method, comprising:
receiving, by a device, information identifying a result of a spectral measurement of an unknown sample performed by a first spectrometer;
performing, by the device, a first classification of the unknown sample based on the result of the spectral measurement and a global classification model,
the global classification model being generated using a support vector machine (SVM) classifier technique and a set of spectral measurements performed by a second spectrometer;
generating, by the device, a local classification model based on the first classification,
the local classification model utilizing the SVM classifier technique,
the local classification model including a subset of categories of a set of categories of the global classification model;
performing, by the device, a second classification of the unknown sample based on the result of the spectral measurement and the local classification model; and
providing, by the device and based on performing the second classification, information identifying a category, of the subset of categories, associated with the unknown sample.
(16) The method of (15), wherein a kernel function associated with the SVM classifier technique includes at least one of:
a radial basis function type of kernel function,
a linear function type of kernel function,
a sigmoid function type of kernel function,
a polynomial function type of kernel function, or
an exponential function type of kernel function.
(17) The method of (15), wherein performing the second classification comprises:
assigning the unknown sample to a category of the subset of categories based on at least one of:
a probability value associated with the category, or
a decision value associated with the category.
(18) The method of (15), wherein the first spectrometer is different from the second spectrometer.
(19) The method of (15), further comprising:
receiving the global classification model from a control device associated with the second spectrometer;
storing the global classification model via a data structure; and
wherein performing the first classification comprises:
obtaining the global classification model from the data structure; and
performing the first classification using the global classification model.
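Storing the received global model "via a data structure" could be as simple as a JSON document on disk. The sketch below is hypothetical: the text does not specify a format, and the field layout `classes` / `weights` / `biases` (one linear decision function per category) and all function names are assumptions. It validates the layout on the way in and out so that a model received from another control device fails loudly rather than misclassifying.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical layout of a linear one-vs-rest global model; not specified by the text.
REQUIRED_KEYS = ("classes", "weights", "biases")


def serialize_model(model: Dict[str, Any]) -> str:
    """Encode the global model as JSON after checking the expected fields are present."""
    missing = [key for key in REQUIRED_KEYS if key not in model]
    if missing:
        raise ValueError(f"global model is missing fields: {missing}")
    return json.dumps(model, sort_keys=True)


def deserialize_model(text: str) -> Dict[str, Any]:
    """Decode a stored global model and check that its per-category fields line up."""
    model = json.loads(text)
    n_classes = len(model["classes"])
    if len(model["weights"]) != n_classes or len(model["biases"]) != n_classes:
        raise ValueError("classes, weights and biases must have the same length")
    return model


def store_global_model(model: Dict[str, Any], path: Path) -> None:
    """Persist the model received from the other control device."""
    path.write_text(serialize_model(model), encoding="utf-8")


def load_global_model(path: Path) -> Dict[str, Any]:
    """Retrieve the stored model before running the first classification."""
    return deserialize_model(path.read_text(encoding="utf-8"))
```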
(20) The method of (15), further comprising:
providing information identifying a confidence measure associated with the second classification,
the confidence measure representing the confidence with which the unknown sample is assigned to the category.
Brief description of the drawings
FIGS. 1A and 1B are diagrams of an overview of an example implementation described herein;
FIG. 2 is a diagram of an example environment in which systems and/or methods described herein may be implemented;
FIG. 3 is a diagram of example components of one or more devices of FIG. 2;
FIG. 4 is a flow chart of an example process for generating a global classification model for raw material identification based on a support vector machine classifier;
FIG. 5 is a diagram of an example implementation relating to the example process shown in FIG. 4;
FIG. 6 is a flow chart of an example process for performing raw material identification using a multi-level classification technique; and
FIGS. 7A and 7B are diagrams of an example implementation relating to prediction success rates associated with the example process shown in FIG. 6.
Detailed description
The following detailed description of example implementations refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.
Raw material identification (RMID) is a technique for identifying the components (e.g., ingredients) of a particular sample for purposes such as identification and verification. For example, RMID can be used to verify that the ingredients of a pharmaceutical compound correspond to a set of ingredients identified on its label. A spectrometer can be used to perform spectroscopy on a sample (e.g., a pharmaceutical compound) to determine the components of the sample. The spectrometer can determine a set of measurements of the sample and can provide the set of measurements for classification. A chemometric classification technique (e.g., a classifier) can facilitate determination of the components of a sample based on the set of measurements of the sample. However, some chemometric classification techniques may be associated with poor transferability, insufficient granularity for large-scale classification, or the like, relative to other techniques. Implementations described herein can utilize a hierarchical support vector machine classifier to facilitate RMID. In this way, a control device of the spectrometer can achieve improved classification accuracy relative to other RMID techniques.
FIGS. 1A and 1B are diagrams of an overview of an example implementation 100 described herein. As shown in FIG. 1A, example implementation 100 may include a first control device and a first spectrometer. The first control device may cause the first spectrometer to perform a set of spectral measurements on a training set (e.g., a set of known samples used to train a classification model). The training set may be selected to include a threshold number of samples for each category of the classification model. A category of the classification model may refer to a group of similar compounds that share one or more common features, such as (in a pharmaceutical context) lactose compounds, fructose compounds, acetaminophen compounds, ibuprofen compounds, aspirin compounds, and so on.
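Checking that a training set meets the per-category threshold before training is straightforward. A minimal sketch, where the function name and the idea of passing the expected category list explicitly are assumptions (so that categories with no samples at all are also caught):

```python
from collections import Counter
from typing import Dict, Iterable


def underrepresented_categories(labels: Iterable[str], categories: Iterable[str],
                                min_per_category: int) -> Dict[str, int]:
    """Return each expected category whose training-sample count is below the threshold.

    Categories with no training samples at all are reported with a count of 0.
    """
    counts = Counter(labels)
    return {c: counts[c] for c in sorted(categories) if counts[c] < min_per_category}
```

For example, with five lactose samples, two aspirin samples, and a threshold of three, aspirin (2) and an expected but absent ibuprofen category (0) are reported.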
As further shown in FIG. 1A, the first spectrometer may perform the set of spectral measurements on the training set based on receiving an instruction from the first control device. For example, the first spectrometer may determine a spectrum for each sample of the training set. The first spectrometer may provide the set of spectral measurements to the first control device. The first control device may generate a global classification model based on the set of spectral measurements using a particular classification technique. For example, the first control device may use a support vector machine (SVM) technique (e.g., a machine learning technique for classifying information) to generate the global classification model. The global classification model may include information associated with assigning a particular spectrum to a particular category and information identifying the type of compound associated with that category. In this way, a control device can identify the type of compound in an unknown sample by assigning the spectrum of the unknown sample to a particular category. The global classification model may be stored via a data structure, provided to one or more other control devices, and/or the like.
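A minimal, dependency-free sketch of building a global model with an SVM technique follows, assuming a linear kernel and a one-vs-rest scheme (one binary SVM per category); the text fixes neither choice. The optimizer (plain stochastic subgradient descent on the L2-regularized hinge loss), the hyperparameters `lam`, `lr` and `epochs`, and all function names are illustrative; a production system would more likely call an established SVM library.

```python
from typing import Dict, List, Sequence


def train_binary_svm(X: Sequence[Sequence[float]], y: Sequence[float],
                     lam: float = 1e-4, lr: float = 0.01, epochs: int = 200) -> List[float]:
    """Linear SVM trained by stochastic subgradient descent on lam/2*||w||^2 + hinge loss.

    Labels are +1/-1. The bias is folded into w through a constant feature of 1,
    so it is regularized too (a common simplification).
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            xa = list(xi) + [1.0]
            margin = yi * sum(wj * xj for wj, xj in zip(w, xa))
            w = [(1.0 - lr * lam) * wj for wj in w]  # step on the L2 penalty
            if margin < 1.0:  # hinge loss active: push xi toward margin 1 on its side
                w = [wj + lr * yi * xj for wj, xj in zip(w, xa)]
    return w


def train_global_model(X: Sequence[Sequence[float]], labels: Sequence[str],
                       **kwargs) -> Dict[str, List[float]]:
    """One-vs-rest: one binary SVM per category (e.g., per compound class)."""
    return {
        category: train_binary_svm(X, [1.0 if label == category else -1.0 for label in labels], **kwargs)
        for category in sorted(set(labels))
    }


def decision_values(model: Dict[str, List[float]], spectrum: Sequence[float]) -> Dict[str, float]:
    """Signed score of the spectrum under each category's linear decision function."""
    xa = list(spectrum) + [1.0]
    return {c: sum(wj * xj for wj, xj in zip(w, xa)) for c, w in model.items()}


def predict(model: Dict[str, List[float]], spectrum: Sequence[float]) -> str:
    """Assign the spectrum to the category with the largest decision value."""
    values = decision_values(model, spectrum)
    return max(values, key=values.get)
```

Under this one-vs-rest assumption, the per-category scores returned by `decision_values` play the role of the decision values mentioned in claims (8) and (14).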
As shown in FIG. 1B, a second control device may receive the global classification model (e.g., from the first control device) and may store the global classification model via a data structure. The second control device may cause a second spectrometer to perform a set of spectral measurements on an unknown set (e.g., a set of unknown samples on which RMID is to be performed). The second spectrometer may perform the set of spectral measurements based on receiving an instruction from the second control device. For example, the second spectrometer may determine a spectrum for each sample of the unknown set and may provide the set of spectral measurements to the second control device. The second control device may perform RMID on the unknown set using a multi-level classification technique based on the global classification model.
With respect to FIG. 1B, the second control device may use the global classification model to perform a first classification of a particular sample of the unknown set. The second control device may determine a set of confidence measures associated with the particular sample and the global classification model. A confidence measure may refer to the confidence associated with assigning the particular sample to a particular category. For example, the second control device may determine a confidence measure associated with the particular sample and each category of the global classification model. The second control device may select a subset of the categories of the global classification model based on one or more corresponding confidence measures and may generate a local classification model based on the subset of categories. The local classification model may refer to an in situ classification model generated using the SVM technique and the subset of categories. The second control device may perform a second classification based on the local classification model to assign the particular sample to a particular category. In this way, the second control device performs RMID on the particular sample of the unknown set with improved accuracy relative to other classification models and/or single-level classification techniques. The second control device may perform the first classification and the second classification on each sample of the unknown set to identify each sample. In another example, the first control device may use the global classification model and a local classification model to classify a particular sample based on spectroscopy performed by the first spectrometer.
FIG. 2 is a diagram of an example environment 200 in which systems and/or methods described herein may be implemented. As shown in FIG. 2, environment 200 may include a control device 210, a spectrometer 220, and a network 230. Devices of environment 200 may interconnect via wired connections, wireless connections, or a combination of wired and wireless connections.
Control device 210 may include one or more devices capable of storing, processing, and/or routing information associated with RMID. For example, control device 210 may include a server, a computer, a wearable device, a cloud computing device, or the like, that generates a model based on a classifier and a set of measurements of a training set and uses the model to perform RMID based on a set of measurements of an unknown set. In some implementations, control device 210 may be associated with a particular spectrometer 220. In some implementations, control device 210 may be associated with multiple spectrometers 220. In some implementations, control device 210 may receive information from and/or transmit information to another device in environment 200, such as spectrometer 220.
Spectrometer 220 may include one or more devices capable of performing a spectroscopic measurement on a sample. For example, spectrometer 220 may include a spectrometer device that performs spectroscopy (e.g., vibrational spectroscopy, such as near-infrared (NIR) spectroscopy, mid-infrared (mid-IR) spectroscopy, Raman spectroscopy, or the like). In some implementations, spectrometer 220 may be incorporated into a wearable device, such as a wearable spectrometer. In some implementations, spectrometer 220 may receive information from and/or transmit information to another device in environment 200, such as control device 210.
Network 230 may include one or more wired and/or wireless networks. For example, network 230 may include a cellular network (e.g., a Long Term Evolution (LTE) network, a 3G network, a code division multiple access (CDMA) network, etc.), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the Public Switched Telephone Network (PSTN)), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, a cloud computing network, or the like, and/or a combination of these or other types of networks.
The number and arrangement of devices and networks shown in FIG. 2 are provided as an example. In practice, there may be additional devices and/or networks, fewer devices and/or networks, different devices and/or networks, or differently arranged devices and/or networks than those shown in FIG. 2. Furthermore, two or more devices shown in FIG. 2 may be implemented within a single device, or a single device shown in FIG. 2 may be implemented as multiple, distributed devices. For example, although control device 210 and spectrometer 220 are described herein as two separate devices, control device 210 and spectrometer 220 may be implemented within a single device. Additionally or alternatively, a set of devices (e.g., one or more devices) of environment 200 may perform one or more functions described as being performed by another set of devices of environment 200.
FIG. 3 is a diagram of example components of a device 300. Device 300 may correspond to control device 210 and/or spectrometer 220. In some implementations, control device 210 and/or spectrometer 220 may include one or more devices 300 and/or one or more components of device 300. As shown in FIG. 3, device 300 may include a bus 310, a processor 320, a memory 330, a storage component 340, an input component 350, an output component 360, and a communication interface 370.
Bus 310 may include a component that permits communication among the components of device 300. Processor 320 is implemented in hardware, firmware, or a combination of hardware and software. Processor 320 may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), etc.), a microprocessor, and/or any processing component (e.g., a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.) that interprets and/or executes instructions. In some implementations, processor 320 may include one or more processors capable of being programmed to perform a function. Memory 330 may include a random access memory (RAM), a read-only memory (ROM), and/or another type of dynamic or static storage device (e.g., a flash memory, a magnetic memory, an optical memory, etc.) that stores information and/or instructions for use by processor 320.
Storage component 340 may store information and/or software related to the operation and use of device 300. For example, storage component 340 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optical disk, a solid state disk, etc.), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of computer-readable medium, along with a corresponding drive.
Input component 350 may include a component that permits device 300 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, a microphone, etc.). Additionally or alternatively, input component 350 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, an actuator, etc.). Output component 360 may include a component that provides output information from device 300 (e.g., a display, a speaker, one or more light-emitting diodes (LEDs), etc.).
Communication interface 370 may include a transceiver-like component (e.g., a transceiver, a separate receiver and transmitter, etc.) that enables device 300 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. Communication interface 370 may permit device 300 to receive information from another device and/or provide information to another device. For example, communication interface 370 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi interface, a cellular network interface, or the like.
Device 300 may perform one or more processes described herein. Device 300 may perform these processes in response to processor 320 executing software instructions stored by a computer-readable medium, such as memory 330 and/or storage component 340. A computer-readable medium is defined herein as a non-transitory memory device. A memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
Software instructions may be read into memory 330 and/or storage component 340 from another computer-readable medium or from another device via communication interface 370. When executed, software instructions stored in memory 330 and/or storage component 340 may cause processor 320 to perform one or more processes described herein. Additionally or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, implementations described herein are not limited to any specific combination of hardware circuitry and software.
The number and arrangement of components shown in FIG. 3 are provided as an example. In practice, device 300 may include additional components, fewer components, different components, or differently arranged components than those shown in FIG. 3. Additionally or alternatively, a set of components (e.g., one or more components) of device 300 may perform one or more functions described as being performed by another set of components of device 300.
FIG. 4 is a flow chart of an example process 400 for generating a global classification model for raw material identification based on a support vector machine classifier. In some implementations, one or more process blocks of FIG. 4 may be performed by control device 210. In some implementations, one or more process blocks of FIG. 4 may be performed by another device or a group of devices separate from or including control device 210, such as spectrometer 220.
As shown in FIG. 4, process 400 may include causing a set of spectral measurements to be performed on a training set (block 410). For example, control device 210 may cause spectrometer 220 to perform a set of spectral measurements on a training set of samples to determine a spectrum for each sample of the training set. The training set may refer to a set of samples of one or more known compounds that is used to generate the global classification model. For example, the training set may include one or more versions of a set of compounds (e.g., versions manufactured by different manufacturers, to account for manufacturing variation). In some implementations, the training set may be selected based on the compounds on which RMID is expected to be performed. For example, when RMID is expected to be performed on pharmaceutical compounds, the training set may include a set of samples of active pharmaceutical ingredients (APIs), excipients, or the like. In some implementations, the training set may be selected to include a particular number of samples of each type of compound. For example, the training set may be selected to include multiple samples of a particular compound (e.g., 5 samples, 10 samples, 15 samples, 50 samples, etc.). In this way, control device 210 may be provided with a threshold number of spectra associated with a particular type of compound, thereby facilitating generation of the categories of a classification model (e.g., a global classification model, a local classification model, etc.) to which an unknown sample can be accurately assigned.
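Before a training set is used, it may help to confirm that every compound type actually reaches the threshold number of spectra. The sketch below is illustrative only: the function name `classes_below_threshold`, the compound labels, and the threshold of 5 (one of the example counts above) are assumptions, not part of the described system.

```python
from collections import Counter


def classes_below_threshold(labels, min_samples=5):
    """Return the compound types with fewer than min_samples training spectra.

    labels: one compound label per measured training spectrum (hypothetical format).
    """
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < min_samples)


# Hypothetical training-set labels: caffeine is under-represented.
training_labels = ["cellulose"] * 10 + ["lactose"] * 12 + ["caffeine"] * 3
print(classes_below_threshold(training_labels, min_samples=5))  # ['caffeine']
```

A non-empty result would suggest measuring more samples of those compounds before generating the model.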
In some implementations, control device 210 may cause multiple spectrometers 220 to perform the set of spectral measurements to account for one or more physical conditions. For example, control device 210 may cause a first spectrometer 220 and a second spectrometer 220 to perform a set of vibrational spectral measurements using NIR spectroscopy. Additionally or alternatively, control device 210 may cause the set of spectral measurements to be performed at multiple times, at multiple locations, under multiple different laboratory conditions, or the like. In this way, control device 210 reduces the likelihood that the spectral measurements are inaccurate because of physical conditions, relative to having the set of spectral measurements performed by a single spectrometer 220.
As further shown in FIG. 4, process 400 may include receiving information identifying the results of the set of spectral measurements (block 420). For example, control device 210 may receive information identifying the results of the set of spectral measurements. In some implementations, control device 210 may receive information identifying a set of spectra corresponding to the samples of the training set. For example, control device 210 may receive information identifying the particular spectra observed when spectrometer 220 performed spectroscopy on the training set. Additionally or alternatively, control device 210 may receive other information as a result of the set of spectral measurements. For example, control device 210 may receive information identifying absorption of energy, emission of energy, scattering of energy, or the like.
In some implementations, control device 210 may receive the information identifying the results of the set of spectral measurements from multiple spectrometers 220. For example, control device 210 may control for physical conditions, such as differences between the multiple spectrometers 220, potential differences in laboratory conditions, or the like, by receiving spectral measurements performed by multiple spectrometers 220, at multiple different times, at multiple different locations, etc.
As further shown in FIG. 4, process 400 may include generating a global classification model associated with a particular classifier based on the information identifying the results of the set of spectral measurements (block 430). For example, control device 210 may generate a global classification model associated with an SVM classifier technique based on the information identifying the results of the set of spectral measurements. In some implementations, control device 210 may perform a set of classifications to generate the global classification model. For example, control device 210 may use the SVM technique to assign the set of spectra identified by the results of the set of spectral measurements to a set of categories.
An SVM may refer to a supervised learning model that performs pattern recognition for classification. In some implementations, control device 210 may utilize a particular type of kernel function when generating the global classification model using the SVM technique. For example, control device 210 may utilize a radial basis function (RBF) kernel (e.g., referred to as SVM-rbf), a linear kernel (e.g., referred to as SVM-linear, and as hier-SVM-linear when used with a multi-level classification technique), a sigmoid kernel, a polynomial kernel, an exponential kernel, or the like. In some implementations, control device 210 may utilize a particular type of SVM, such as a probability-value-based SVM (e.g., classification based on the probability that a sample is a member of each class of a set of classes), a decision-value-based SVM (e.g., classification that uses a decision function to vote for the class, of a set of classes, of which the sample is a member), or the like.
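The kernel choices and the two SVM output styles can be illustrated with scikit-learn's `SVC`, which is only one possible SVM implementation and is not named in the source. This sketch assumes synthetic "spectra" (random class centers plus noise) in place of real measurements; the exponential kernel has no built-in `SVC` option, so it is omitted here. `predict_proba` gives the probability-value view and `decision_function` gives the decision-value view.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for spectra: 3 compound classes, 20 spectra each, 50 channels.
centers = rng.normal(size=(3, 50))
X = np.vstack([c + 0.1 * rng.normal(size=(20, 50)) for c in centers])
y = np.repeat(["cellulose", "lactose", "caffeine"], 20)

# One model per built-in kernel type; probability=True enables predict_proba.
models = {}
for kernel in ("linear", "rbf", "sigmoid", "poly"):
    models[kernel] = SVC(kernel=kernel, probability=True, random_state=0).fit(X, y)

clf = models["linear"]
x_new = X[:1]                          # one spectrum, shape (1, 50)
proba = clf.predict_proba(x_new)       # probability-value view, one column per class
votes = clf.decision_function(x_new)   # decision-value view (one-vs-rest shaped)
print(dict(zip(clf.classes_, proba[0].round(3))))
print(clf.predict(x_new))
```

Note that `classes_` is sorted, so the probability columns follow alphabetical label order rather than the order used to build `y`.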
In some implementations, control device 210 may select, from a set of classification techniques, a particular classifier for generating the global classification model. For example, control device 210 may generate multiple classification models corresponding to multiple classifiers, and may test the multiple classification models, such as by determining each model's transferability (e.g., the extent to which a classification model generated based on spectral measurements performed on a first spectrometer 220 remains accurate when applied to spectral measurements performed on a second spectrometer 220), large-scale classification accuracy (e.g., the accuracy with which a classification model classifies, at the same time, a quantity of samples that satisfies a threshold), or the like. In this case, control device 210 may select the SVM classifier (e.g., hier-SVM-linear) based on determining that the SVM classifier is associated with better transferability and/or large-scale classification accuracy than other classifiers.
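Transferability, as defined above, can be estimated by training each candidate on one instrument's spectra and scoring it on another's. The sketch below is a hedged illustration: the two "spectrometers" are simulated with an assumed gain and baseline offset, and the candidate list (two SVM kernels plus a 1-nearest-neighbor baseline) is chosen only for demonstration; it does not predict which classifier wins on real data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_classes, n_per_class, n_channels = 4, 25, 60
centers = rng.normal(size=(n_classes, n_channels))


def measure(gain=1.0, offset=0.0):
    """Simulate one instrument: class centers plus noise, with an assumed gain/offset."""
    X = np.vstack([
        gain * (c + 0.2 * rng.normal(size=(n_per_class, n_channels))) + offset
        for c in centers
    ])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


X_a, y_a = measure()                       # hypothetical first spectrometer
X_b, y_b = measure(gain=1.05, offset=0.1)  # hypothetical second spectrometer

candidates = {
    "svm-linear": SVC(kernel="linear"),
    "svm-rbf": SVC(kernel="rbf"),
    "knn-1": KNeighborsClassifier(n_neighbors=1),
}

# Train on instrument A, score on instrument B: a simple transferability estimate.
transferability = {}
for name, model in candidates.items():
    model.fit(X_a, y_a)
    transferability[name] = accuracy_score(y_b, model.predict(X_b))

best = max(transferability, key=transferability.get)
print(transferability, "->", best)
```

A large-scale accuracy check could be added in the same loop by scoring each model on a large batch of held-out spectra.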
In some implementations, control device 210 may generate the global classification model based on information identifying the samples of the training set. For example, control device 210 may use information identifying the types of compounds represented by the samples of the training set to identify the category of spectra associated with each type of compound. In some implementations, control device 210 may train the global classification model when generating it. For example, control device 210 may train the model using a portion of the set of spectral measurements. Additionally or alternatively, control device 210 may evaluate the global classification model. For example, control device 210 may validate the global classification model (e.g., for predictive strength) using another portion of the set of spectral measurements. In some implementations, control device 210 may validate the global classification model using a multi-level classification technique. For example, control device 210 may determine that the global classification model is accurate when it is used in conjunction with one or more local classification models, as described herein with respect to FIG. 6. In this way, control device 210 ensures that a global classification model with a threshold accuracy is generated before providing it for use by other control devices 210 associated with other spectrometers 220.
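Training on one portion of the measurements and validating on another, then gating on an accuracy threshold, might look like the following. This is a sketch under stated assumptions: the data are synthetic, the labels (`API-1`, `excipient-A`, `excipient-B`) are hypothetical, the 70/30 split is arbitrary, and `VALIDATION_THRESHOLD = 0.95` is an illustrative value since the source gives no number.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

VALIDATION_THRESHOLD = 0.95  # hypothetical acceptance threshold

rng = np.random.default_rng(2)
centers = rng.normal(size=(3, 40))
X = np.vstack([c + 0.15 * rng.normal(size=(30, 40)) for c in centers])
y = np.repeat(["API-1", "excipient-A", "excipient-B"], 30)

# One portion of the measurements trains the model; the held-out portion validates it.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
global_model = SVC(kernel="linear").fit(X_train, y_train)
val_accuracy = global_model.score(X_val, y_val)  # mean accuracy on validation spectra

ready_to_distribute = val_accuracy >= VALIDATION_THRESHOLD
print(f"validation accuracy={val_accuracy:.3f}, ready={ready_to_distribute}")
```

Stratifying the split keeps every compound represented in both portions, which matters when some compounds have only a handful of spectra.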
In some implementations, after generating the global classification model, control device 210 may provide it to other control devices 210 associated with other spectrometers 220. For example, a first control device 210 may generate a global classification model and may provide it to a second control device 210 for use. In this case, the second control device 210 may store the global classification model, and may use it when generating one or more local classification models and classifying one or more samples of an unknown set, as described herein with respect to FIG. 6. Additionally or alternatively, control device 210 may store the global classification model for its own use when generating one or more local classification models and classifying one or more samples. In this way, control device 210 provides the global classification model for use in RMID of unknown samples.
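One way a first device could hand a trained model to a second device is to serialize it to bytes. The source does not specify a transfer format; this sketch assumes Python's standard `pickle` module and a scikit-learn model, and the "first/second control device" roles are simulated in one process. `pickle` data should only be loaded from trusted sources, and both sides generally need compatible library versions.

```python
import pickle

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 20))
y = np.repeat([0, 1], 20)
X[y == 1] += 1.0  # shift class 1 so the two classes are separable

global_model = SVC(kernel="linear").fit(X, y)

# "First control device": serialize the trained model for transfer (file, network, etc.).
payload = pickle.dumps(global_model)

# "Second control device": restore the model and use it for the first classification.
received_model = pickle.loads(payload)
assert (received_model.predict(X) == global_model.predict(X)).all()
```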
Although FIG. 4 shows example blocks of process 400, in some implementations, process 400 may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than those depicted in FIG. 4. Additionally or alternatively, two or more of the blocks of process 400 may be performed in parallel.
FIG. 5 is a diagram of an example implementation 500 relating to example process 400 shown in FIG. 4. FIG. 5 shows an example of generating a global classification model for raw material identification based on a support vector machine classifier.
As shown in FIG. 5, control device 210-1 transmits information to spectrometer 220-1 to instruct spectrometer 220-1 to perform a set of spectral measurements on training set 510. Assume that training set 510 includes a first set of training samples (e.g., whose measurements are used to train the global classification model) and a second set of validation samples (e.g., whose measurements are used to validate the accuracy of the global classification model). As shown by reference number 515, spectrometer 220-1 performs the set of spectral measurements on the training set based on receiving the instruction. As shown by reference number 520, control device 210-1 receives a first set of spectra for the training samples and a second set of spectra for the validation samples. Assume that control device 210-1 stores information identifying each sample of training set 510.
With regard to FIG. 5, assume that control device 210-1 has selected a hier-SVM-linear classifier for generating the global classification model (e.g., based on testing the hier-SVM-linear classifier against one or more other classifiers). As shown by reference number 525, control device 210-1 trains the global classification model using the hier-SVM-linear classifier and the first set of spectra, and validates the global classification model using the hier-SVM-linear classifier and the second set of spectra. Assume that control device 210-1 determines that the global classification model satisfies a validation threshold (e.g., has an accuracy that exceeds the validation threshold). As shown by reference number 530, control device 210-1 provides the global classification model to control device 210-2 (e.g., for use when performing RMID on spectral measurements performed by spectrometer 220-2) and to control device 210-3 (e.g., for use when performing RMID on spectral measurements performed by spectrometer 220-3).
As indicated above, FIG. 5 is provided merely as an example. Other examples are possible and may differ from what was described with regard to FIG. 5.
In this way, control device 210 facilitates generation of a global classification model based on a selected classification technique (e.g., selected based on model transferability, large-scale classification accuracy, etc.) and distribution of the global classification model for use by one or more other control devices 210 associated with one or more spectrometers 220. Moreover, control device 210 reduces cost and time requirements relative to generating the global classification model on each control device 210 that is to perform RMID.
FIG. 6 is a flow chart of an example process 600 for performing raw material identification using a multi-level classification technique. In some implementations, one or more process blocks of FIG. 6 may be performed by control device 210. In some implementations, one or more process blocks of FIG. 6 may be performed by another device or a group of devices separate from or including control device 210, such as spectrometer 220.
As shown in FIG. 6, process 600 may include receiving information identifying the results of a set of spectral measurements performed on an unknown set (block 610). For example, control device 210 may receive information identifying the results of the set of spectral measurements performed by spectrometer 220 on the unknown set. The unknown set may include a set of samples (e.g., unknown samples) on which RMID is to be performed. For example, control device 210 may cause spectrometer 220 to perform the set of spectral measurements on the set of samples, and may receive information identifying a set of spectra corresponding to the set of samples. In some implementations, control device 210 may receive information identifying results from multiple spectrometers 220. For example, control device 210 may cause multiple spectrometers 220 to perform the set of spectral measurements on the unknown set (e.g., the same set of samples), and may receive information identifying a set of spectra corresponding to the samples of the unknown set. Additionally or alternatively, control device 210 may receive information identifying the results of a set of spectral measurements performed at multiple times, at multiple locations, etc., and may classify a particular sample based on that set of spectral measurements (e.g., by averaging the set of spectral measurements or by another technique). In this way, control device 210 may account for physical conditions that may affect the results of the set of spectral measurements.
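Averaging, the one combination technique named above, can be sketched as a per-channel mean of replicate spectra of the same sample. The function name `average_replicates`, the source labels, and the values are hypothetical, and the sketch assumes all replicates are already on a shared wavelength grid (real instruments may first need resampling).

```python
import numpy as np


def average_replicates(spectra_by_source):
    """Average replicate spectra of one sample measured on several instruments/times.

    spectra_by_source: mapping from a source label (instrument, time, site) to a
    1-D array of intensities on a shared wavelength grid (assumed, not checked).
    """
    stacked = np.vstack(list(spectra_by_source.values()))
    return stacked.mean(axis=0)  # per-channel mean across replicates


replicates = {
    "spectrometer-1": np.array([0.10, 0.40, 0.80]),
    "spectrometer-2": np.array([0.12, 0.38, 0.82]),
    "spectrometer-1@t2": np.array([0.11, 0.42, 0.80]),
}
print(average_replicates(replicates))  # approximately [0.11, 0.40, 0.8067]
```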
Additionally or alternatively, control device 210 may cause a first spectrometer 220 to perform a first portion of the set of spectral measurements on a first portion of the unknown set, and may cause a second spectrometer 220 to perform a second portion of the set of spectral measurements on a second portion of the unknown set. In this way, control device 210 may reduce the amount of time required to perform the set of spectral measurements relative to having all of the spectral measurements performed by a single spectrometer 220.
As further shown in FIG. 6, process 600 may include performing a first classification based on the results of the set of spectral measurements and a global classification model (block 620). For example, control device 210 may perform the first classification based on the results and the global classification model. In some implementations, control device 210 may receive the global classification model for use in performing the first classification. For example, a first control device 210 may generate the global classification model (e.g., using an SVM-linear classifier and based on a set of spectral measurements performed on a training set, as described herein with respect to FIG. 4), and may provide it to a second control device 210 for use in performing the first classification of the unknown set. Additionally or alternatively, control device 210 may itself generate the global classification model (e.g., using an SVM-linear classifier and based on a set of spectral measurements performed on a training set, as described herein with respect to FIG. 4), and may use it to perform the first classification of the unknown set.
In some implementations, when performing the first classification, control device 210 may assign a particular sample of the unknown set to a particular category of the set of categories of the global classification model. For example, control device 210 may determine, based on the global classification model, that a particular spectrum associated with the sample corresponds to a class of compounds (e.g., cellulose compounds, lactose compounds, caffeine compounds, etc.), and may assign the sample to the corresponding category. In some implementations, control device 210 may assign the sample based on a confidence metric. For example, control device 210 may determine, based on the global classification model, the probability that the spectrum is associated with each category of the global classification model. In this case, control device 210 may assign the sample to the category whose probability exceeds the probabilities associated with the other categories. In this way, control device 210 determines a type of compound associated with the sample, thereby identifying the sample.
Additionally, or alternatively, control device 210 may determine another confidence metric associated with the first classification. For example, when control device 210 assigns a particular sample to a particular category while performing the first classification, control device 210 may determine the difference between the probability that the particular sample is associated with the particular category (referred to as the maximum probability) and the probability that the particular sample is associated with the next most likely category (referred to as the second maximum probability). In this way, control device 210 determines the confidence associated with assigning the particular sample to the particular category rather than to the next most likely category. When the maximum probability and the second maximum probability are both relatively high and relatively similar (e.g., a maximum probability of 48% and a second maximum probability of 47%, rather than a maximum probability of 48% and a second maximum probability of 4%), providing the difference between the two gives a better indication of the accuracy of the assignment than the maximum probability alone. In other words, although the maximum probability is 48% in both cases, the assignment to the most likely category is less reliable in the first case (48% versus 47%) than in the second case (48% versus 4%). A metric based on the difference between the maximum probability and the second maximum probability distinguishes these two cases.
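The assignment rule and the max-minus-second-max metric described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation: it assumes the classifier has already produced one probability per category as a plain dict, and `assign_with_margin` is a hypothetical helper name.

```python
def assign_with_margin(probabilities):
    """Assign a sample to its most probable category and report the margin.

    probabilities: dict mapping category name -> probability for one spectrum.
    Returns (best_category, max_probability, margin), where margin is the
    maximum probability minus the second maximum probability.
    """
    if len(probabilities) < 2:
        raise ValueError("at least two categories are needed to compute a margin")
    # Rank categories from most to least probable.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (best, p_max), (_, p_second) = ranked[0], ranked[1]
    return best, p_max, p_max - p_second


# The two cases from the text (only the two most probable categories are shown).
for probs in ({"cellulose": 0.48, "lactose": 0.47}, {"cellulose": 0.48, "lactose": 0.04}):
    best, p_max, margin = assign_with_margin(probs)
    print(best, p_max, round(margin, 2))
```

Both cases report the same maximum probability (0.48), but the margins (about 0.01 versus 0.44) separate the ambiguous assignment from the clear one.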
As further shown in FIG. 6, process 600 may include generating a local classification model based on the first classification (block 630). For example, control device 210 may generate the local classification model based on the first classification. The local classification model may be an in situ classification model generated, using an SVM classification technique (e.g., SVM-rbf, linear SVM, etc.; probability-value-based SVM, decision-value-based SVM, etc.; or a similar technique), based on the confidence metrics associated with the first classification. For example, when a set of confidence metrics is determined for the spectrum of a sample based on the global classification model, control device 210 may select a subset of the categories of the global classification model based on the respective probabilities that the spectrum is associated with each category of the global classification model. In this case, control device 210 may generate the local classification model using the SVM classification technique and based on the selected subset of categories.
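A rough sketch of building a local model from the selected subset of categories follows. To stay dependency-free it swaps the SVM for a nearest-centroid scorer with a softmax over negative distances; that substitution, the toy spectra, and the names `build_local_model` and `local_probabilities` are assumptions for illustration, not the SVM technique the text describes. What it shows is the structure: the local model is fit only on reference spectra whose labels fall in the selected subset, so the second classification chooses among those categories alone.

```python
import math


def build_local_model(reference, subset):
    """Fit a nearest-centroid stand-in for a local SVM model.

    reference: list of (label, spectrum) pairs; each spectrum is a list of floats.
    subset: collection of category labels selected after the first classification.
    Returns a dict mapping each selected label to its mean spectrum (centroid).
    """
    sums, counts = {}, {}
    for label, spectrum in reference:
        if label not in subset:
            continue  # categories outside the subset are left out of the local model
        if label not in sums:
            sums[label] = [0.0] * len(spectrum)
            counts[label] = 0
        sums[label] = [acc + value for acc, value in zip(sums[label], spectrum)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def local_probabilities(model, spectrum):
    """Softmax over negative Euclidean distances to each centroid in the local model."""
    scores = {label: -math.dist(centroid, spectrum) for label, centroid in model.items()}
    top = max(scores.values())  # shift by the best score for numerical stability
    weights = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


reference = [
    ("cellulose", [1.0, 0.2, 0.1]), ("cellulose", [0.9, 0.3, 0.1]),
    ("lactose", [0.2, 1.0, 0.3]), ("lactose", [0.1, 0.9, 0.4]),
    ("caffeine", [0.1, 0.2, 1.0]),
]
# Suppose the first classification ranked cellulose and lactose highest.
local_model = build_local_model(reference, {"cellulose", "lactose"})
probs = local_probabilities(local_model, [0.95, 0.25, 0.1])
print(sorted(local_model), max(probs, key=probs.get))
```

The softmax scores are uncalibrated and depend on the spectral scale; a real SVM-based local model would supply its own probability or decision values instead.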
In some implementations, a scaling preprocessing procedure may be performed. For example, to generate a local classification model, control device 210 may perform an autoscaling preprocessing procedure on the spectra associated with the subset of categories of the global classification model selected for the local classification model. In some implementations, the autoscaling preprocessing procedure may also be performed for another classification that uses the global classification model, such as the first classification. In some implementations, another type of preprocessing procedure may be performed, such as centering, a transformation, etc.
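In chemometrics, autoscaling usually means mean-centering each spectral variable (wavelength channel) and dividing it by its standard deviation across the spectra being modeled. The text does not spell out the exact procedure, so the sketch below assumes that definition; `autoscale` and `apply_scaling` are hypothetical helpers. The statistics are computed from the spectra of the selected categories only and returned, so the same transform can be applied to the unknown spectrum.

```python
import statistics


def autoscale(spectra):
    """Column-wise autoscaling: subtract each channel's mean, divide by its std dev.

    spectra: list of equal-length lists (one spectrum per row, at least two rows).
    Returns (scaled_spectra, means, stds) so the same transform can be reused.
    """
    columns = list(zip(*spectra))
    means = [statistics.fmean(col) for col in columns]
    # Sample standard deviation; a zero-variance channel is left unscaled (divisor 1.0).
    stds = [statistics.stdev(col) or 1.0 for col in columns]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in spectra]
    return scaled, means, stds


def apply_scaling(spectrum, means, stds):
    """Apply previously computed autoscaling statistics to a new spectrum."""
    return [(x - m) / s for x, m, s in zip(spectrum, means, stds)]


selected_spectra = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]  # toy spectra, two channels
scaled, means, stds = autoscale(selected_spectra)
unknown_scaled = apply_scaling([4.0, 2.0], means, stds)
print(scaled, unknown_scaled)
```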
In some implementations, the subset of categories may include a threshold number of categories associated with the highest respective confidence metrics. For example, control device 210 may select the ten categories of the global classification model that are associated with higher respective probabilities (that the spectrum of the sample is associated with them) than the other categories of the global classification model, and may generate the local model based on these ten categories. In some implementations, control device 210 may select the subset of categories based on the categories satisfying a threshold. For example, control device 210 may select each category associated with a probability that satisfies a threshold. Additionally, or alternatively, control device 210 may select a threshold number of categories that each satisfy a threshold. For example, control device 210 may select up to ten categories, provided that each of the ten categories satisfies a minimum threshold probability. Additionally, or alternatively, control device 210 may select another number of categories (e.g., two categories, five categories, twenty categories, etc.).
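The "up to N categories, each above a minimum probability" rule can be sketched as follows. The function name, the default of ten categories (taken from the example above), the toy probabilities, and the choice of `>=` as the meaning of "satisfies a threshold" are all assumptions for illustration.

```python
def select_subset(probabilities, max_categories=10, min_probability=0.0):
    """Select categories for a local model from first-stage probabilities.

    Keeps at most max_categories of the most probable categories, and only
    those whose probability is at least min_probability (one reading of
    "satisfies a threshold").
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [label for label, p in ranked[:max_categories] if p >= min_probability]


first_stage = {"cellulose": 0.50, "lactose": 0.30, "starch": 0.15, "caffeine": 0.05}
print(select_subset(first_stage, max_categories=3))                       # top three
print(select_subset(first_stage, max_categories=3, min_probability=0.2))  # top three, each >= 0.2
```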
In some implementations, control device 210 may generate multiple local classification models. For example, control device 210 may generate a first local classification model for a first spectrum of a first sample of the unknown set and a second local classification model for a second spectrum of a second sample of the unknown set. In this way, control device 210 can classify multiple unknown samples concurrently by operating on them with multiple local classification models at the same time.
In some implementations, control device 210 may generate a quantification model based on performing the first classification using the global classification model. For example, when control device 210 is used to determine the concentration of a substance in an unknown sample, and different unknown samples are associated with different quantification models for determining that concentration, control device 210 may use the first classification to select a category for the unknown sample and may select a local quantification model associated with that category. In this way, control device 210 uses hierarchical classification and quantification models to improve raw material identification and/or quantification.
As further shown in FIG. 6, process 600 may include performing a second classification based on the results of the set of spectral measurements and the local classification model (block 640). For example, control device 210 may perform the second classification based on the results and the local classification model. In some implementations, control device 210 may perform the second classification on a particular spectrum. For example, control device 210 may assign the particular spectrum to a particular category based on the local classification model. In some implementations, control device 210 may determine a set of confidence metrics associated with the particular spectrum and the local classification model. For example, control device 210 may determine the probability that the particular spectrum is associated with each category of the local classification model, and may assign the particular spectrum (e.g., the particular sample associated with the particular spectrum) to the category with a higher probability than the other categories of the local classification model. In this way, control device 210 identifies the samples of the unknown set.
Additionally, or alternatively, control device 210 may determine another confidence metric associated with the particular spectrum and the local classification model. For example, when control device 210 assigns a particular sample to a particular category while performing the second classification, control device 210 may determine the difference between the probability that the particular sample is associated with the particular category (e.g., the maximum probability) and the probability that the particular sample is associated with the next most likely category (e.g., the second maximum probability). In this way, for the second classification based on the local classification model, control device 210 determines the confidence associated with assigning the particular sample to the particular category rather than to the next most likely category.
In some implementations, control device 210 may perform multiple second classifications. For example, control device 210 may perform a second classification of a first spectrum associated with a first sample based on a first local classification model, and may perform another second classification of a second spectrum associated with a second sample based on a second local classification model. In this way, control device 210 facilitates concurrent classification of multiple samples of the unknown set. In some implementations, control device 210 may omit a portion of the samples of the unknown set from the second classification. For example, when control device 210 determines a confidence metric for assigning a particular sample to a particular category based on the global classification model, and the confidence metric satisfies a threshold, control device 210 may omit the particular sample from the second classification. In this way, control device 210 may reduce resource utilization relative to performing the second classification on all samples of the unknown set.
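The per-sample flow above (one local model per sample, with confident samples skipping the second stage) might be orchestrated as in the sketch below. The callbacks `global_model` and `make_local_model` are hypothetical placeholders for the global and local SVM models, the margin is used as the confidence metric, and `>=` is assumed as the meaning of "satisfies a threshold"; the toy models and string "spectra" exist only to exercise the control flow.

```python
def classify_unknown_set(spectra, global_model, make_local_model, margin_threshold):
    """Two-stage classification, skipping the local stage for confident samples.

    global_model(spectrum) -> dict category -> probability.
    make_local_model(spectrum, global_probs) -> callable mapping a spectrum to a
        dict category -> probability (a per-sample local model).
    Returns a list of (category, stage) tuples, stage being "global" or "local".
    """
    results = []
    for spectrum in spectra:
        probs = global_model(spectrum)
        ranked = sorted(probs.values(), reverse=True)
        margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
        if margin >= margin_threshold:
            # Confident first classification: omit this sample from the second stage.
            results.append((max(probs, key=probs.get), "global"))
            continue
        local_model = make_local_model(spectrum, probs)  # one local model per sample
        local_probs = local_model(spectrum)
        results.append((max(local_probs, key=local_probs.get), "local"))
    return results


def toy_global(spectrum):
    return {"a": 0.90, "b": 0.10} if spectrum == "clear" else {"a": 0.48, "b": 0.47}


def toy_local_factory(spectrum, global_probs):
    return lambda s: {"b": 0.80, "a": 0.20}


print(classify_unknown_set(["clear", "close"], toy_global, toy_local_factory, 0.3))
```

The loop is sequential for clarity. Because each sample's local model is independent, the per-sample work could also be dispatched concurrently (for example with `concurrent.futures`), which matches the simultaneous classification the text describes.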
In some implementations, control device 210 may perform quantification after performing the first classification (and/or after performing the second classification). For example, control device 210 may select a local quantification model based on performing one or more classifications, and may perform quantification for a particular sample using the selected local quantification model. As an example, when raw material identification is performed to determine the concentration of a particular chemical in a plant, and the plant is associated with multiple quantification models (e.g., depending on whether the plant was grown indoors or outdoors, in winter or in summer, etc.), control device 210 may perform a set of classifications to identify a particular quantification model. In this case, control device 210 may determine, based on performing the set of classifications, that the plant was grown indoors in winter, and may select the quantification model for plants grown indoors in winter to determine the concentration of the particular chemical.
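Selecting a local quantification model by category can be sketched as a lookup followed by a prediction. The sketch assumes each quantification model is linear in the (preprocessed) spectrum, as a fitted PLS or least-squares regression is at prediction time; the category keys, coefficients, and the `quantify` helper are hypothetical.

```python
def quantify(spectrum, category, quant_models):
    """Apply the quantification model selected by the classification result.

    quant_models: dict mapping category -> (coefficients, intercept) of a linear model.
    Returns the predicted concentration for the spectrum.
    """
    if category not in quant_models:
        raise KeyError(f"no quantification model for category {category!r}")
    coefficients, intercept = quant_models[category]
    return intercept + sum(c * x for c, x in zip(coefficients, spectrum))


# Hypothetical per-category models, keyed by the category found by classification.
quant_models = {
    "indoor-winter": ([0.5, 1.0], 0.1),
    "outdoor-summer": ([0.2, 0.3], 0.0),
}
category = "indoor-winter"  # e.g., the result of the set of classifications
print(quantify([2.0, 1.0], category, quant_models))
```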
As further shown in FIG. 6, process 600 may include providing information identifying a classification of the unknown set based on performing the second classification (block 650). For example, control device 210 may provide information identifying the classification of samples of the unknown set based on performing the second classification. In some implementations, control device 210 may provide information identifying a particular category for a particular sample. For example, control device 210 may provide information indicating that a particular spectrum associated with the particular sample was determined to be associated with the particular category, thereby identifying the sample. In some implementations, control device 210 may provide information indicating a confidence metric associated with assigning the particular sample to the particular category. For example, control device 210 may provide information identifying the probability that the particular sample is associated with the particular category, the difference between the maximum probability and the second maximum probability for the particular sample, and/or the like. In this way, control device 210 indicates the likelihood that the particular spectrum was accurately assigned to the particular category.
In some implementations, control device 210 provides information identifying categories for multiple samples. For example, control device 210 may provide information indicating that a first sample of the unknown set is associated with a first category and a second sample of the unknown set is associated with a second category. In this way, control device 210 provides concurrent identification of multiple samples.
In some implementations, control device 210 may provide a quantification based on performing a set of classifications and a quantification. For example, based on identifying a local quantification model, control device 210 may provide information identifying the concentration of a substance in an unknown sample, where the set of classifications was used to select the quantification model for determining that concentration.
In some implementations, control device 210 may provide an output related to the category of a sample. For example, control device 210 may provide a binary output (e.g., a yes/no output) for an unknown sample based on classifying the unknown sample into either a first set of categories, which corresponds to a first binary output (e.g., yes), or a second set of categories, which corresponds to a second binary output (e.g., no). As an example, for a first set of categories (e.g., kosher meat, which may include kosher beef strip steak, kosher beef ribs, kosher chicken legs, kosher chicken breasts, etc.) and a second set of categories (e.g., non-kosher meat, which may include non-kosher beef ribs, non-kosher pork, non-kosher chicken wings, etc.), control device 210 may provide a kosher or non-kosher output based on classifying the unknown sample into a particular category of the first set of categories or of the second set of categories. As another example, control device 210 may use a set of categories associated with foods classified as halal or non-halal, and may provide an output indicating whether a sample corresponds to a halal category or a non-halal category (i.e., whether the animal from which the sample was obtained was slaughtered in a halal manner, without regard to whether other criteria for halal classification, such as religious certification or prayers during slaughter, are met). In this way, when the identity of the particular category is not important to a user of control device 210 (e.g., a person trying to determine whether a piece of meat is kosher rather than what type of meat it is), control device 210 may provide a classification that is more likely to be accurate than an identification of the particular category.
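Mapping fine-grained categories onto a yes/no answer can be sketched as a pair of set lookups. The category names come from the kosher example above; the helper name `binary_output` and the choice to raise an error for a category in neither group are assumptions.

```python
KOSHER = {
    "kosher beef strip steak", "kosher beef ribs",
    "kosher chicken legs", "kosher chicken breasts",
}
NON_KOSHER = {"non-kosher beef ribs", "non-kosher pork", "non-kosher chicken wings"}


def binary_output(category, yes_group, no_group):
    """Collapse a fine-grained category into a yes/no answer."""
    if category in yes_group:
        return "yes"
    if category in no_group:
        return "no"
    raise ValueError(f"category {category!r} is in neither group")


print(binary_output("kosher beef ribs", KOSHER, NON_KOSHER))  # yes
print(binary_output("non-kosher pork", KOSHER, NON_KOSHER))   # no
```

Under this mapping, confusing kosher beef ribs with kosher chicken legs still yields the correct "yes", which is why the binary output can be more reliable than the category label itself.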
Although FIG. 6 shows example blocks of process 600, in some implementations, process 600 may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than those depicted in FIG. 6. Additionally, or alternatively, two or more of the blocks of process 600 may be performed in parallel.
FIGS. 7A and 7B are diagrams of an example implementation 700 relating to prediction success rates associated with example process 600 shown in FIG. 6. FIGS. 7A and 7B show example results of raw material identification using a technique based on a hierarchical linear support vector machine (hier-SVM linear).
As shown in FIG. 7A, and by reference number 710, a set of confidence metrics is provided for the unknown set. For each sample of the unknown set, control device 210 determines the probability that the sample is associated with each category of the global classification model, and the maximum probability is compared with the second maximum (next-highest) probability. As shown by reference number 712, the maximum probabilities for the unknown set range from approximately 5% to approximately 20%. As shown by reference number 714, the second maximum probabilities for the unknown set range from approximately 0% to approximately 5%. As shown by reference number 716, the samples of the unknown set that control device 210 classified incorrectly based on the global classification model are highlighted (e.g., 84 of the 2645 samples in the unknown set are classified incorrectly).
As shown in FIG. 7A, and by reference number 720, a set of confidence metrics is provided for the unknown set. For each sample of the unknown set, control device 210 determines the probability that the sample is associated with each category of the corresponding global classification model, and the maximum probability is compared with the second maximum (next-highest) probability. As shown by reference number 722, the maximum probabilities for the unknown set range from approximately 50% to approximately 98%. As shown by reference number 724, the second maximum probabilities for the unknown set range from approximately 2% to approximately 45%. Moreover, for every sample of the unknown set except one (whose probability difference is approximately 8% but which is nonetheless classified correctly), the difference between the maximum probability and the second maximum probability is greater than approximately 0.33 (33%). Based on performing the set of classifications, control device 210 correctly classifies all members of the unknown set.
Regarding FIG. 7B, when the number of samples in each category of a classification model (e.g., the global classification model, a local classification model, etc.) fails to satisfy a threshold, control device 210 may determine reduced confidence metrics, and correspondingly reduced prediction accuracy, when assigning samples of the unknown set to categories. As shown by reference number 730, when the number of samples in each category does not satisfy the threshold, control device 210 misclassifies 128 of 4451 samples after performing the first classification on the unknown set based on the global classification model and the second classification based on a set of local classification models (e.g., probability-based SVM classifier local classification models). As shown by reference number 740, when control device 210 performs another first classification based on the global classification model and another second classification based on another set of local classification models (e.g., decision-value-based SVM classifier local classification models), control device 210 misclassifies 1 of 4451 samples. In this way, control device 210 uses the decision-value-based SVM classifier to improve classification accuracy relative to the probability-based SVM classifier.
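For a sense of scale, the misclassification counts quoted for FIGS. 7A and 7B correspond to the following error rates. This is only arithmetic on the numbers above; FIG. 7A and FIG. 7B use different sample counts, so the FIG. 7A rate is not directly comparable with the two FIG. 7B rates.

$$
\frac{84}{2645} \approx 3.18\%\ \text{(FIG. 7A, 716)}, \qquad
\frac{128}{4451} \approx 2.88\%\ \text{(FIG. 7B, 730)}, \qquad
\frac{1}{4451} \approx 0.022\%\ \text{(FIG. 7B, 740)}
$$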
As indicated above, FIGS. 7A and 7B are provided merely as examples. Other examples are possible and may differ from what was described with regard to FIGS. 7A and 7B.
In this way, control device 210 performs RMID (raw material identification) using the global classification model and a local classification model generated based on the global classification model.
The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the implementations.
Some implementations are described herein in connection with thresholds. As used herein, satisfying a threshold may refer to a value being greater than the threshold, more than the threshold, higher than the threshold, greater than or equal to the threshold, less than the threshold, fewer than the threshold, lower than the threshold, less than or equal to the threshold, equal to the threshold, etc.
It will be apparent that the systems and/or methods described herein may be implemented in different forms of hardware, firmware, or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods does not limit the implementations. Thus, the operation and behavior of the systems and/or methods are described herein without reference to specific software code, it being understood that software and hardware can be designed to implement the systems and/or methods based on the description herein.
Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of possible implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of possible implementations includes each dependent claim in combination with every other claim in the claim set.
No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles "a" and "an" are intended to include one or more items, and may be used interchangeably with "one or more." Furthermore, as used herein, the term "set" is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, etc.), and may be used interchangeably with "one or more." Where only one item is intended, the term "one" or similar language is used. Also, as used herein, the terms "has," "have," "having," and the like are intended to be open-ended terms. Further, the phrase "based on" is intended to mean "based, at least in part, on" unless explicitly stated otherwise.
Claims (14)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/210,198 | 2015-08-26 |
Related Parent Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| HK42020015700.6A Division HK40023284A (en) | 2015-08-26 | 2017-07-04 | Identification using spectroscopy |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| HK42020015700.6A Addition HK40023284A (en) | 2015-08-26 | 2017-07-04 | Identification using spectroscopy |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1232956A1 HK1232956A1 (en) | 2018-01-19 |
| HK1232956B true HK1232956B (en) | 2021-01-08 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN106483083B (en) | Identification using spectroscopy | |
| JP7238056B2 (en) | Discrimination for Spectroscopic Classification with Reduced False Positives | |
| CN110084262B (en) | Reduced false positive identification for spectral quantification | |
| US20230273121A1 (en) | Outlier detection for spectroscopic classification | |
| HK1232956B (en) | Identification using spectroscopy | |
| HK1232956A1 (en) | Identification using spectroscopy | |
| HK40023284A (en) | Identification using spectroscopy | |
| HK40004123A (en) | Reduced false positive identification for spectroscopic classification | |
| HK40005112B (en) | Reduced false positive identification for spectroscopic classification | |
| HK40005112A (en) | Reduced false positive identification for spectroscopic classification | |
| HK40004124A (en) | Reduced false positive identification for spectroscopic quantification |