CN116337911A - Method and device for rapid determination of heavy metal content by portable XRF

Info

Publication number: CN116337911A
Application number: CN202310377502.5A
Authority: CN
Inventors: 张元�; 张丹
Original assignee: Beijing Academy Of Ecological And Environmental Protection
Current assignee: Beijing Academy Of Ecological And Environmental Protection
Priority date: 2023-04-10
Filing date: 2023-04-10
Publication date: 2023-06-27

Abstract

The invention provides a method and device for rapid determination of heavy metal content by portable XRF, relating to the technical field of environmental monitoring. The method includes: obtaining pXRF measurement data for multiple heavy metal elements in a sample to be measured; processing the pXRF measurement data with a preset data dictionary to obtain the feature data corresponding to the pXRF measurement data; and processing the feature data with a target machine learning model to obtain the content of a specified heavy metal element corresponding to the pXRF measurement data. The target machine learning model is a model determined by training on multiple sets of pXRF measurement data and the corresponding laboratory content measurements of the specified heavy metal element. Compared with methods that determine heavy metal content using ordinary empirical formulas or linear regression, this method has stronger sample adaptability and higher quantitative accuracy, and, because there is no need to wait for a laboratory to prepare a standard curve, it also preserves the speed of pXRF.

Description

Method and device for rapid determination of heavy metal content by portable XRF

Technical Field

The invention relates to the technical field of environmental monitoring, and in particular to a method and device for rapid determination of heavy metal content by portable XRF.

Background

A portable X-ray fluorescence spectrometer (pXRF) can be used for quantitative analysis of heavy metal elements in soil and is currently widely used in the field of environmental protection. It is superior to other analytical methods in terms of convenient sample preparation, non-destructiveness, and speed, but its quantitative accuracy and the range of samples it can handle have long been challenged.

Traditional portable X-ray fluorescence spectrometry uses the empirical coefficient method or multiple regression: a mathematical model relating the fluorescence intensity of each element in standard samples to its content is established to obtain a calibration curve, which is then used for quantitative analysis of element content. The disadvantage of this approach is that the calibration curve must be prepared first, which to some extent undermines the speed of pXRF for elemental analysis. Moreover, even when a series of standard samples is available for building the calibration curve, the accuracy of quantitative analysis of unknown samples is still affected by how well the sample matrix matches that of the standard samples, by the content relationships between elements, and by whether sample preparation reaches the physical state of the standard samples.

Summary of the Invention

The purpose of the present invention is to provide a method and device for rapid determination of heavy metal content by portable XRF, so as to alleviate the technical problems of poor quantitative accuracy, weak sample adaptability, and long analysis time in prior-art methods for determining heavy metal content by portable XRF.

In a first aspect, the present invention provides a method for rapid determination of heavy metal content by portable XRF, comprising: obtaining pXRF measurement data of multiple heavy metal elements in a sample to be measured; processing the pXRF measurement data with a preset data dictionary to obtain feature data corresponding to the pXRF measurement data, wherein the preset data dictionary includes feature names, feature meanings, and value types; and processing the feature data with a target machine learning model to obtain the content of a specified heavy metal element corresponding to the pXRF measurement data, wherein the target machine learning model is a model determined by training on multiple sets of pXRF measurement data and the corresponding laboratory content measurements of the specified heavy metal element.

In an optional embodiment, the method further comprises: obtaining multiple sets of sample measurement data, wherein each set of sample measurement data includes pXRF measurement data of multiple heavy metal elements and the corresponding laboratory content measurement of the specified heavy metal element; processing the pXRF measurement data in the multiple sets of sample measurement data with a feature generation tool to obtain a number of data features of the multiple sets of sample measurement data; calculating a feature importance score for each data feature and retaining the target data features whose feature importance score is greater than 0; and constructing the preset data dictionary based on all of the target data features.

In an optional embodiment, the method further comprises: processing the pXRF measurement data in each set of sample measurement data with the preset data dictionary to obtain the feature data corresponding to each set of sample measurement data; and training an initial machine learning model based on the feature data corresponding to the multiple sets of sample measurement data and the corresponding laboratory content measurements of the specified heavy metal element to obtain the target machine learning model.

In an optional embodiment, training the initial machine learning model based on the feature data corresponding to the multiple sets of sample measurement data and the corresponding laboratory content measurements of the specified heavy metal element comprises: dividing the multiple sets of sample measurement data into a training set, a validation set, and a test set according to preset proportions; training the initial machine learning model on the training set, with early stopping on the validation set, to obtain a current optimal model; judging whether the generalization performance of the current optimal model on the test set meets a preset condition; if it does, taking the current optimal model as the target machine learning model; and if it does not, adjusting preset model parameters and retraining the initial machine learning model until the target machine learning model is obtained.

In an optional embodiment, training the initial machine learning model on the training set, with early stopping on the validation set, comprises repeating the following steps for multiple rounds until the mean absolute error of heavy metal content in a target round is smaller than that of the preceding round and smaller than that of each of a specified number of subsequent rounds: initializing model parameters to obtain first model parameters; training the initial machine learning model based on the first model parameters and the training set to obtain a first model; and calculating the mean absolute error of heavy metal content of the first model on the validation set.

In an optional embodiment, the target machine learning model is one of the following: an extreme gradient boosting tree model or a neural network model.

In an optional embodiment, the feature importance score of each data feature is calculated using any one of the following: a LASSO regression algorithm with three-fold cross-validation, a ridge regression algorithm, a linear regression algorithm, or a tree model.

In a second aspect, the present invention provides a device for rapid determination of heavy metal content by portable XRF, comprising: a first acquisition module, configured to obtain pXRF measurement data of multiple heavy metal elements in a sample to be measured; a first processing module, configured to process the pXRF measurement data with a preset data dictionary to obtain feature data corresponding to the pXRF measurement data, wherein the preset data dictionary includes feature names, feature meanings, and value types; and a second processing module, configured to process the feature data with a target machine learning model to obtain the content of a specified heavy metal element corresponding to the pXRF measurement data, wherein the target machine learning model is a model determined by training on multiple sets of pXRF measurement data and the corresponding laboratory content measurements of the specified heavy metal element.

In a third aspect, the present invention provides an electronic device comprising a memory and a processor, the memory storing a computer program that can run on the processor, wherein the processor, when executing the computer program, implements the steps of the method for rapid determination of heavy metal content by portable XRF according to any one of the foregoing embodiments.

In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method for rapid determination of heavy metal content by portable XRF according to any one of the foregoing embodiments.

The invention provides a method for rapid determination of heavy metal content by portable XRF. A target machine learning model, trained on multiple sets of pXRF measurement data and the corresponding laboratory content measurements of a specified heavy metal element, processes the feature data corresponding to the pXRF measurement data of multiple heavy metal elements to obtain the content of the specified heavy metal element corresponding to that pXRF measurement data. Compared with methods that determine heavy metal content using ordinary empirical formulas or linear regression, this method has stronger sample adaptability and higher quantitative accuracy, and, because there is no need to wait for a laboratory to prepare a standard curve, it also preserves the speed of pXRF.

Brief Description of the Drawings

To explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of a method for rapid determination of heavy metal content by portable XRF provided by an embodiment of the present invention;

Fig. 2 is a flow chart of constructing a preset data dictionary provided by an embodiment of the present invention;

Fig. 3 is a functional block diagram of a device for rapid determination of heavy metal content by portable XRF provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of an electronic device provided by an embodiment of the present invention.

Detailed Description

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations.

Accordingly, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Some embodiments of the present invention are described in detail below with reference to the drawings. Where no conflict arises, the following embodiments and the features in them may be combined with one another.

Embodiment 1

Fig. 1 is a flow chart of a method for rapid determination of heavy metal content by portable XRF provided by an embodiment of the present invention. As shown in Fig. 1, the method specifically includes the following steps:

Step S102: obtain pXRF measurement data of multiple heavy metal elements in the sample to be measured.

Specifically, for any heavy-metal-contaminated site, a portable X-ray fluorescence spectrometer (pXRF) can be used to obtain the pXRF measurement values of multiple heavy metal elements in the sample to be measured. The heavy metal elements generally include arsenic, cadmium, chromium, copper, lead, zinc, mercury, and nickel. The embodiment of the present invention does not specifically limit the number or types of heavy metal elements; users can set them according to actual needs.

Step S104: process the pXRF measurement data with the preset data dictionary to obtain the feature data corresponding to the pXRF measurement data.

The embodiment of the present invention uses a machine learning model to determine heavy metal content. To improve model performance, after the pXRF measurement data of multiple heavy metal elements in the sample to be measured is obtained, the embodiment further performs feature engineering, namely processing the pXRF measurement data with the preset data dictionary, which includes feature names, feature meanings, and value types. In other words, feature data is extracted from the raw pXRF measurement data, and the names, meanings, and value types of these features are all predefined. For example, the feature data corresponding to the pXRF measurement data may be the sum of the pXRF measurement value of heavy metal element M and the pXRF measurement value of heavy metal element N, or the difference between the pXRF measurement value of heavy metal element M and the pXRF measurement value of heavy metal element W.
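As a minimal sketch of step S104, the preset data dictionary can be pictured as a list of entries holding a feature name, meaning, and value type. The `compute` field, the entry names `Pb_plus_Zn` and `Cu_minus_Ni`, and the sample values are hypothetical illustrations and are not taken from the patent.

```python
from typing import Callable, Dict, List, NamedTuple


class FeatureSpec(NamedTuple):
    name: str      # feature name
    meaning: str   # feature meaning
    dtype: type    # value type
    # Hypothetical addition: how the feature is derived from raw pXRF values.
    compute: Callable[[Dict[str, float]], float]


# Illustrative dictionary: one raw value, one sum feature, one difference feature.
DATA_DICTIONARY: List[FeatureSpec] = [
    FeatureSpec("As", "pXRF measurement of arsenic", float,
                lambda m: m["As"]),
    FeatureSpec("Pb_plus_Zn", "sum of the Pb and Zn pXRF measurements", float,
                lambda m: m["Pb"] + m["Zn"]),
    FeatureSpec("Cu_minus_Ni", "difference of the Cu and Ni pXRF measurements", float,
                lambda m: m["Cu"] - m["Ni"]),
]


def apply_data_dictionary(measurement: Dict[str, float],
                          dictionary: List[FeatureSpec] = DATA_DICTIONARY) -> Dict[str, float]:
    """Derive the predefined features from one set of raw pXRF measurements."""
    return {spec.name: spec.dtype(spec.compute(measurement)) for spec in dictionary}


# Made-up pXRF readings for one sample (illustrative numbers, not measured data).
sample = {"As": 12.0, "Cd": 0.5, "Cr": 60.0, "Cu": 35.0,
          "Pb": 48.0, "Zn": 110.0, "Hg": 0.1, "Ni": 28.0}
features = apply_data_dictionary(sample)
```

Because the dictionary is fixed before measurement, the same derivation can be applied to training samples and to new field samples alike.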

Step S106: process the feature data with the target machine learning model to obtain the content of the specified heavy metal element corresponding to the pXRF measurement data.

The target machine learning model is a model determined by training on multiple sets of pXRF measurement data and the corresponding laboratory content measurements of the specified heavy metal element.

In the embodiment of the present invention, the input of the target machine learning model is the feature data corresponding to the pXRF measurement data, and the output is the content of the specified heavy metal element corresponding to the pXRF measurement data. The specified heavy metal element is any one of the multiple heavy metal elements mentioned above. The target machine learning model is obtained by training on multiple sets of sample measurement data, each of which consists of the pXRF measurement data of multiple heavy metal elements and the laboratory content measurement of the specified heavy metal element corresponding to that pXRF measurement data. The method provided by the embodiment therefore has stronger sample adaptability and higher quantitative accuracy than traditional methods.

The embodiment of the present invention provides a method for rapid determination of heavy metal content by portable XRF. A target machine learning model, trained on multiple sets of pXRF measurement data and the corresponding laboratory content measurements of a specified heavy metal element, processes the feature data corresponding to the pXRF measurement data of multiple heavy metal elements to obtain the content of the specified heavy metal element corresponding to that pXRF measurement data. Compared with methods that determine heavy metal content using ordinary empirical formulas or linear regression, this method has stronger sample adaptability and higher quantitative accuracy, and, because there is no need to wait for a laboratory to prepare a standard curve, it also preserves the speed of pXRF.

In an optional embodiment, as shown in Fig. 2, the method of the present invention further includes the following steps:

Step S201: obtain multiple sets of sample measurement data.

Each set of sample measurement data includes pXRF measurement data of multiple heavy metal elements and the corresponding laboratory content measurement of the specified heavy metal element.

Step S202: process the pXRF measurement data in the multiple sets of sample measurement data with a feature generation tool to obtain a number of data features of the multiple sets of sample measurement data.

Step S203: calculate the feature importance score of each data feature and retain the target data features whose feature importance score is greater than 0.

Step S204: construct the preset data dictionary based on all of the target data features.

In the embodiment of the present invention, to build the preset data dictionary required by the target machine learning model, multiple sets of sample measurement data are first obtained. If the heavy metal elements include arsenic, cadmium, chromium, copper, lead, zinc, mercury, and nickel, each set of sample measurement data is as shown in Table 1 below.

Table 1. Sample measurement data

No.  Name                                                       Symbol
1    pXRF measurement of arsenic                                As
2    pXRF measurement of cadmium                                Cd
3    pXRF measurement of chromium                               Cr
4    pXRF measurement of copper                                 Cu
5    pXRF measurement of lead                                   Pb
6    pXRF measurement of zinc                                   Zn
7    pXRF measurement of mercury                                Hg
8    pXRF measurement of nickel                                 Ni
9    Laboratory content measurement of specified heavy metal X  X_lab

Next, a feature generation tool is used to process the pXRF measurement data in the multiple sets of sample measurement data, so as to quickly derive a large number of data features. The embodiment of the present invention does not specifically limit the feature generation tool; users can choose according to actual needs, for example the Python package FeatureTools, or extract data features through hand-written code.
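A hand-written feature generator along the lines mentioned above could look like the following sketch. It assumes that candidate features are the raw values plus pairwise sums and differences, which matches the sum and difference examples given for step S104; the function name and the naming scheme `A+B` / `A-B` are hypothetical, and the sample values are made up.

```python
from itertools import combinations
from typing import Dict


def generate_features(measurement: Dict[str, float]) -> Dict[str, float]:
    """Derive candidate features: raw values plus pairwise sums and differences."""
    features = dict(measurement)  # keep the raw pXRF values as features too
    for a, b in combinations(sorted(measurement), 2):
        features[f"{a}+{b}"] = measurement[a] + measurement[b]
        features[f"{a}-{b}"] = measurement[a] - measurement[b]
    return features


# Three elements only, to keep the example small (illustrative numbers).
sample = {"As": 12.0, "Pb": 48.0, "Zn": 110.0}
candidates = generate_features(sample)
```

With n raw elements this produces n + n(n-1) candidates, which is why a screening step is useful before training.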

The feature generation tool produces many data features, some of which are unimportant. To reduce unnecessary computation before model training, the data features are therefore screened after they are generated. Specifically, the feature importance score of each data feature is calculated, features whose importance equals 0 are removed, only the target data features whose importance score is greater than 0 are retained, and finally the preset data dictionary is constructed from all of the target data features.

The embodiment of the present invention does not specifically limit how the feature importance score is calculated. Users can calculate the feature importance score of each data feature with any one of the following: a LASSO regression algorithm with three-fold cross-validation, a ridge regression algorithm, a linear regression algorithm, or a tree model. Among these, LASSO regression is a balanced choice between accuracy and speed: linear regression is faster but less accurate, while tree models are more accurate but too slow.
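The screening step can be sketched with a small LASSO solver. This is a simplified illustration under stated assumptions: it uses plain coordinate descent with a fixed penalty `alpha` and omits the three-fold cross-validation named in the text; taking the absolute value of each coefficient as the importance score is also an assumption. The column names and data are synthetic.

```python
from typing import Dict, List


def soft_threshold(value: float, alpha: float) -> float:
    """Shrink value toward zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_importance(columns: Dict[str, List[float]], target: List[float],
                     alpha: float = 0.1, sweeps: int = 200) -> Dict[str, float]:
    """Fit LASSO by coordinate descent on centered data; return |coefficient| per feature."""
    n = len(target)
    y_mean = sum(target) / n
    residual = [t - y_mean for t in target]  # residual for all-zero weights
    centered = {}
    for name, col in columns.items():
        mean = sum(col) / n
        centered[name] = [v - mean for v in col]
    weights = {name: 0.0 for name in columns}
    for _ in range(sweeps):
        for name, col in centered.items():
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue  # a constant column carries no information
            w_old = weights[name]
            rho = sum(c * r for c, r in zip(col, residual)) / n + w_old * scale
            w_new = soft_threshold(rho, alpha) / scale
            if w_new != w_old:
                residual = [r - (w_new - w_old) * c for r, c in zip(residual, col)]
                weights[name] = w_new
    return {name: abs(w) for name, w in weights.items()}


# Synthetic data: the target depends only on "Pb"; "noise" is unrelated to it.
columns = {"Pb": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
           "noise": [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]}
target = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
importance = lasso_importance(columns, target)
selected = [name for name, score in importance.items() if score > 0]
```

The L1 penalty drives the coefficient of the unrelated column to exactly zero, so the "greater than 0" rule removes it while keeping `Pb`.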

The above describes in detail how the preset data dictionary is constructed; how the target machine learning model is obtained is described below. In an optional embodiment, the method of the present invention further includes the following steps:

Step S301: process the pXRF measurement data in each set of sample measurement data with the preset data dictionary to obtain the feature data corresponding to each set of sample measurement data.

Step S302: train the initial machine learning model based on the feature data corresponding to the multiple sets of sample measurement data and the corresponding laboratory content measurements of the specified heavy metal element to obtain the target machine learning model.

To improve the quality of the machine learning results, after the multiple sets of sample measurement data are obtained, model training in the embodiment of the present invention does not use the pXRF measurement data of the multiple heavy metal elements directly as model input. Instead, the preset data dictionary is used to process the pXRF measurement data in each set of sample measurement data to obtain the feature data corresponding to each set, and the resulting feature data is then used as model input. During training, the absolute error between the model's predicted content of the specified heavy metal element and the laboratory content measurement corresponding to the pXRF measurement data in that set of sample measurement data is observed. Averaging the absolute errors over multiple samples gives the mean absolute error (MAE). An MAE of zero means that the model's predictions agree completely with the laboratory data, i.e., the model is perfect. In actual training, however, a perfect model may not be obtained, so the target machine learning model is considered obtained once the preset training end condition is met.
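The mean absolute error described above is MAE = (1/n) Σ |predicted_i − lab_i|. A minimal sketch follows; the function name and the example numbers are illustrative only.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average of |predicted - measured| over all samples."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


# Made-up model predictions and laboratory values for three samples.
predicted = [10.0, 22.0, 31.0]
lab_values = [12.0, 20.0, 31.0]
mae = mean_absolute_error(predicted, lab_values)  # absolute errors are 2, 2, 0
```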

In an optional embodiment, step S302 above, training the initial machine learning model based on the feature data corresponding to the multiple sets of sample measurement data and the corresponding laboratory content measurements of the specified heavy metal element, specifically includes the following steps:

Step S3021: divide the multiple sets of sample measurement data into a training set, a validation set, and a test set according to preset proportions.

Step S3022: train the initial machine learning model on the training set, with early stopping on the validation set, to obtain the current optimal model.

Step S3023: judge whether the generalization performance of the current optimal model on the test set meets the preset condition.

If it does, execute step S3024 below; if it does not, execute step S3025 below.

Step S3024: take the current optimal model as the target machine learning model.

Step S3025: adjust the preset model parameters and retrain the initial machine learning model until the target machine learning model is obtained.

Specifically, machine learning model training commonly uses only a training set and a validation set, but this provides no guarantee of the model's generalization ability: overfitting often occurs, so the model performs worse than expected in actual use. To address this, the embodiment of the present invention divides the multiple sets of sample measurement data into a training set, a validation set, and a test set according to preset proportions, for example 80% training set, 10% validation set, and 10% test set. Users can adjust these proportions according to actual needs.
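The three-way split can be sketched as follows, assuming a random shuffle with a fixed seed before cutting; the text does not say how samples are assigned to each set, so the shuffling and the function name are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_frac: float = 0.8,
                  valid_frac: float = 0.1, seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then cut into training, validation, and test sets (test gets the remainder)."""
    if not (0.0 < train_frac < 1.0 and 0.0 <= valid_frac and train_frac + valid_frac <= 1.0):
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# 100 placeholder sample ids split 80 / 10 / 10.
train_set, valid_set, test_set = split_samples(list(range(100)))
```

Keeping the test set untouched during training and early stopping is what allows it to serve as an independent check of generalization later.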

After the samples are divided, the initial machine learning model is trained on the training set, with early stopping on the validation set. That is, the mean absolute error on the validation set is observed during training: once an MAE lower than that of the previous round appears (denoted MAE_t), if the MAEs of the specified number of subsequent rounds are all greater than MAE_t, training stops and the model that performs best on the validation set, i.e., the current optimal model, is output. Model training is iterative: each iteration updates the model's weights, and each set of weights corresponds to one model. In the embodiment of the present invention, which set of weights is best is decided by the mean absolute error MAE. The current optimal model therefore refers to the machine learning model corresponding to the set of weights with the lowest MAE.
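The early-stopping rule can be sketched over a sequence of per-round validation MAEs. The name `patience` for the specified number of subsequent rounds is an assumption, as is treating a tie with the best MAE as no improvement; the MAE values are made up.

```python
from typing import Iterable, Tuple


def early_stopping_round(valid_maes: Iterable[float], patience: int) -> Tuple[int, float]:
    """Return (round index, MAE) of the best round, stopping once `patience`
    consecutive rounds after it have failed to improve on it."""
    best_round, best_mae = -1, float("inf")
    for round_index, mae in enumerate(valid_maes):
        if mae < best_mae:
            best_round, best_mae = round_index, mae  # new candidate MAE_t
        elif round_index - best_round >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_round, best_mae


# Validation MAE per training round (illustrative numbers).
history = [5.0, 4.0, 3.5, 3.6, 3.7, 3.8, 3.0]
best = early_stopping_round(history, patience=3)
```

With a patience of 3, training stops after round 5, so the later value 3.0 is never reached; a larger patience trades extra training time for the chance of finding such later improvements.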

在得到当前最优模型之后，本发明实施例进一步在测试集上验证该模型的泛化性能。如果泛化性能符合预设条件，则将当前最优模型作为实际应用时预测指定重金属元素含量时使用的目标机器学习模型；如果泛化性能不佳的话，则需要调整预设模型参数，重复执行上述步骤S3022，直到得到表现较佳且泛化能力较好的模型。After obtaining the current optimal model, the embodiment of the present invention further verifies the generalization performance of the model on the test set. If the generalization performance meets the preset conditions, use the current optimal model as the target machine learning model used to predict the content of the specified heavy metal element in actual application; if the generalization performance is not good, you need to adjust the preset model parameters and repeat the execution The above step S3022, until a model with better performance and better generalization ability is obtained.

The embodiment of the present invention measures the generalization performance of the model by comparing the MAE on the validation set with the MAE on the test set. If the MAE on the test set rises too much relative to the MAE on the validation set, the model is overfitted and its generalization performance is poor. The preset condition may therefore be that the difference MAE_δ between MAE_valid (the MAE of the current optimal model on the validation set) and MAE_test (the MAE of the current optimal model on the test set) is less than a preset threshold, or that the ratio of MAE_δ to MAE_valid is less than a preset ratio.
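
The preset condition above can be sketched as a small check. This is only an illustration under stated assumptions: the function name, the parameter names `max_gap` and `max_ratio`, and the sign convention MAE_δ = MAE_test - MAE_valid are not given in the text and are chosen here for clarity. When both limits are supplied, this sketch requires both to hold, which is stricter than the "either" wording of the text.

```python
def generalization_ok(mae_valid, mae_test, max_gap=None, max_ratio=None):
    """Return True if the rise in MAE from validation to test is acceptable.

    max_gap   -- hypothetical name for the preset threshold on MAE_delta
    max_ratio -- hypothetical name for the preset ratio MAE_delta / MAE_valid
    """
    if max_gap is None and max_ratio is None:
        raise ValueError("provide max_gap and/or max_ratio")
    if mae_valid <= 0:
        raise ValueError("mae_valid must be positive")
    # Assumed sign convention: positive when the test set is worse.
    mae_delta = mae_test - mae_valid
    if max_gap is not None and mae_delta >= max_gap:
        return False
    if max_ratio is not None and mae_delta / mae_valid >= max_ratio:
        return False
    return True
```

For example, with MAE_valid = 10 and MAE_test = 13, a preset ratio of 0.2 would reject the model, since the relative rise is 0.3.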

In an optional implementation, the above step S3022, training the initial machine learning model on the training set and using the validation set for early stopping, specifically includes the following:

Repeating the following steps A to C for multiple rounds until the mean absolute error of heavy metal content of a target round is less than that of the previous round and less than that of each of the subsequent specified number of rounds;

Step A, initializing the model parameters to obtain first model parameters;

Step B, training the initial machine learning model based on the first model parameters and the training set to obtain a first model;

Step C, calculating the mean absolute error of heavy metal content of the first model on the validation set.

As described above, each round of training initializes the model parameters once and then trains the initial machine learning model with those parameters and the training set. The first model is obtained when the end-of-training condition on the training set is met; this condition may be that the number of training iterations reaches a specified count, or that the MAE meets a specified condition. Once the first model is obtained, its mean absolute error of heavy metal content on the validation set, that is, the average of the absolute errors over all samples in the validation set, can be calculated.
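
As a minimal illustration of the quantity computed in step C, the mean absolute error over a validation set can be written as follows. The function name and the sample values are illustrative only and do not come from the embodiment.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |laboratory value - predicted value| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only: laboratory contents and model predictions.
lab_values = [10.0, 20.0, 30.0]
predictions = [12.0, 18.0, 33.0]
mae = mean_absolute_error(lab_values, predictions)  # (2 + 2 + 3) / 3
```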

By analogy, the mean absolute error of heavy metal content on the validation set is calculated for the first model obtained in each round. Suppose the error of the round-100 model on the validation set, MAE_100, is less than that of the round-99 model, MAE_99. The early-stop observation mechanism is then started to determine whether the errors of the specified number of rounds after round 100 (for example, 100 rounds) are all greater than MAE_100, that is, whether MAE_101 to MAE_200 are all greater than MAE_100. If so, round 100 is the target round described above, and the first model of round 100 is the current optimal model.

If, after the early-stop observation mechanism is started and before the specified number of rounds is reached, a lower error occurs, for example MAE_120 < MAE_100 in round 120, the current observation is abandoned and the early-stop observation mechanism is restarted at round 120.
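
The observation mechanism described above can be sketched over a list of per-round validation MAEs. This is a sketch under assumptions, not the embodiment's implementation: the function name, the `patience` parameter (standing for the specified number of rounds), and the handling of ties (an equal MAE is treated as no improvement) are choices made here.

```python
def find_target_round(val_maes, patience):
    """Return the 1-based target round, or None if no round qualifies yet.

    val_maes -- validation MAE of each round, in training order
    patience -- the specified number of subsequent rounds to observe
    """
    if patience <= 0:
        raise ValueError("patience must be positive")
    target = None  # 0-based index of the round currently under observation
    for i in range(1, len(val_maes)):
        if target is None:
            # Start observing once a round beats its predecessor.
            if val_maes[i] < val_maes[i - 1]:
                target = i
        elif val_maes[i] < val_maes[target]:
            # A lower MAE appeared inside the window: restart observation here.
            target = i
        elif i - target >= patience:
            # The whole window passed without a lower MAE: stop training.
            return target + 1
    return None
```

With `val_maes = [5.0, 4.0, 4.5, 3.5, 3.6, 3.7]` and `patience = 2`, observation starts at round 2, restarts at round 4 when 3.5 appears, and round 4 is returned after rounds 5 and 6 fail to improve on it.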

In an optional implementation, the target machine learning model includes one of the following: an extreme gradient boosting tree model, or a neural network model.

Taking an extreme gradient boosting tree model (XGBoost model) as the target machine learning model, for example, the parameters used by XGBoost are as follows: XGBRegressor(max_depth=2; learning_rate=0.25; gamma=0.0; min_child_weight=0.0; max_delta_step=0.0; subsample=0.9; colsample_bytree=0.9; colsample_bylevel=1.0; reg_alpha=0.0; reg_lambda=1.0; n_estimators=1000; use_label_encoder=False; nthread=4; scale_pos_weight=1.0; base_score=0.5; seed=1337; random_state=1337).
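
For reference, the parameter list above can be written as a Python dictionary and passed to the `XGBRegressor` constructor of the third-party `xgboost` package. The names `XGB_PARAMS` and `build_xgb_regressor` are illustrative. Depending on the installed xgboost version, some of these keys (for example `use_label_encoder`, `nthread` and `seed`) may be deprecated or ignored, so this is a sketch rather than a version-exact configuration.

```python
# Parameter values copied from the embodiment; the dictionary name is illustrative.
XGB_PARAMS = {
    "max_depth": 2,
    "learning_rate": 0.25,
    "gamma": 0.0,
    "min_child_weight": 0.0,
    "max_delta_step": 0.0,
    "subsample": 0.9,
    "colsample_bytree": 0.9,
    "colsample_bylevel": 1.0,
    "reg_alpha": 0.0,
    "reg_lambda": 1.0,
    "n_estimators": 1000,
    "use_label_encoder": False,
    "nthread": 4,
    "scale_pos_weight": 1.0,
    "base_score": 0.5,
    "seed": 1337,
    "random_state": 1337,
}


def build_xgb_regressor(params=None):
    """Construct the regressor; xgboost is imported lazily because it is third-party."""
    from xgboost import XGBRegressor

    return XGBRegressor(**(XGB_PARAMS if params is None else params))
```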

If the model is overfitted, the model parameters can be adjusted by, for example, reducing the tree depth, reducing the number of children, adjusting the learning rate, or increasing regularization, which corresponds to parameters such as learning_rate, n_estimators and min_child_weight.

In summary, the method for rapid determination of heavy metal content by portable XRF provided by the embodiment of the present invention uses a large amount of sample measurement data when training the machine learning model. The method therefore has stronger sample adaptability and higher quantitative accuracy when predicting heavy metal element content from measured pXRF values. In addition, because the method does not need to wait for the laboratory to prepare a standard curve, it fully preserves the speed of pXRF.

Embodiment 2

The embodiment of the present invention also provides a device for rapid determination of heavy metal content by portable XRF. The device is mainly used to execute the method for rapid determination of heavy metal content by portable XRF provided in Embodiment 1. The device provided by the embodiment of the present invention is described in detail below.

Fig. 3 is a functional module diagram of a device for rapid determination of heavy metal content by portable XRF provided by an embodiment of the present invention. As shown in Fig. 3, the device mainly includes a first acquisition module 10, a first processing module 20 and a second processing module 30, wherein:

The first acquisition module 10 is configured to acquire pXRF measurement data of multiple heavy metal elements in a sample to be determined.

The first processing module 20 is configured to process the pXRF measurement data using a preset data dictionary to obtain feature data corresponding to the pXRF measurement data; wherein the preset data dictionary includes: feature name, feature meaning and value type.

The second processing module 30 is configured to process the feature data using a target machine learning model to obtain the content of a specified heavy metal element corresponding to the pXRF measurement data; wherein the target machine learning model is a model determined after training on multiple sets of pXRF measurement data and their corresponding laboratory content measurement values of the specified heavy metal element.
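
The flow through the first and second processing modules can be sketched as follows. This is only one possible reading, under clearly stated assumptions: the data dictionary is represented as a list of entries with the keys `name`, `meaning` and `value_type`; "processing" is taken to mean selecting the named fields in order and casting them to the declared type; and the model is any object with a `predict` method that accepts a list of feature rows. None of these details is specified by the embodiment.

```python
# Hypothetical mapping from the dictionary's value type to a Python cast.
_CASTS = {"float": float, "int": int}


def to_features(measurement, data_dictionary):
    """First processing module (sketch): order and cast readings per the dictionary."""
    return [
        _CASTS[entry["value_type"]](measurement[entry["name"]])
        for entry in data_dictionary
    ]


def predict_content(measurement, data_dictionary, model):
    """Second processing module (sketch): feed one feature row to a trained model."""
    row = to_features(measurement, data_dictionary)
    return model.predict([row])[0]
```

Keeping the dictionary as data rather than code means the same feature order and types are applied at training time and at prediction time.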

The method executed by the device provided by the embodiment of the present invention applies a target machine learning model, trained on multiple sets of pXRF measurement data and their corresponding laboratory content measurement values of the specified heavy metal element, to the feature data corresponding to the pXRF measurement data of multiple heavy metal elements, thereby obtaining the content of the specified heavy metal element corresponding to that pXRF measurement data. Compared with methods that determine heavy metal content using ordinary empirical formulas or linear regression, the device has stronger sample adaptability and higher quantitative accuracy, and, because there is no need to wait for the laboratory to prepare a standard curve, it also preserves the speed of pXRF.

Optionally, the device further includes:

A second acquisition module, configured to acquire multiple sets of sample measurement data; wherein each set of sample measurement data includes: pXRF measurement data of multiple heavy metal elements and the corresponding laboratory content measurement value of the specified heavy metal element.

A third processing module, configured to process the pXRF measurement data in the multiple sets of sample measurement data using a feature generation tool to obtain several data features of the multiple sets of sample measurement data.

A calculation and retention module, configured to calculate a feature importance score for each data feature and to retain the target data features whose feature importance score is greater than 0.

A construction module, configured to construct the preset data dictionary based on all target data features.
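
The retention and construction steps above can be sketched as a filter over candidate features. The function name, the entry keys and the example feature names below are hypothetical; only the rule "keep features whose importance score is greater than 0" and the three dictionary fields (feature name, feature meaning, value type) come from the text.

```python
def build_data_dictionary(candidates, importance):
    """Keep candidate features with importance > 0 and record their metadata.

    candidates -- list of dicts with 'name', 'meaning' and 'value_type'
    importance -- mapping from feature name to its importance score
    """
    return [
        {
            "name": c["name"],
            "meaning": c["meaning"],
            "value_type": c["value_type"],
        }
        for c in candidates
        if importance.get(c["name"], 0.0) > 0
    ]


# Hypothetical candidates and scores, for illustration only.
candidates = [
    {"name": "Pb", "meaning": "pXRF reading for Pb", "value_type": "float"},
    {"name": "Pb/Fe", "meaning": "ratio of Pb to Fe readings", "value_type": "float"},
    {"name": "Zn", "meaning": "pXRF reading for Zn", "value_type": "float"},
]
scores = {"Pb": 0.61, "Pb/Fe": 0.0, "Zn": 0.12}
data_dictionary = build_data_dictionary(candidates, scores)
```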

Optionally, the device further includes:

A fourth processing module, configured to process the pXRF measurement data in each set of sample measurement data using the preset data dictionary to obtain feature data corresponding to each set of sample measurement data.

A training module, configured to train the initial machine learning model based on the feature data corresponding to the multiple sets of sample measurement data and the corresponding laboratory content measurement values of the specified heavy metal element, to obtain the target machine learning model.

Optionally, the training module includes:

A division unit, configured to divide the multiple sets of sample measurement data into a training set, a validation set and a test set according to a preset ratio.

A training unit, configured to train the initial machine learning model on the training set and to use the validation set for early stopping, to obtain the current optimal model.

A judging unit, configured to judge whether the generalization performance of the current optimal model on the test set meets the preset condition.

A determining unit, configured to use the current optimal model as the target machine learning model if the condition is met.

An adjustment and training unit, configured to, if the condition is not met, adjust the preset model parameters and retrain the initial machine learning model until the target machine learning model is obtained.
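
The division unit can be sketched as a shuffled split by ratio. The function name, the 70/15/15 default ratio and the fixed random seed are assumptions made for illustration; this section does not state the preset ratio.

```python
import random


def split_samples(samples, ratios=(0.7, 0.15, 0.15), seed=1337):
    """Shuffle the samples and split them into training, validation and test sets."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values summing to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * ratios[0])
    n_valid = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test
```

The test set is kept apart from early stopping so that it can serve as an independent check of generalization in the judging unit.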

Optionally, the training unit is specifically configured to:

Repeat the following steps for multiple rounds until the mean absolute error of heavy metal content of a target round is less than that of the previous round and less than that of each of the subsequent specified number of rounds.

Initialize the model parameters to obtain first model parameters.

Train the initial machine learning model based on the first model parameters and the training set to obtain a first model.

Calculate the mean absolute error of heavy metal content of the first model on the validation set.

Optionally, the target machine learning model includes one of the following: an extreme gradient boosting tree model, or a neural network model.

Optionally, the feature importance score of each data feature is calculated using any one of the following methods: a LASSO regression algorithm with three-fold cross-validation, a ridge regression algorithm, a linear regression algorithm, or a tree model.

Embodiment 3

Referring to Fig. 4, an embodiment of the present invention provides an electronic device, which includes a processor 60, a memory 61, a bus 62 and a communication interface 63. The processor 60, the communication interface 63 and the memory 61 are connected through the bus 62. The processor 60 is configured to execute executable modules, such as computer programs, stored in the memory 61.

The memory 61 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is realized through at least one communication interface 63 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.

The bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.

The memory 61 is used to store a program, and the processor 60 executes the program after receiving an execution instruction. The method executed by the device defined by the process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 60 or implemented by the processor 60.

The processor 60 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 60 or by instructions in the form of software. The processor 60 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 61; the processor 60 reads the information in the memory 61 and completes the steps of the above method in combination with its hardware.

The computer program product of the method and device for rapid determination of heavy metal content by portable XRF provided by the embodiment of the present invention includes a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code can be used to execute the method described in the foregoing method embodiment. For the specific implementation, refer to the method embodiment; details are not repeated here.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

It should be noted that similar reference numerals and letters denote similar items in the following drawings. Therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.

In the description of the present invention, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings, or on the orientation or position in which the product of the invention is usually placed in use. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. They therefore cannot be construed as limiting the present invention. In addition, the terms "first", "second", "third" and the like are used only to distinguish descriptions and cannot be construed as indicating or implying relative importance.

In addition, terms such as "horizontal", "vertical" and "overhanging" do not require a component to be absolutely horizontal or overhanging; it may be slightly inclined. For example, "horizontal" only means that the direction is more horizontal relative to "vertical"; it does not mean that the structure must be completely horizontal.

In the description of the present invention, it should also be noted that, unless otherwise expressly specified and limited, the terms "arranged", "mounted", "coupled" and "connected" should be understood in a broad sense. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific situation.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for rapid determination of heavy metal content by portable XRF, characterized by comprising:

obtaining pXRF measurement data of multiple heavy metal elements in a sample to be determined;

processing the pXRF measurement data using a preset data dictionary to obtain feature data corresponding to the pXRF measurement data; wherein the preset data dictionary includes: feature name, feature meaning and value type;

processing the feature data using a target machine learning model to obtain the content of a specified heavy metal element corresponding to the pXRF measurement data; wherein the target machine learning model is a model determined after training on multiple sets of pXRF measurement data and their corresponding laboratory content measurement values of the specified heavy metal element.

2. The method for rapid determination of heavy metal content by portable XRF according to claim 1, characterized in that the method further comprises:

obtaining multiple sets of sample measurement data; wherein each set of sample measurement data includes: pXRF measurement data of multiple heavy metal elements and the corresponding laboratory content measurement value of the specified heavy metal element;

processing the pXRF measurement data in the multiple sets of sample measurement data using a feature generation tool to obtain several data features of the multiple sets of sample measurement data;

calculating a feature importance score for each of the data features, and retaining the target data features whose feature importance score is greater than 0;

constructing the preset data dictionary based on all the target data features.

3. The method for rapid determination of heavy metal content by portable XRF according to claim 2, characterized in that the method further comprises:

processing the pXRF measurement data in each set of sample measurement data using the preset data dictionary to obtain feature data corresponding to each set of sample measurement data;

training an initial machine learning model based on the feature data corresponding to the multiple sets of sample measurement data and the corresponding laboratory content measurement values of the specified heavy metal element, to obtain the target machine learning model.

4. The method for rapid determination of heavy metal content by portable XRF according to claim 3, characterized in that training the initial machine learning model based on the feature data corresponding to the multiple sets of sample measurement data and the corresponding laboratory content measurement values of the specified heavy metal element comprises:

dividing the multiple sets of sample measurement data into a training set, a validation set and a test set according to a preset ratio;

training the initial machine learning model on the training set, and using the validation set for early stopping, to obtain a current optimal model;

judging whether the generalization performance of the current optimal model on the test set meets a preset condition;

if so, using the current optimal model as the target machine learning model;

if not, adjusting the preset model parameters and retraining the initial machine learning model until the target machine learning model is obtained.

5. The method for rapid determination of heavy metal content by portable XRF according to claim 4, characterized in that training the initial machine learning model on the training set and using the validation set for early stopping comprises:

repeating the following steps for multiple rounds until the mean absolute error of heavy metal content of a target round is less than that of the previous round and less than that of each of the subsequent specified number of rounds;

initializing the model parameters to obtain first model parameters;

training the initial machine learning model based on the first model parameters and the training set to obtain a first model;

calculating the mean absolute error of heavy metal content of the first model on the validation set.

6. The method for rapid determination of heavy metal content by portable XRF according to claim 1, wherein the target machine learning model comprises one of the following: extreme gradient boosting tree model, neural network model.

7. The method for rapid determination of heavy metal content by portable XRF according to claim 2, characterized in that the feature importance score of each of the data features is calculated using any one of the following methods: a LASSO regression algorithm with three-fold cross-validation, a ridge regression algorithm, a linear regression algorithm, or a tree model.

8. A device for rapid determination of heavy metal content by portable XRF, characterized by comprising:

a first acquisition module, configured to acquire pXRF measurement data of multiple heavy metal elements in a sample to be determined;

a first processing module, configured to process the pXRF measurement data using a preset data dictionary to obtain feature data corresponding to the pXRF measurement data; wherein the preset data dictionary includes: feature name, feature meaning and value type;

a second processing module, configured to process the feature data using a target machine learning model to obtain the content of a specified heavy metal element corresponding to the pXRF measurement data; wherein the target machine learning model is a model determined after training on multiple sets of pXRF measurement data and their corresponding laboratory content measurement values of the specified heavy metal element.

9. An electronic device, comprising a memory and a processor, the memory storing a computer program that can run on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for rapid determination of heavy metal content by portable XRF according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method for rapid determination of heavy metal content by portable XRF according to any one of claims 1 to 7.