
CN118800357A - A predictive model for biomagnification of organic chemicals in low trophic levels of food chains - Google Patents

Info

Publication number
CN118800357A
Authority
CN
China
Prior art keywords
model
biomagnification
food chain
descriptors
constructing
Legal status
Granted
Application number
CN202410895638.XA
Other languages
Chinese (zh)
Other versions
CN118800357B (en)
Inventor
范德玲
王蕾
汪贞
孙帅
张冰
梁梦园
邢维龙
方正
Current Assignee
Nanjing Tech University
Nanjing Institute of Environmental Sciences MEP
Original Assignee
Nanjing Tech University
Nanjing Institute of Environmental Sciences MEP
Priority date
2024-07-05
Filing date
2024-07-05
Publication date
2024-10-18
Application filed by Nanjing Tech University and Nanjing Institute of Environmental Sciences MEP
Priority to CN202410895638.XA (patent CN118800357B)
Publication of CN118800357A
Application granted
Publication of CN118800357B
Legal status: Active

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention discloses a low-trophic-level food chain biomagnification prediction model for organic chemicals, the first such model for PBDEs and HBCDs built on a QSAR approach. The model is obtained through sample collection and screening, molecular descriptor calculation, model construction and model validation. The model constructed by this method accurately predicts the biomagnification factors of PBDE- and HBCD-type organic pollutants, improving prediction accuracy while saving labor, materials and time, and it is simple, fast and effective. It strictly follows the OECD rules for QSAR models and explains, in terms of molecular descriptors, the key structural factors governing the biomagnification factor, which is of great significance for the risk control and environmental safety of toxic chemicals such as PBDEs and HBCDs.

Description

A predictive model for biomagnification of organic chemicals in low trophic levels of food chains

Technical Field

The invention relates to the field of environmental risk assessment of organic chemicals, and in particular to a low-trophic-level food chain biomagnification prediction model for organic chemicals.

Background Art

Many toxic and hazardous chemicals are lipophilic and resistant to degradation, so they accumulate in organisms. In ecosystems their concentrations increase with trophic level (biomagnification), causing toxicity to higher organisms and humans. Characterizing the levels of organic pollutants in aquatic organisms is therefore important for the health of local residents and for food safety.

The degree to which a compound is enriched in high-trophic-level organisms and in humans through the food chain is central to evaluating its ecological and environmental toxicity. Long-term research has produced a range of models describing the accumulation of compounds between environmental media. Two criteria are generally used to judge whether an organic pollutant bioaccumulates. The first is the n-octanol/water partition coefficient (K_OW) of the compound: a compound with log K_OW > 4 to 5 may bioaccumulate, and bioaccumulation is greatest for log K_OW between 5 and 7. The second is the bioaccumulation factor (BAF), which characterizes the relative bioaccumulation capacity of a compound.

The bioaccumulation factor (BAF) in aquatic food chains is calculated as:

$$\mathrm{BAF} = \frac{C_{\mathrm{organism}}}{C_{\mathrm{water,dissolved}}}$$

where C_organism is the lipid-normalized concentration of the pollutant in the aquatic organism (pg/kg lipid weight, lw) and C_water,dissolved is the concentration of the pollutant dissolved in water (pg/L), so that BAF is expressed in L/kg lw.

Under the BAF criterion, a pollutant with BAF > 5000 (log BAF > 3.7) is considered bioaccumulative in the food chain, and one with BAF between 2000 and 5000 (3.3 < log BAF ≤ 3.7) is considered potentially bioaccumulative.
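A minimal Python sketch of the BAF calculation and the screening thresholds above. The function names and the example concentrations are hypothetical; the organism concentration is assumed to be lipid-normalized (pg/kg lw) and the water concentration to be the dissolved phase (pg/L), as defined in the text.

```python
import math


def baf(c_organism_pg_per_kg_lw: float, c_dissolved_pg_per_l: float) -> float:
    """Bioaccumulation factor in L/kg lipid weight: C_organism / C_water,dissolved."""
    if c_dissolved_pg_per_l <= 0:
        raise ValueError("dissolved-phase concentration must be positive")
    return c_organism_pg_per_kg_lw / c_dissolved_pg_per_l


def classify_baf(value: float) -> str:
    """Apply the screening thresholds: BAF > 5000, and 2000 <= BAF <= 5000."""
    if value > 5000:        # log BAF > 3.7
        return "bioaccumulative"
    if value >= 2000:       # 3.3 <= log BAF <= 3.7
        return "potentially bioaccumulative"
    return "not bioaccumulative"


# Hypothetical measurements, for illustration only
value = baf(2.4e7, 1.2e3)
print(value, round(math.log10(value), 2), classify_baf(value))  # 20000.0 4.3 bioaccumulative
```

Working on the linear BAF scale and commenting the log equivalents keeps the thresholds exactly as stated (5000 and 2000) rather than relying on rounded log values.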

PBDEs and HBCDs are persistent organic pollutants recently added to the Stockholm Convention and are typical bioaccumulative substances. Because they are highly lipophilic and poorly metabolized, they have the potential to biomagnify along food chains, and recent studies have confirmed that they are transferred across trophic levels. However, determining the biomagnification factor (BMF) of chemicals experimentally is costly, time-consuming and labor-intensive, and cannot keep pace with the needs of ecological risk assessment. Models of the biomagnification of chemicals in food chains are still lacking, so a scientific, fast and effective computational method for food chain transfer is urgently needed. In 2007 the Organisation for Economic Co-operation and Development (OECD) issued guidance on the construction and validation of QSAR models, which should satisfy five principles: ① a defined endpoint; ② an unambiguous algorithm; ③ a defined applicability domain; ④ appropriate measures of goodness of fit, robustness and predictivity; ⑤ a mechanistic interpretation, where possible.

Summary of the Invention

For the first time, the present invention uses a QSAR approach to construct a low-trophic-level food chain biomagnification prediction model for PBDEs and HBCDs:

$$\mathrm{PEC}_{\mathrm{oral,predator}} = \mathrm{PEC}_{\mathrm{water}} \times \mathrm{BCF}_{\mathrm{fish}} \times \mathrm{BMF} \qquad (1)$$

$$\mathrm{BMF} = -5.04472 + 0.8374\,\mathrm{GGI3} - 35.46426\,\mathrm{Mor21v} \qquad (2)$$

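The parameters are defined as follows: PEC_oral,predator is the in vivo concentration in low-trophic-level predators (mg/kg wet fish); PEC_water is the predicted concentration in water (mg/L); BCF_fish is the bioconcentration factor of fish (L/kg wet fish); BMF is the biomagnification factor; GGI3 is the third-order topological charge index; and Mor21v is the 3D-MoRSE descriptor (signal 21) weighted by atomic van der Waals volumes.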
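Equations (1) and (2) chain together as shown in this sketch. The function names and example inputs are hypothetical; units follow the parameter definitions, and BMF is treated as a dimensionless multiplier, as Eq. (1) implies.

```python
def bmf_from_descriptors(ggi3: float, mor21v: float) -> float:
    """Eq. (2): BMF predicted from the GGI3 and Mor21v descriptors."""
    return -5.04472 + 0.8374 * ggi3 - 35.46426 * mor21v


def pec_oral_predator(pec_water_mg_per_l: float,
                      bcf_fish_l_per_kg: float,
                      bmf: float) -> float:
    """Eq. (1): concentration in a low-trophic-level predator (mg/kg wet fish).

    mg/L (water) x L/kg wet fish (fish BCF) x dimensionless BMF -> mg/kg wet fish.
    """
    return pec_water_mg_per_l * bcf_fish_l_per_kg * bmf


# Hypothetical inputs: 0.001 mg/L in water, fish BCF of 1000 L/kg, BMF of 2
print(pec_oral_predator(0.001, 1000.0, 2.0))
```

Keeping the two equations as separate functions mirrors the model structure: the QSAR part (Eq. 2) supplies BMF, and the exposure part (Eq. 1) scales a water concentration through fish bioconcentration and one biomagnification step.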

The present invention also provides a method for constructing the above low-trophic-level food chain biomagnification model, as follows:

⑴ Sample collection and screening

Biomagnification factor data for 9 organic compounds, covering PBDEs and HBCDs, were obtained in the laboratory. To build a valid QSAR model, the data set was first divided into a training set and a validation set. To ensure that the training compounds are representative, the split was made with the Kennard-Stone method, which limits uneven distribution of the training samples and partitions the data set well. The data set was divided into 6 training compounds and 3 validation compounds.
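The Kennard-Stone split described above can be sketched as follows, assuming the selection is made in descriptor space with Euclidean distances (the text does not state which variables or distance metric were used). The function name and the demo matrix are hypothetical.

```python
import numpy as np


def kennard_stone(X, n_train: int):
    """Return (train_indices, validation_indices) chosen by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Seed the training set with the two most distant samples
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Distance from each candidate to its nearest already-selected sample
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        # Add the candidate that is farthest from the current selection
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


# Hypothetical descriptor matrix for 9 compounds (2 descriptors each)
X_demo = np.random.default_rng(0).normal(size=(9, 2))
train_idx, val_idx = kennard_stone(X_demo, n_train=6)
print(train_idx, val_idx)
```

Kennard-Stone is deterministic: it seeds with the two most distant compounds and then repeatedly adds the compound farthest from everything already selected, so the training set spans the descriptor space and the validation compounds fall inside it.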

⑵ Calculation of molecular descriptors

The molecular structures of the 9 organic compounds were first drawn in ChemDraw and then imported into HyperChem for geometry optimization in two steps: a preliminary energy minimization with the MM+ molecular mechanics force field, followed by a more accurate optimization with the semi-empirical quantum-mechanical AM1 method. The optimized structures were imported into DRAGON 5.4 to calculate 1664 theoretical molecular descriptors of various types. Before modeling, the descriptors were preprocessed: constant and near-constant descriptors were deleted, and for every pair of descriptors with a correlation coefficient greater than 0.96, the one less correlated with the target value was deleted. The remaining 1169 descriptors were used in the subsequent variable selection.
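A minimal sketch of the descriptor pre-screening, assuming "near-constant" means a standard deviation below a small tolerance (`std_tol`, a hypothetical parameter; DRAGON's own criterion may differ) and resolving correlated pairs greedily. Function and column names are hypothetical.

```python
import numpy as np


def prefilter_descriptors(X, y, names, r_max=0.96, std_tol=1e-8):
    """Drop (near-)constant descriptors, then thin out highly inter-correlated pairs.

    For a pair with |r| > r_max, the descriptor less correlated with y is dropped
    (greedy: descriptors are visited in order of decreasing |r| with y).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def abs_r(a, b):
        return abs(float(np.corrcoef(a, b)[0, 1]))

    # 1) Remove constant and near-constant columns (std_tol is an assumed cut-off)
    varying = [j for j in range(X.shape[1]) if np.std(X[:, j]) > std_tol]
    # 2) Visit descriptors from most to least correlated with the target
    order = sorted(varying, key=lambda j: abs_r(X[:, j], y), reverse=True)
    kept = []
    for j in order:
        # Keep j only if it is not too similar to a better descriptor already kept
        if all(abs_r(X[:, j], X[:, k]) <= r_max for k in kept):
            kept.append(j)
    return [names[j] for j in sorted(kept)]


# Hypothetical data: b nearly duplicates a, c is constant, d is weakly related
X_demo = np.array([[1, 1, 5, 4], [2, 2, 5, 1], [3, 3, 5, 3], [4, 5, 5, 2]], dtype=float)
y_demo = np.array([1, 2, 3, 4], dtype=float)
print(prefilter_descriptors(X_demo, y_demo, ["a", "b", "c", "d"]))
```

Visiting descriptors in order of target correlation guarantees that whenever two kept candidates clash, the survivor is the one more correlated with the endpoint, which is the rule the text describes.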

⑶ Model construction

A genetic algorithm (GA), implemented in MobyDigs, was used to select the descriptor subset most highly correlated with the biomagnification factor, and a linear QSAR model was then built on the selected descriptors by multiple linear regression (MLR), giving the GA-MLR model. Leave-one-out (LOO) cross-validation served as the model evaluation function: the optimal number of descriptors was taken to be reached when adding a further descriptor no longer changed model performance appreciably (an increase in Q² of less than 0.02). Up to 7 descriptors were allowed, and the final model retains two, GGI3 and Mor21v. The GA settings were: population size 100; maximum number of variables allowed in the initial models 7; mutation trade-off T = 0.5; crossover and mutation probabilities derived from T.
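MobyDigs' genetic algorithm is not reproduced here. As a stand-in, the sketch below uses greedy forward selection, a different search strategy from a GA, with the same stopping rule described above: stop when adding a descriptor raises Q²_LOO by less than 0.02, and never exceed 7 descriptors. Q²_LOO is computed with the exact least-squares PRESS shortcut e_i / (1 - h_ii). All names and data are hypothetical.

```python
import numpy as np


def q2_loo(X, y):
    """Q2_LOO of an intercept MLR via the PRESS shortcut e_i / (1 - h_ii)."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float).reshape(len(y), -1)])
    H = A @ np.linalg.pinv(A)          # hat matrix of the least-squares fit
    h = np.diag(H)
    if np.any(h > 1.0 - 1e-10):        # a compound that only fits itself
        return -np.inf
    e = y - H @ y                      # ordinary residuals
    press = float(np.sum((e / (1.0 - h)) ** 2))
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))


def forward_select(X, y, max_vars=7, min_gain=0.02):
    """Greedy forward selection that stops when Q2_LOO improves by < min_gain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    chosen, best_q2 = [], -np.inf
    # Intercept plus len(chosen)+1 slopes must fit in the n-1 rows of each LOO fit
    while len(chosen) < max_vars and len(chosen) + 2 <= len(y) - 1:
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        if not candidates:
            break
        scores = {j: q2_loo(X[:, chosen + [j]], y) for j in candidates}
        j_best = max(scores, key=scores.get)
        if not np.isfinite(scores[j_best]) or scores[j_best] - best_q2 < min_gain:
            break
        chosen.append(j_best)
        best_q2 = scores[j_best]
    return chosen, best_q2


# Hypothetical data: y depends only on descriptor 0
x0 = np.arange(1, 9, dtype=float)
X_demo = np.column_stack([x0, [3, 1, 4, 1, 5, 9, 2, 6], [2, 7, 1, 8, 2, 8, 1, 8]])
print(forward_select(X_demo, 2 * x0 + 0.5))
```

The cap on model size relative to the number of compounds matters with very small data sets such as the 6 training compounds here: each LOO refit must still have at least as many compounds as fitted coefficients.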

⑷ Model validation

After genetic algorithm variable selection, the linear QSAR model (the MLR model) was established by multiple linear regression. The MLR equation is:

$$\mathrm{BMF} = -5.04472 + 0.8374\,\mathrm{GGI3} - 35.46426\,\mathrm{Mor21v}$$

Training set: n_tr = 6, Q²_LOO = 0.9981, R²_fitting = 0.9996, R²_adj = 0.9994, RMSE_tr = 0.0665, Q²_BOOT = 0.7175

Validation set: n_ext = 3, R²_ext = 0.836, Q²_ext = 0.8662, R²_adj = 0.9994, RMSE_ext = 0.0289

Here GGI3 is the third-order topological charge index and Mor21v is the 3D-MoRSE descriptor (signal 21) weighted by atomic van der Waals volumes. GGI3 is positively and Mor21v negatively correlated with the biomagnification factor. The RMSE values of the training and validation sets are 0.0665 and 0.0289, respectively, indicating good predictive performance.

Table 1 Experimental and predicted BMF values for PBDEs and HBCDs

The advantages of the model established by this method are as follows. Determining the food chain biomagnification of a chemical experimentally requires measuring PBDE and HBCD levels in aquatic organisms and the dissolved pollutant concentrations in water, and then calculating the factor, which is slow and costly. The model constructed by this method accurately predicts the biomagnification factors of PBDE- and HBCD-type organic pollutants, improves prediction accuracy, saves labor, materials and time, and is simple, fast and effective. It strictly follows the OECD rules for QSAR models and explains, in terms of molecular descriptors, the key structural factors governing the biomagnification factor, which is of great significance for the risk control and environmental safety of toxic chemicals such as PBDEs and HBCDs.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 Fitting plot of the BMF prediction model.

Fig. 2 Applicability domain characterization of the BMF prediction model.

DETAILED DESCRIPTION

The present invention is further described below with reference to specific embodiments.

Example 1

The specific steps for constructing the low-trophic-level food chain biomagnification prediction model for organic chemicals are as follows:

(1) Data collection and assignment of training and validation set compounds

Biomagnification factors (BMFs) of 9 organic chemicals were obtained in the laboratory; 6 compounds were assigned to the training set and 3 to the validation set.

(2) Descriptor calculation

Compound structures were pre-optimized with MM+ molecular mechanics in HyperChem 7.0 and then optimized with the semi-empirical AM1 method. Descriptors were calculated from the optimized structures with Dragon 5.4, and the 1664 calculated descriptors were pre-screened.

(3) Model construction

Variables were selected with the genetic algorithm (GA) in MobyDigs, and a prediction model was built on the selected variables by multiple linear regression (MLR), giving the GA-MLR model:

$$\mathrm{BMF} = -5.04472 + 0.8374\,\mathrm{GGI3} - 35.46426\,\mathrm{Mor21v}$$

where GGI3 is the third-order topological charge index and Mor21v is the 3D-MoRSE descriptor weighted by atomic van der Waals volumes; GGI3 is positively and Mor21v negatively correlated with the biomagnification factor, and the RMSE values of the training and validation sets are 0.0665 and 0.0289, respectively.

(4) Model validation

Following the OECD guidance on QSAR models, the constructed model must be validated internally (goodness of fit and robustness) and externally (predictive ability). Goodness of fit is characterized by the adjusted coefficient of determination (R²_adj) and the root mean square error (RMSE) between experimental and fitted values:

$$R^2_{adj} = 1 - \frac{(n-1)\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{(n-m-1)\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where n is the number of compounds, m is the number of predictor variables, y_i and ŷ_i are the experimental and predicted endpoint values of the i-th compound, and ȳ is the mean of the experimental values.
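The goodness-of-fit statistics can be computed as in this sketch. The formulas are the standard textbook definitions (adjusted R² with m predictors, RMSE with denominator n), assumed rather than quoted from the patent; the function names and data are hypothetical.

```python
import math


def r2(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def r2_adj(y_obs, y_pred, m):
    """R2 adjusted for n compounds and m predictor variables."""
    n = len(y_obs)
    return 1.0 - (1.0 - r2(y_obs, y_pred)) * (n - 1) / (n - m - 1)


def rmse(y_obs, y_pred):
    """Root mean square error with denominator n (assumed)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


# Hypothetical experimental and fitted values for 4 compounds, one descriptor
obs, fit = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
print(r2(obs, fit), r2_adj(obs, fit, m=1), rmse(obs, fit))
```

The adjustment term (n - 1)/(n - m - 1) grows quickly when n is small, which is why R²_adj is worth reporting alongside R² for a 6-compound training set.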

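The leave-one-out cross-validation coefficient (Q²_LOO) and the bootstrap coefficient (Q²_BOOT) characterize the robustness of the model:

$$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n_{tr}}\left(y_i - \hat{y}_{i/i}\right)^2}{\sum_{i=1}^{n_{tr}}\left(y_i - \bar{y}_{tr}\right)^2}$$

where ŷ_(i/i) is the value predicted for compound i by the model refitted without that compound.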

Here ȳ_tr denotes the mean experimental endpoint value of the training set compounds. The bootstrap procedure leaves out one fifth of the compounds at a time and is repeated 5000 times.
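The two robustness statistics can be sketched as follows. Q²_LOO refits the MLR model once per left-out compound. For Q²_BOOT the text only says that one fifth of the compounds are left out and the procedure is repeated 5000 times, so `q2_resampled` pools the squared prediction errors of all repetitions into a single Q²; that pooling rule, the fixed random seed and all function names are assumptions.

```python
import numpy as np


def _fit_predict(X_tr, y_tr, X_te):
    """Fit an intercept MLR on (X_tr, y_tr) and predict the rows of X_te."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ beta


def q2_loo(X, y):
    """Leave-one-out Q2 by explicit refitting."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    idx = np.arange(len(y))
    press = sum(
        float((y[i] - _fit_predict(X[idx != i], y[idx != i], X[i:i + 1])[0]) ** 2)
        for i in idx
    )
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))


def q2_resampled(X, y, frac_out=0.2, n_rep=5000, seed=0):
    """Repeated random leave-frac_out-out validation pooled into one Q2."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n = len(y)
    k = max(1, round(frac_out * n))
    rng = np.random.default_rng(seed)
    press = ss = 0.0
    for _ in range(n_rep):
        out = rng.choice(n, size=k, replace=False)
        keep = np.ones(n, dtype=bool)
        keep[out] = False
        pred = _fit_predict(X[keep], y[keep], X[out])
        press += float(np.sum((y[out] - pred) ** 2))
        # Reference: mean of the compounds used to fit this repetition
        ss += float(np.sum((y[out] - y[keep].mean()) ** 2))
    return 1.0 - press / ss


# Hypothetical single-descriptor data for 6 training compounds
x_demo = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y_demo = 3 * x_demo + 1 + np.array([0.5, -0.5, 0.3, -0.2, 0.4, -0.6])
print(q2_loo(x_demo, y_demo), q2_resampled(x_demo, y_demo, n_rep=500))
```

With only 6 compounds, one fifth rounds to a single compound, so the resampled statistic behaves much like a randomized leave-one-out; it becomes more distinct from Q²_LOO on larger data sets.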

The external validation coefficient (Q²_EXT), R²_EXT and RMSE_EXT characterize the predictive ability of the model:

where n_EXT is the number of validation set compounds, and the barred quantities denote the mean experimental and mean predicted endpoint values of the validation set compounds. The resulting characterization and evaluation parameters of the model are:

Training set: n_tr = 6, Q²_LOO = 0.9981, R²_fitting = 0.9996, R²_adj = 0.9994, RMSE_tr = 0.0665, Q²_BOOT = 0.7175

Validation set: n_ext = 3, R²_ext = 0.836, Q²_ext = 0.8662, R²_adj = 0.9994, RMSE_ext = 0.0289

where n_tr and n_ext are the numbers of compounds in the training and validation sets, respectively, and p is the significance level; R²_adj is the coefficient of determination adjusted for degrees of freedom; RMSE is the root mean square error; Q²_LOO is the leave-one-out cross-validation coefficient; Q²_BOOT is the bootstrap validation coefficient; R²_ext is the squared correlation coefficient between experimental and predicted values; Q²_ext is the external validation coefficient of determination; and RMSE_ext is the root mean square error of the validation set.
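External validation metrics, sketched under stated assumptions: R²_ext is taken as the squared Pearson correlation between experimental and predicted values, RMSE_ext uses the denominator n_EXT, and Q²_ext uses the validation-set mean by default (the form often called Q²_F2); passing the training-set mean instead gives the Q²_F1 form. Which variant the patent used is not stated, and the function names and data are hypothetical.

```python
import math


def q2_ext(y_obs, y_pred, y_ref_mean=None):
    """External Q2; uses the validation-set mean unless y_ref_mean is given."""
    ref = sum(y_obs) / len(y_obs) if y_ref_mean is None else y_ref_mean
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - ref) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def r2_ext(y_obs, y_pred):
    """Squared Pearson correlation between experimental and predicted values."""
    n = len(y_obs)
    mo, mp = sum(y_obs) / n, sum(y_pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    var_o = sum((o - mo) ** 2 for o in y_obs)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_o * var_p)


def rmse_ext(y_obs, y_pred):
    """Root mean square error over the validation compounds."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


# Hypothetical validation set of 3 compounds
obs, pred = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
print(q2_ext(obs, pred), r2_ext(obs, pred), rmse_ext(obs, pred))
```

R²_ext measures only linear association and ignores systematic bias, whereas Q²_ext penalizes any deviation from the experimental values; reporting both, as the model does, guards against a well-correlated but offset prediction.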

The results show that the model has good predictive ability and robustness.

Example 2

This example characterizes the applicability domain of the above prediction model.

The Williams plot defines the applicability domain of a model in terms of the standardized residual (δ) and the leverage (h_i, where i indexes the compounds). δ is calculated using the following formula:

The leverage h_i of a training set compound can be calculated as:

$$h_i = \mathbf{x}_i^{T}\left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{x}_i \qquad (8)$$

where x_i is the row vector of molecular structure descriptors of the i-th compound and X is the descriptor matrix of the training set. The warning leverage h* is defined as:

$$h^{*} = \frac{3(k+1)}{n} \qquad (9)$$

where k is the number of descriptors and n is the number of training compounds.

The applicability domain characterization results are shown in Figs. 1 and 2. In Fig. 2, h* = 3(k + 1)/n = 3(2 + 1)/6 = 1.5. The vertical axis of the Williams plot shows the standardized residuals between experimental and predicted values, which characterize the dispersion of the experimental values; a compound whose standardized residual has an absolute value greater than 3.0 is regarded as an outlier. The horizontal axis shows the leverage h_i of each compound. A leverage greater than the warning value (h* = 1.5) indicates that the compound's structural features are poorly represented in the training set, so the compound strongly influences the model and its prediction is an extrapolation.

As the figure shows, the leverages of all compounds are below the warning leverage h*, indicating that their structures are similar to those of the training compounds, and all standardized residuals lie within (-3, +3). In particular, the standardized residuals of BDE-153 (CAS: 68631-49-2), BDE-183 (CAS: 207122-16-5) and BDE-99 (CAS: 60348-60-9) fall within (-3, +3), indicating that the model is applicable to these three substances and predicts them well (Fig. 2).
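The Williams-plot quantities can be sketched as follows. Two assumptions are labelled in the code: an intercept column is prepended to the descriptor matrix X before evaluating Eq. (8), and δ is taken as the residual divided by the residual standard deviation with n - k - 1 degrees of freedom, because the δ formula itself is not given. Function names and data are hypothetical. Since training-set leverages each lie between 0 and 1 and sum to k + 1 = 3, a warning value of h* = 1.5 can only be exceeded by query compounds, not by training compounds.

```python
import numpy as np


def leverages(X_train, X_query=None):
    """Eq. (8), h_i = x_i^T (X^T X)^-1 x_i, with an intercept column (assumed)."""
    A = np.column_stack([np.ones(len(X_train)), np.asarray(X_train, dtype=float)])
    xtx_inv = np.linalg.inv(A.T @ A)
    Q = A if X_query is None else np.column_stack(
        [np.ones(len(X_query)), np.asarray(X_query, dtype=float)])
    # Diagonal of Q (X^T X)^-1 Q^T without forming the full matrix
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def warning_leverage(k, n):
    """Eq. (9): h* = 3(k + 1)/n."""
    return 3.0 * (k + 1) / n


def standardized_residuals(y_obs, y_pred, k):
    """Residuals over the residual standard deviation (assumed definition of delta)."""
    res = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    s = np.sqrt(np.sum(res ** 2) / (len(res) - k - 1))
    return res / s


def williams_flags(h, delta, h_star, delta_max=3.0):
    """True where a compound is an outlier (|delta| > 3) or lies beyond h*."""
    return (np.abs(delta) > delta_max) | (np.asarray(h) > h_star)


# Hypothetical 6 training compounds with 2 descriptors (k = 2)
X_demo = np.array([[1, 0], [2, 1], [3, 0], [4, 2], [5, 1], [6, 3]], dtype=float)
h = leverages(X_demo)
delta = standardized_residuals([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 7], k=2)
print(h.round(3), delta.round(3), williams_flags(h, delta, warning_leverage(2, 6)))
```

Passing new compounds as `X_query` reuses the training-set (X^T X)^-1, which is how the leverage of a compound outside the training set, such as a hat value quoted for a new prediction, would be checked against h*.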

Example 3

The model constructed in Example 1 was used to predict the food chain biomagnification factors of the 9 persistent organic pollutants; the results are shown in Table 2. The model gives R²_adj = 0.999, indicating strong fitting ability, and Q²_LOO = 0.9981 and Q²_BOOT = 0.7175, indicating that it is reasonably robust. For external validation, R²_ext = 0.836 and Q²_ext = 0.8662; according to Golbraikh et al., a QSAR model is acceptable when Q² > 0.50 and R² > 0.60. The model therefore has good predictive ability and can be applied to compounds outside the training set. Fig. 1 plots the predicted against the experimental BMF values of the persistent organic chemicals: the predicted and experimental values agree well for most substances, and BDE-153 (CAS: 68631-49-2), BDE-183 (CAS: 207122-16-5) and BDE-99 (CAS: 60348-60-9) are well predicted.

Example 4

The biomagnification factor (BMF) of BDE 203 (SMILES: BrC1=C(OC2=CC(Br)=C(Br)C(Br)=C2Br)C(Br)=CC(Br)=C1Br) was predicted using the model constructed in Example 1. From its molecular structure, the two descriptors GGI3 and Mor21v were calculated with Dragon as 1.937 and -0.06, respectively; its leverage (hat value) is 0.435, within the applicability domain of the model.

$$\mathrm{BMF} = -5.04472 + 0.8374\,\mathrm{GGI3} - 35.46426\,\mathrm{Mor21v}$$

$$\mathrm{BMF} = -5.04472 + 0.8374 \times 1.937 - 35.46426 \times (-0.06)$$

The predicted BMF of BDE 203 is given as 3.25, compared with an experimentally determined value of 1.73.

Example 5

The biomagnification factor (BMF) of BDE 196 (SMILES: BrC1=CC=C(OC2=C(Br)C(Br)=C(Br)C(Br)=C2Br)C(Br)=C1Br) was predicted using the model constructed in Example 1. From its molecular structure, the two descriptors GGI3 and Mor21v were calculated with Dragon as 2.375 and -0.016, respectively; its leverage (hat value) is 0.651, within the applicability domain of the model.

$$\mathrm{BMF} = -5.04472 + 0.8374\,\mathrm{GGI3} - 35.46426\,\mathrm{Mor21v}$$

$$\mathrm{BMF} = -5.04472 + 0.8374 \times 2.375 - 35.46426 \times (-0.016)$$

The predicted BMF of BDE 196 is given as 2.05, compared with an experimentally determined value of 1.43.
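Equation (2) can be evaluated directly for the descriptor values printed in Examples 4 and 5, as in this sketch (function and variable names are hypothetical). Taken literally, the printed coefficients and descriptor values give about -1.29 for BDE 203 and -2.49 for BDE 196, not the reported 3.25 and 2.05. Since a BMF, being a concentration ratio, cannot be negative, at least one printed coefficient or descriptor value appears to have been transcribed incorrectly; the correct values cannot be recovered from the text.

```python
def bmf_from_descriptors(ggi3: float, mor21v: float) -> float:
    """Eq. (2) evaluated literally with the printed coefficients."""
    return -5.04472 + 0.8374 * ggi3 - 35.46426 * mor21v


# Descriptor values and reported predictions as printed in Examples 4 and 5
examples = {
    "BDE 203": {"GGI3": 1.937, "Mor21v": -0.06, "reported": 3.25},
    "BDE 196": {"GGI3": 2.375, "Mor21v": -0.016, "reported": 2.05},
}

for name, ex in examples.items():
    computed = bmf_from_descriptors(ex["GGI3"], ex["Mor21v"])
    print(f"{name}: computed {computed:.2f}, reported {ex['reported']:.2f}")
```

Keeping the reported values next to the recomputed ones makes any future correction of the coefficients or descriptor values easy to check against both examples at once.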

Claims (6)

1. A low-trophic-level food chain biomagnification prediction model for organic chemicals, characterized in that the model is as follows:
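$$\mathrm{PEC}_{\mathrm{oral,predator}} = \mathrm{PEC}_{\mathrm{water}} \times \mathrm{BCF}_{\mathrm{fish}} \times \mathrm{BMF} \qquad (1)$$
$$\mathrm{BMF} = -5.04472 + 0.8374\,\mathrm{GGI3} - 35.46426\,\mathrm{Mor21v} \qquad (2)$$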
wherein PEC_oral,predator is the in vivo concentration in low-trophic-level predators, mg/kg wet fish; PEC_water is the predicted concentration in water, mg/L; BCF_fish is the bioconcentration factor of fish, L/kg wet fish; BMF is the biomagnification factor; GGI3 is the third-order topological charge index; and Mor21v is the 3D-MoRSE descriptor weighted by atomic van der Waals volumes.
2. A method for constructing the low-trophic-level food chain biomagnification prediction model for organic chemicals according to claim 1, comprising the steps of:
(1) sample collection and screening;
(2) calculation of molecular descriptors;
(3) model construction;
(4) model validation.
3. The method for constructing the low-trophic-level food chain biomagnification prediction model for organic chemicals according to claim 2, wherein step (1) specifically comprises: obtaining laboratory biomagnification factor data for 9 organic compounds covering PBDEs and HBCDs, and dividing the data set into a training set and a validation set by the Kennard-Stone method, giving 6 training compounds and 3 validation compounds.
4. The method for constructing the low-trophic-level food chain biomagnification prediction model for organic chemicals according to claim 2, wherein step (2) specifically comprises: constructing the molecular structures of the 9 organic compounds in ChemDraw and importing them into HyperChem for optimization in two steps, namely a preliminary energy optimization with the MM+ molecular force field followed by a more accurate geometry optimization with the semi-empirical quantum-mechanical AM1 method; importing the optimized structures into DRAGON 5.4 to calculate 1664 theoretical molecular descriptors of different types; and preprocessing the descriptors before modeling by deleting constant terms, near-constant terms and highly correlated descriptors, leaving 1169 descriptors for the subsequent variable selection.
5. The method for constructing the low-trophic-level food chain biomagnification prediction model for organic chemicals according to claim 2, wherein step (3) specifically comprises: selecting, with a genetic algorithm implemented in MobyDigs, a descriptor set highly correlated with biomagnification; after variable selection, building a linear QSAR model by multiple linear regression (MLR); using leave-one-out cross-validation as the model evaluation function, the optimal number of descriptors being reached when adding one descriptor no longer changes model performance significantly, the final model retaining two descriptors, GGI3 and Mor21v; and setting the modeling parameters as follows: population size 100, maximum number of variables allowed in the initial models 7, mutation trade-off T = 0.5, with the crossover and mutation probabilities based on T.
6. The method for constructing the low-trophic-level food chain biomagnification prediction model for organic chemicals according to claim 2, wherein step (4) specifically comprises: after genetic algorithm variable selection, establishing a linear QSAR model (MLR model) by multiple linear regression, the MLR equation being:
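$$\mathrm{BMF} = -5.04472 + 0.8374\,\mathrm{GGI3} - 35.46426\,\mathrm{Mor21v}$$
Training set: n_tr = 6, Q²_LOO = 0.9981, R²_fitting = 0.9996, R²_adj = 0.9994, RMSE_tr = 0.0665, Q²_BOOT = 0.7175
Validation set: n_ext = 3, R²_ext = 0.836, Q²_ext = 0.8662, R²_adj = 0.9994, RMSE_ext = 0.0289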
wherein GGI3 is the third-order topological charge index and Mor21v is the 3D-MoRSE descriptor weighted by atomic van der Waals volumes; GGI3 is positively and Mor21v negatively correlated with the biomagnification factor; and the RMSE values of the training and validation sets are 0.0665 and 0.0289, respectively.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410895638.XA (CN118800357B)  2024-07-05  2024-07-05  Method for constructing a low-trophic-level food chain biomagnification prediction model for organic chemicals

Publications (2)

Publication Number Publication Date
CN118800357A  2024-10-18
CN118800357B  2025-06-27

Family

ID=93026012

Country Status (1)

Country Link
CN (1) CN118800357B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040186111A1 (en) * 2002-12-24 2004-09-23 Qun Sun Therapeutic agents useful for treating pain
CN101673321A (en) * 2009-10-17 2010-03-17 Dalian University of Technology Method for fast predicting organic pollutant n-caprylic alcohol/air distribution coefficient based on molecular structure
WO2012083274A2 (en) * 2010-12-16 2012-06-21 Nodality, Inc. Methods for diagnosis, prognosis and methods of treatment
CN103761431A (en) * 2014-01-10 2014-04-30 Dalian University of Technology Method for predicting fish bio-concentration factors of organic chemicals by quantitative structure-activity relationship
CN110619925A (en) * 2019-09-27 2019-12-27 Dalian University of Technology Method for directly predicting biological effectiveness of organic pollutants
CN110853701A (en) * 2019-11-07 2020-02-28 Dalian University of Technology Method for predicting fish biological enrichment factor of organic compound by adopting multi-parameter linear free energy relation model
CN111310299A (en) * 2019-12-24 2020-06-19 Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment Construction method of prediction model of ozone digestion rate in wastewater with PPCPs-like organic pollutants
CN111564187A (en) * 2020-05-08 2020-08-21 Northeast Normal University Method and system for predicting reaction rate constant of organic matter and singlet oxygen
CN111909706A (en) * 2020-08-07 2020-11-10 Tsinghua Shenzhen International Graduate School ZVZ composite material, preparation method thereof and method for degrading halogenated organic matters

Non-Patent Citations (2)

Title
Jon A. Arnot et al.: "A food web bioaccumulation model for organic chemicals in aquatic ecosystems", Environmental Toxicology and Chemistry, 31 December 2004 (2004-12-31), pages 2343-2355 *
Ding Rui (丁蕊): "QSAR prediction study of the bioaccumulation of organic chemicals in fish", China Master's Theses Full-text Database, Engineering Science and Technology I, no. 01, 15 January 2022 (2022-01-15), pages 027-281 *

Also Published As

Publication number Publication date
CN118800357B (en) 2025-06-27

Similar Documents

Publication Publication Date Title
Zhang et al. Microbial dynamics and soil physicochemical properties explain large‐scale variations in soil organic carbon
Gandomi et al. Genetic programming for experimental big data mining: A case study on concrete creep formulation
Güllü Function finding via genetic expression programming for strength and elastic properties of clay treated with bottom ash
CN110534163B (en) Method for predicting octanol/water distribution coefficient of organic compound by adopting multi-parameter linear free energy relation model
Amin et al. Experimental and machine learning approaches to investigate the effect of waste glass powder on the flexural strength of cement mortar
Shah et al. Performance evaluation of soft computing for modeling the strength properties of waste substitute green concrete
Alani et al. An evolutionary approach to modelling concrete degradation due to sulphuric acid attack
CN111310299A (en) Construction method of prediction model of ozone digestion rate in wastewater with PPCPs-like organic pollutants
CN110853701A (en) Method for predicting fish biological enrichment factor of organic compound by adopting multi-parameter linear free energy relation model
CN103345544B (en) Adopt logistic regression method prediction organic chemicals biological degradability
Yudhistira et al. Optimizing concrete mix design for cost and carbon reduction using machine learning
Ngo et al. Application of machine learning models for the optimisation of compressive strength and water resistance of geopolymer stabilised compacted earth
Ershadi et al. Applicability of machine learning models for the assessment of long-term pollutant leaching from solid waste materials
Hosseinnia et al. Machine learning formulation for predicting concrete carbonation depth: A sustainability analysis and optimal mixture design
Qin et al. A fuzzy composting process model
Bashir et al. A new strategy using intelligent hybrid learning for prediction of water binder ratio of concrete with rice husk ash as a supplementary cementitious material
Sun et al. Quantitative effects of composting state variables on C/N ratio through GA-aided multivariate analysis
Iscen et al. Molecular simulation strategies for understanding the degradation mechanisms of acrylic polymers
Ghorbani et al. Machine learning-based prediction of resilient modulus for blends of tire-derived aggregates and demolition wastes
Wang et al. Predictive modeling of compressive strength in tailings concrete using explainable machine learning approaches
Li et al. Predicting calcium carbonate yield from wet carbonation of recycled cement paste using interpretable ensemble machine learning
CN118800357A (en) A predictive model for biomagnification of organic chemicals in low trophic levels of food chains
Sathiparan et al. Prediction of characteristics of pervious concrete by machine learning technique using mix parameters and non-destructive test measurements
CN111261238A (en) Construction method of PPCPs organic chemical mesophilic anaerobic digestion removal rate prediction model
Hao et al. Influence of soil organic carbon fractions on the soil priming effect under different vegetation restoration modes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant