Machine-Learning-Based Prediction of Plant Cuticle–Air Partition Coefficients for Organic Pollutants: Revealing Mechanisms from a Molecular Structure Perspective

Accurately predicting plant cuticle–air partition coefficients (<i>K</i><sub>ca</sub>) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. The current work collected 255 measured <i>K</i><sub>ca...

Bibliographic Details
Main Authors: Tianyun Tao, Cuicui Tao, Tengyi Zhu
Format: Article
Language: English
Published: MDPI AG 2024-03-01
Series: Molecules
Subjects: organic pollutants; plant cuticle–air partition coefficient; QSPR; machine learning
Online Access: https://www.mdpi.com/1420-3049/29/6/1381
_version_ 1797239849081634816
author Tianyun Tao
Cuicui Tao
Tengyi Zhu
author_facet Tianyun Tao
Cuicui Tao
Tengyi Zhu
author_sort Tianyun Tao
collection DOAJ
description Accurately predicting plant cuticle–air partition coefficients (<i>K</i><sub>ca</sub>) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. This work collected 255 measured <i>K</i><sub>ca</sub> values covering 25 plant species and 106 compounds (dataset (I)) and averaged them per compound to establish a second dataset (dataset (II)) containing one <i>K</i><sub>ca</sub> value for each of the 106 compounds. Machine-learning algorithms (multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT)) were applied to develop eight QSPR models for predicting <i>K</i><sub>ca</sub>. The developed models showed high goodness of fit, good robustness, and good predictive performance. The GBDT-2 model (<i>R</i><sup>2</sup><sub>adj</sub> = 0.925, <i>Q</i><sup>2</sup><sub>LOO</sub> = 0.756, <i>Q</i><sup>2</sup><sub>BOOT</sub> = 0.864, <i>R</i><sup>2</sup><sub>ext</sub> = 0.837, <i>Q</i><sup>2</sup><sub>ext</sub> = 0.811, and <i>CCC</i> = 0.891) is recommended as the best model for predicting <i>K</i><sub>ca</sub> because of its superior performance. Moreover, interpreting the GBDT-1 and GBDT-2 models with the Shapley additive explanations (SHAP) method elucidated how molecular properties such as molecular size, polarizability, and molecular complexity affect the capacity of plant cuticles to adsorb organic pollutants from the air. The satisfactory performance of the developed models suggests that they have potential for broad application in guiding assessments of the environmental fate of organic pollutants and in promoting eco-friendly and sustainable chemical engineering.
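The validation statistics quoted in the description (adjusted R2, leave-one-out Q2, external Q2, and CCC) can be illustrated with a minimal Python sketch. This is not the authors' code: the record does not say which Q2 variant or software was used, so the formulas below are common textbook conventions (Q2 computed as 1 - SS_res/SS_tot, CCC as Lin's concordance correlation coefficient). A one-descriptor least-squares line stands in for the GBDT model, and the descriptor and log Kca values are made up for illustration.

```python
from statistics import fmean


def r2(y_obs, y_pred, y_ref_mean=None):
    """Return 1 - SS_res / SS_tot.

    y_ref_mean defaults to the mean of y_obs. Passing the training-set mean
    instead gives one common external-Q2 convention (an assumption here).
    """
    mean = fmean(y_obs) if y_ref_mean is None else y_ref_mean
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def r2_adjusted(r2_value, n_samples, n_descriptors):
    """Penalize R2 for the number of descriptors in the model."""
    return 1.0 - (1.0 - r2_value) * (n_samples - 1) / (n_samples - n_descriptors - 1)


def ccc(y_obs, y_pred):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(y_obs)
    mo, mp = fmean(y_obs), fmean(y_pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred)) / n
    var_o = sum((o - mo) ** 2 for o in y_obs) / n
    var_p = sum((p - mp) ** 2 for p in y_pred) / n
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)


def fit_line(x, y):
    """Stand-in model: ordinary least squares on a single descriptor."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda value: intercept + slope * value


def q2_loo(x, y, fit=fit_line):
    """Leave-one-out Q2: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(y)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(model(x[i]))
    return r2(y, preds)


# Hypothetical data: one descriptor and log Kca for five compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
log_kca = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_line(descriptor, log_kca)
fitted = [model(v) for v in descriptor]
print("R2     :", r2(log_kca, fitted))
print("R2_adj :", r2_adjusted(r2(log_kca, fitted), len(log_kca), 1))
print("Q2_LOO :", q2_loo(descriptor, log_kca))
print("CCC    :", ccc(log_kca, fitted))
```

Passing a different `fit` callable to `q2_loo` would let the same loop wrap any regressor that can be refitted on a subset, which is why the model is kept separate from the metric functions.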
first_indexed 2024-04-24T17:58:04Z
format Article
id doaj.art-2f2d21832d434fab981b925d3d654dde
institution Directory Open Access Journal
issn 1420-3049
language English
last_indexed 2024-04-24T17:58:04Z
publishDate 2024-03-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-2f2d21832d434fab981b925d3d654dde 2024-03-27T13:57:13Z eng MDPI AG Molecules 1420-3049 2024-03-01 vol. 29, no. 6, art. 1381 doi:10.3390/molecules29061381
Machine-Learning-Based Prediction of Plant Cuticle–Air Partition Coefficients for Organic Pollutants: Revealing Mechanisms from a Molecular Structure Perspective
Tianyun Tao (College of Agriculture, Yangzhou University, Yangzhou 225009, China)
Cuicui Tao (School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, China)
Tengyi Zhu (School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, China)
https://www.mdpi.com/1420-3049/29/6/1381
organic pollutants
plant cuticle–air partition coefficient
QSPR
machine learning
title Machine-Learning-Based Prediction of Plant Cuticle–Air Partition Coefficients for Organic Pollutants: Revealing Mechanisms from a Molecular Structure Perspective
topic organic pollutants
plant cuticle–air partition coefficient
QSPR
machine learning
url https://www.mdpi.com/1420-3049/29/6/1381