Machine-Learning-Based Prediction of Plant Cuticle–Air Partition Coefficients for Organic Pollutants: Revealing Mechanisms from a Molecular Structure Perspective
Accurately predicting plant cuticle–air partition coefficients (<i>K</i><sub>ca</sub>) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. The current work collected 255 measured <i>K</i><sub>ca...
Main Authors: | Tianyun Tao, Cuicui Tao, Tengyi Zhu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-03-01 |
Series: | Molecules |
Subjects: | organic pollutants; plant cuticle–air partition coefficient; QSPR; machine learning |
Online Access: | https://www.mdpi.com/1420-3049/29/6/1381 |
_version_ | 1797239849081634816 |
---|---|
author | Tianyun Tao; Cuicui Tao; Tengyi Zhu |
author_sort | Tianyun Tao |
collection | DOAJ |
description | Accurately predicting plant cuticle–air partition coefficients (<i>K</i><sub>ca</sub>) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. This work collected 255 measured <i>K</i><sub>ca</sub> values covering 25 plant species and 106 compounds (dataset (I)) and averaged them per compound to establish a second dataset (dataset (II)) containing one <i>K</i><sub>ca</sub> value for each of the 106 compounds. Four machine-learning algorithms (multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT)) were applied to develop eight QSPR models for predicting <i>K</i><sub>ca</sub>. The developed models showed high goodness of fit, good robustness, and good predictive performance. The GBDT-2 model (<i>R</i><sup>2</sup><sub>adj</sub> = 0.925, <i>Q</i><sup>2</sup><sub>LOO</sub> = 0.756, <i>Q</i><sup>2</sup><sub>BOOT</sub> = 0.864, <i>R</i><sup>2</sup><sub>ext</sub> = 0.837, <i>Q</i><sup>2</sup><sub>ext</sub> = 0.811, and <i>CCC</i> = 0.891) is recommended for predicting <i>K</i><sub>ca</sub> because it performed best. Moreover, interpreting the GBDT-1 and GBDT-2 models with the Shapley additive explanations (SHAP) method elucidated how molecular properties, such as molecular size, polarizability, and molecular complexity, affect the capacity of plant cuticles to adsorb organic pollutants from the air. The satisfactory performance of the developed models suggests that they have potential for wide application in assessing the environmental fate of organic pollutants and in promoting eco-friendly and sustainable chemical engineering. |
first_indexed | 2024-04-24T17:58:04Z |
format | Article |
id | doaj.art-2f2d21832d434fab981b925d3d654dde |
institution | Directory of Open Access Journals |
issn | 1420-3049 |
language | English |
last_indexed | 2024-04-24T17:58:04Z |
publishDate | 2024-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-2f2d21832d434fab981b925d3d654dde; 2024-03-27T13:57:13Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2024-03-01; vol. 29, no. 6, art. 1381; DOI 10.3390/molecules29061381; Tianyun Tao (College of Agriculture, Yangzhou University, Yangzhou 225009, China); Cuicui Tao (School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, China); Tengyi Zhu (School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, China); https://www.mdpi.com/1420-3049/29/6/1381; organic pollutants; plant cuticle–air partition coefficient; QSPR; machine learning |
title | Machine-Learning-Based Prediction of Plant Cuticle–Air Partition Coefficients for Organic Pollutants: Revealing Mechanisms from a Molecular Structure Perspective |
topic | organic pollutants; plant cuticle–air partition coefficient; QSPR; machine learning |
url | https://www.mdpi.com/1420-3049/29/6/1381 |
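
The description field above mentions two steps that are easy to illustrate: collapsing several measured K_ca values per compound into one average (dataset (I) to dataset (II)), and scoring predictions with statistics such as adjusted R² and CCC. The following is a minimal sketch only, not the authors' code. It assumes that CCC denotes Lin's concordance correlation coefficient, as is common in QSPR validation, and that values are handled on a log scale; the function names, the record layout, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def average_by_compound(records):
    """Collapse (compound, species, log_kca) rows into one mean per compound.

    Assumed record layout; the source does not describe its data format.
    """
    grouped = defaultdict(list)
    for compound, _species, log_kca in records:
        grouped[compound].append(log_kca)
    return {compound: fmean(values) for compound, values in grouped.items()}


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(observed, predicted, n_descriptors):
    """R^2 penalised for the number of descriptors used by the model."""
    n = len(observed)
    if n <= n_descriptors + 1:
        raise ValueError("need more samples than descriptors + 1")
    r2 = r_squared(observed, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_descriptors - 1)


def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(observed)
    mean_o, mean_p = fmean(observed), fmean(predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical measurements: two species for compound "A", one for "B".
sample_records = [("A", "species-1", 4.0),
                  ("A", "species-2", 6.0),
                  ("B", "species-1", 7.0)]
dataset_two = average_by_compound(sample_records)
```

Unlike the plain correlation coefficient, CCC also penalises a systematic offset between observed and predicted values, which is why a model can correlate perfectly with the measurements and still score well below 1.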