Charged aerosol detector response modeling for fatty acids based on experimental settings and molecular features: a machine learning approach
Main Authors: | Ruben Pawellek, Jovana Krmar, Adrian Leistner, Nevena Djajić, Biljana Otašević, Ana Protić, Ulrike Holzgrabe |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2021-07-01 |
Series: | Journal of Cheminformatics |
Subjects: | |
Online Access: | https://doi.org/10.1186/s13321-021-00532-0 |
author | Ruben Pawellek, Jovana Krmar, Adrian Leistner, Nevena Djajić, Biljana Otašević, Ana Protić, Ulrike Holzgrabe |
---|---|
collection | DOAJ |
description | The charged aerosol detector (CAD) is the latest representative of aerosol-based detectors that generate a response independent of the analytes' chemical structure. This study aimed to accurately predict the CAD response of homologous fatty acids under varying experimental conditions. Fatty acids from C12 to C18 were used as model substances because their semivolatile characteristics cause non-uniform CAD behaviour. Considering both experimental conditions and molecular descriptors, mixed quantitative structure–property relationship (QSPR) modeling was performed using Gradient Boosted Trees (GBT). The ensemble of 10 decision trees (learning rate 0.55, maximal depth 5, sample rate 1.0) explained approximately 99% of the observed variance in CAD responses (Q²: 0.987, RMSE: 0.051). Validation with an external test compound confirmed the high predictive ability of the established model (R²: 0.990, RMSEP: 0.050). Through its intrinsic attribute selection, GBT used almost all independent variables during model building. It attributed the highest importance to the power function value, the flow rate of the mobile phase, the evaporation temperature, the content of organic solvent in the mobile phase, and molecular descriptors such as molecular weight (MW), Radial Distribution Function - 080 / weighted by mass (RDF080m), and the average coefficient of the last eigenvector from the distance/detour matrix (Ve2_D/Dt). Identifying the factors most relevant to CAD responsiveness contributed to a better understanding of the underlying mechanisms of signal generation. The increased CAD response obtained with acetone as the organic modifier demonstrates its potential to replace the more expensive and environmentally harmful acetonitrile. |
first_indexed | 2024-12-18T01:16:17Z |
id | doaj.art-8a0cdc642f4543929945594110545d92 |
institution | Directory Open Access Journal |
issn | 1758-2946 |
last_indexed | 2024-12-18T01:16:17Z |
topic | High-performance liquid chromatography (HPLC); Charged aerosol detector (CAD); Gradient boosted trees (GBT); Quantitative structure–property relationship modeling (QSPR); Fatty acids |
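The GBT configuration reported in the abstract can be expressed with scikit-learn's `GradientBoostingRegressor`. The sketch below is a minimal illustration only, and it rests on several assumptions that do not come from this record. The record does not name the authors' software. Mapping "sample rate 1.0" to `subsample=1.0` is an assumption. The column names in `FEATURES` are hypothetical labels for the variables named in the abstract. The 5-fold cross-validation used for Q² is an assumption, because the validation scheme is not stated. The data are synthetic placeholders, so the printed numbers say nothing about the published results. The model's importance values are scikit-learn's impurity-based `feature_importances_`, which may differ from the importance measure used in the article.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

# Hypothetical column names: four experimental settings plus the three
# molecular descriptors named in the abstract (order is an assumption).
FEATURES = [
    "power_function_value", "flow_rate", "evaporation_temp",
    "organic_content", "MW", "RDF080m", "Ve2_D_Dt",
]


def make_gbt(random_state=0):
    """GBT regressor using the hyperparameters reported in the abstract."""
    return GradientBoostingRegressor(
        n_estimators=10,    # ensemble of 10 decision trees
        learning_rate=0.55,
        max_depth=5,
        subsample=1.0,      # assumed equivalent of "sample rate 1.0"
        random_state=random_state,
    )


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def cross_validated_q2(X, y, n_splits=5, random_state=0):
    """Q^2 and RMSE from out-of-fold predictions (k-fold CV is an assumption)."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    y_oof = cross_val_predict(make_gbt(random_state), X, y, cv=cv)
    return r2_score(y, y_oof), rmse(y, y_oof)


def external_validation(model, X_test, y_test):
    """R^2 and RMSEP on held-out data, e.g. all runs of one unseen compound."""
    y_pred = model.predict(X_test)
    return r2_score(y_test, y_pred), rmse(y_test, y_pred)


# Usage on synthetic placeholder data (NOT the article's measurements).
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, len(FEATURES)))
y = 2.0 * X[:, 0] + X[:, 1] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=120)
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

q2, rmse_cv = cross_validated_q2(X_train, y_train)
model = make_gbt().fit(X_train, y_train)
r2_ext, rmsep = external_validation(model, X_test, y_test)

# Rank features by impurity-based importance (normalized to sum to 1).
importances = sorted(
    zip(FEATURES, model.feature_importances_), key=lambda kv: kv[1], reverse=True
)
print(f"Q2={q2:.3f} RMSE={rmse_cv:.3f} R2_ext={r2_ext:.3f} RMSEP={rmsep:.3f}")
for name, score in importances:
    print(f"{name:>22s}  {score:.3f}")
```

Q² here is computed as R² on out-of-fold predictions, that is, 1 minus PRESS divided by the total sum of squares. In the article, the external set was a held-out test compound. To mirror that with real data, hold out every row belonging to one fatty acid instead of the last rows of a shuffled array.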