Enhancing Carbon Acid pK<sub>a</sub> Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values
The prediction of the aqueous pK<sub>a</sub> of carbon acids by Quantitative Structure Property Relationship or cheminformatics-based methods is a rather arduous problem. Primarily, there are insufficient high-quality experimental data points measured in homogeneous conditions to allow f...
Main Authors: | Jeffrey Plante, Beth A. Caine, Paul L. A. Popelier |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-02-01 |
Series: | Molecules |
Subjects: | pKa prediction; ab initio; bond length; carbon acid |
Online Access: | https://www.mdpi.com/1420-3049/26/4/1048 |
_version_ | 1797396284272803840 |
---|---|
author | Jeffrey Plante; Beth A. Caine; Paul L. A. Popelier |
collection | DOAJ |
description | The prediction of the aqueous pK<sub>a</sub> of carbon acids by Quantitative Structure Property Relationship or cheminformatics-based methods is a rather arduous problem. Primarily, there are insufficient high-quality experimental data points measured in homogeneous conditions to allow for a good global model to be generated. In our computationally efficient pK<sub>a</sub> prediction method, we generate an atom-type feature vector, called a distance spectrum, from the assigned ionisation atom, and learn coefficients for those atom-types that show the impact each atom-type has on the pK<sub>a</sub> of the ionisable centre. In the current work, we augment our dataset with pK<sub>a</sub> values from a series of high-performing local models derived from the Ab Initio Bond Lengths method (AIBL). We find that, in distilling the knowledge available from multiple models into one general model, the prediction error for an external test set is reduced compared to that using literature experimental data alone. |
first_indexed | 2024-03-09T00:48:09Z |
format | Article |
id | doaj.art-f511b899d1a94a69838f498397102628 |
institution | Directory Open Access Journal |
issn | 1420-3049 |
language | English |
last_indexed | 2024-03-09T00:48:09Z |
publishDate | 2021-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-f511b899d1a94a69838f498397102628; 2023-12-11T17:24:16Z; eng; MDPI AG; Molecules; 1420-3049; 2021-02-01; 26(4), 1048; 10.3390/molecules26041048; Enhancing Carbon Acid pK<sub>a</sub> Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values; Jeffrey Plante (Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, UK); Beth A. Caine (Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK); Paul L. A. Popelier (Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK); https://www.mdpi.com/1420-3049/26/4/1048; pKa prediction; ab initio; bond length; carbon acid |
title | Enhancing Carbon Acid pK<sub>a</sub> Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values |
topic | pKa prediction; ab initio; bond length; carbon acid |
url | https://www.mdpi.com/1420-3049/26/4/1048 |