Enhancing Carbon Acid pK<sub>a</sub> Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values

The prediction of the aqueous pK<sub>a</sub> of carbon acids by Quantitative Structure Property Relationship or cheminformatics-based methods is a rather arduous problem. Primarily, there are insufficient high-quality experimental data points measured in homogeneous conditions to allow f...

Bibliographic Details
Main Authors: Jeffrey Plante, Beth A. Caine, Paul L. A. Popelier
Format: Article
Language: English
Published: MDPI AG 2021-02-01
Series: Molecules
Subjects: pKa prediction; ab initio; bond length; carbon acid
Online Access: https://www.mdpi.com/1420-3049/26/4/1048
collection DOAJ
description The prediction of the aqueous pK<sub>a</sub> of carbon acids by Quantitative Structure Property Relationship or cheminformatics-based methods is a rather arduous problem. Primarily, there are insufficient high-quality experimental data points measured in homogeneous conditions to allow for a good global model to be generated. In our computationally efficient pK<sub>a</sub> prediction method, we generate an atom-type feature vector, called a distance spectrum, from the assigned ionisation atom, and learn coefficients for those atom-types that show the impact each atom-type has on the pK<sub>a</sub> of the ionisable centre. In the current work, we augment our dataset with pK<sub>a</sub> values from a series of high performing local models derived from the Ab Initio Bond Lengths method (AIBL). We find that, in distilling the knowledge available from multiple models into one general model, the prediction error for an external test set is reduced compared to that using literature experimental data alone.
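The description above can be illustrated with a minimal sketch, assuming that a distance spectrum is a count of atom types at each bond distance from the ionisable atom and that the learned coefficients enter a simple linear model. The function names, the element-symbol atom types, the `max_dist` cut-off, and the coefficient values are all hypothetical placeholders chosen for illustration; they are not taken from the article.

```python
from collections import Counter, deque


def distance_spectrum(atom_types, bonds, centre, max_dist=3):
    """Count atom types at each bond distance (1..max_dist) from `centre`.

    Assumption: atom types are plain element symbols; the article's own
    atom typing is likely richer than this.
    """
    neighbours = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Breadth-first search gives the shortest bond-path distance to each atom.
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        atom = queue.popleft()
        if dist[atom] == max_dist:
            continue
        for nxt in neighbours[atom]:
            if nxt not in dist:
                dist[nxt] = dist[atom] + 1
                queue.append(nxt)

    return Counter((atom_types[i], d) for i, d in dist.items() if d > 0)


def predict_pka(spectrum, coefficients, intercept):
    """Linear model: intercept plus coefficient times count for each feature."""
    return intercept + sum(
        coefficients.get(key, 0.0) * count for key, count in spectrum.items()
    )


# Toy heavy-atom graph for acetone, CH3-C(=O)-CH3; atom 0 is the ionising carbon.
atom_types = ["C", "C", "O", "C"]
bonds = [(0, 1), (1, 2), (1, 3)]
spectrum = distance_spectrum(atom_types, bonds, centre=0)

# Placeholder coefficients, NOT fitted values from the article.
toy_coefficients = {("O", 2): -2.0}
toy_prediction = predict_pka(spectrum, toy_coefficients, intercept=10.0)
```

In this picture, augmenting the training set with AIBL-derived pK<sub>a</sub> values would simply add more (spectrum, pK<sub>a</sub>) pairs before the coefficients are fitted; the fitting step itself is omitted here.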
first_indexed 2024-03-09T00:48:09Z
format Article
id doaj.art-f511b899d1a94a69838f498397102628
institution Directory Open Access Journal
issn 1420-3049
language English
last_indexed 2024-03-09T00:48:09Z
publishDate 2021-02-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-f511b899d1a94a69838f498397102628, 2023-12-11T17:24:16Z, eng, MDPI AG, Molecules, 1420-3049, 2021-02-01, 26, 4, 1048, 10.3390/molecules26041048
Jeffrey Plante (Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, UK)
Beth A. Caine (Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK)
Paul L. A. Popelier (Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK)
topic pKa prediction
ab initio
bond length
carbon acid