PubChem3D: conformer ensemble accuracy

Abstract. Background: PubChem is a free and publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally-derived three-dimensional (3-D)...


Bibliographic Details
Main Authors: Kim Sunghwan, Bolton Evan E, Bryant Stephen H
Format: Article
Language: English
Published: BMC 2013-01-01
Series: Journal of Cheminformatics
Online Access: http://www.jcheminf.com/content/5/1/1
_version_ 1818917520094003200
author Kim Sunghwan
Bolton Evan E
Bryant Stephen H
author_facet Kim Sunghwan
Bolton Evan E
Bryant Stephen H
author_sort Kim Sunghwan
collection DOAJ
description Abstract
Background: PubChem is a free and publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally-derived three-dimensional (3-D) structures of small molecules. All the tools and services that are a part of PubChem3D rely upon the quality of the 3-D conformer models. Construction of the conformer models currently available in PubChem3D involves a clustering stage to sample the conformational space spanned by the molecule. While this stage allows one to downsize the conformer models to more manageable size, it may result in a loss of the ability to reproduce experimentally determined “bioactive” conformations, for example, found for PDB ligands. This study examines the extent of this accuracy loss and considers its effect on the 3-D similarity analysis of molecules.
Results: The conformer models consisting of up to 100,000 conformers per compound were generated for 47,123 small molecules whose structures were experimentally determined, and the conformers in each conformer model were clustered to reduce the size of the conformer model to a maximum of 500 conformers per molecule. The accuracy of the conformer models before and after clustering was evaluated using five different measures: root-mean-square distance (RMSD), shape-optimized shape-Tanimoto (ST^{ST-opt}) and combo-Tanimoto (ComboT^{ST-opt}), and color-optimized color-Tanimoto (CT^{CT-opt}) and combo-Tanimoto (ComboT^{CT-opt}). On average, the effect of clustering decreased the conformer model accuracy, increasing the conformer ensemble’s RMSD to the bioactive conformer (by 0.18 ± 0.12 Å), and decreasing the ST^{ST-opt}, ComboT^{ST-opt}, CT^{CT-opt}, and ComboT^{CT-opt} scores (by 0.04 ± 0.03, 0.16 ± 0.09, 0.09 ± 0.05, and 0.15 ± 0.09, respectively).
Conclusion: This study shows the RMSD accuracy performance of the PubChem3D conformer models is operating as designed. In addition, the effect of PubChem3D sampling on 3-D similarity measures shows that there is a linear degradation of average accuracy with respect to molecular size and flexibility. Generally speaking, one can likely expect the worst-case minimum accuracy of 90% or more of the PubChem3D ensembles to be 0.75, 1.09, 0.43, and 1.13, in terms of ST^{ST-opt}, ComboT^{ST-opt}, CT^{CT-opt}, and ComboT^{CT-opt}, respectively. This expected accuracy improves linearly as the molecule becomes smaller or less flexible.
first_indexed 2024-12-20T00:35:22Z
format Article
id doaj.art-b5111ecaea8241259801756773061488
institution Directory Open Access Journal
issn 1758-2946
language English
last_indexed 2024-12-20T00:35:22Z
publishDate 2013-01-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj.art-b5111ecaea8241259801756773061488 | 2022-12-21T19:59:46Z | eng | BMC | Journal of Cheminformatics | 1758-2946 | 2013-01-01 | 5 | 1 | 1 | 10.1186/1758-2946-5-1 | PubChem3D: conformer ensemble accuracy | Kim Sunghwan; Bolton Evan E; Bryant Stephen H | http://www.jcheminf.com/content/5/1/1
spellingShingle Kim Sunghwan
Bolton Evan E
Bryant Stephen H
PubChem3D: conformer ensemble accuracy
Journal of Cheminformatics
title PubChem3D: conformer ensemble accuracy
title_sort pubchem3d conformer ensemble accuracy
url http://www.jcheminf.com/content/5/1/1
work_keys_str_mv AT kimsunghwan pubchem3dconformerensembleaccuracy
AT boltonevane pubchem3dconformerensembleaccuracy
AT bryantstephenh pubchem3dconformerensembleaccuracy