Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database

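With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed.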
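
The description reports enrichments at a true positive rate (TPR) cutoff of 25% but this record does not define the metric. The sketch below shows one common way such a figure can be computed, assuming that enrichment means the precision among the top-ranked compounds divided by the overall fraction of actives, taken at the shortest ranked list that recovers the requested share of actives. The function name `enrichment_at_tpr` and its parameters are hypothetical and are not part of BCL::ChemInfo; the article itself may define the measure differently.

```python
import math


def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment at a TPR cutoff (assumed definition, see lead-in).

    scores: predicted activity scores, higher means more likely active.
    labels: 1 for experimentally confirmed actives, 0 for inactives.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < tpr_cutoff <= 1.0:
        raise ValueError("tpr_cutoff must be in (0, 1]")

    n_total = len(labels)
    n_active = sum(labels)
    if n_active == 0:
        raise ValueError("at least one active compound is required")

    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Number of actives that must be recovered to reach the TPR cutoff.
    needed = max(1, math.ceil(tpr_cutoff * n_active))

    found = 0
    for n_selected, (_, is_active) in enumerate(ranked, start=1):
        found += is_active
        if found >= needed:
            precision = found / n_selected
            baseline = n_active / n_total  # active fraction in the full set
            return precision / baseline

    # Unreachable for a valid cutoff, kept as a safeguard.
    raise RuntimeError("TPR cutoff was never reached")


# Toy data: 20 compounds, 4 actives, one active ranked first.
toy_labels = [1] + [0] * 4 + [1] * 3 + [0] * 12
toy_scores = [float(20 - i) for i in range(20)]
toy_enrichment = enrichment_at_tpr(toy_scores, toy_labels, tpr_cutoff=0.25)
```

Under this definition an enrichment of 1 corresponds to random selection, so larger values indicate that the model concentrates actives at the top of the ranked list.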

Bibliographic Details
Main Authors: Mariusz Butkiewicz, Edward W. Lowe, Ralf Mueller, Jeffrey L. Mendenhall, Pedro L. Teixeira, C. David Weaver, Jens Meiler
Format: Article
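Language: English
Published: MDPI AG 2013-01-01
Series: Molecules
Subjects: virtual screening; machine learning; quantitative structure-activity relations (QSAR); high-throughput screening (HTS); cheminformatics; PubChem; BCL
Online Access: http://www.mdpi.com/1420-3049/18/1/735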
author Mariusz Butkiewicz
Edward W. Lowe
Ralf Mueller
Jeffrey L. Mendenhall
Pedro L. Teixeira
C. David Weaver
Jens Meiler
collection DOAJ
description With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed.
first_indexed 2024-12-10T03:53:36Z
format Article
id doaj.art-05810d011c6e4ec4840b7636bed37de9
institution Directory Open Access Journal
issn 1420-3049
language English
last_indexed 2024-12-10T03:53:36Z
publishDate 2013-01-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-05810d011c6e4ec4840b7636bed37de9; 2022-12-22T02:03:11Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2013-01-01; vol. 18, no. 1, pp. 735-756; doi:10.3390/molecules18010735
title Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database
topic virtual screening
machine learning
quantitative structure-activity relations (QSAR)
high-throughput screening (HTS)
cheminformatics
PubChem
BCL
url http://www.mdpi.com/1420-3049/18/1/735