Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database
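Main Authors: | Mariusz Butkiewicz, Edward W. Lowe, Ralf Mueller, Jeffrey L. Mendenhall, Pedro L. Teixeira, C. David Weaver, Jens Meiler |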
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2013-01-01 |
Series: | Molecules |
Online Access: | http://www.mdpi.com/1420-3049/18/1/735 |
author | Mariusz Butkiewicz, Edward W. Lowe, Ralf Mueller, Jeffrey L. Mendenhall, Pedro L. Teixeira, C. David Weaver, Jens Meiler |
collection | DOAJ |
description | With the rapidly increasing availability of High-Throughput Screening (HTS) data in public databases such as PubChem, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate probe development and drug discovery efforts in academia and to reduce their cost. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and has been carefully collated using confirmation screens that validate the active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure-activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme that combines orthogonal machine learning algorithms into a single predictor is tested. Enrichments ranging from 15 to 101 are observed at a true positive rate (TPR) cutoff of 25%. |
institution | Directory of Open Access Journals |
issn | 1420-3049 |
citation | Molecules, vol. 18, no. 1, pp. 735-756 (2013) |
doi | 10.3390/molecules18010735 |
topic | virtual screening, machine learning, quantitative structure-activity relations (QSAR), high-throughput screening (HTS), cheminformatics, PubChem, BCL |
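The abstract reports enrichment at a true positive rate (TPR) cutoff of 25% and a consensus scheme that merges several models into one predictor. The sketch below shows one common way such figures can be computed. It assumes that enrichment means the fraction of actives among the compounds selected once 25% of all actives have been recovered, divided by the fraction of actives in the whole data set. It also assumes that "consensus" means a plain average of per-model scores on a shared scale. Neither definition is taken from the paper. The function names `consensus_scores` and `enrichment_at_tpr` and the toy data are hypothetical, and this is not BCL::ChemInfo code.

```python
from collections.abc import Sequence


def consensus_scores(*model_scores: Sequence[float]) -> list[float]:
    """Average per-compound scores from several models (e.g. ANN, SVM, DT).

    Assumption: every model scores the same compounds in the same order,
    and all scores share a common scale (e.g. 0 = inactive, 1 = active).
    """
    if not model_scores:
        raise ValueError("need at least one model")
    n_compounds = len(model_scores[0])
    if any(len(scores) != n_compounds for scores in model_scores):
        raise ValueError("all models must score the same compounds")
    # zip(*...) groups the scores that each compound received from every model.
    return [sum(per_compound) / len(model_scores) for per_compound in zip(*model_scores)]


def enrichment_at_tpr(
    scores: Sequence[float], labels: Sequence[int], tpr_cutoff: float = 0.25
) -> float:
    """Enrichment once `tpr_cutoff` of all actives have been recovered.

    `labels` are assumed to be 1 for active and 0 for inactive. Compounds are
    ranked by score (highest first) and selected until TP / (TP + FN) reaches
    `tpr_cutoff`. Enrichment is the fraction of actives among the selected
    compounds divided by the fraction of actives overall (1.0 = random).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < tpr_cutoff <= 1.0:
        raise ValueError("tpr_cutoff must be in (0, 1]")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("need at least one active compound")
    base_rate = total_actives / len(labels)

    # sorted() is stable even with reverse=True, so tied scores keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for n_selected, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / total_actives >= tpr_cutoff:
            precision = true_positives / n_selected
            return precision / base_rate
    # Not reached: TPR is exactly 1.0 after the last compound.
    raise RuntimeError("TPR cutoff was never reached")


if __name__ == "__main__":
    # Hypothetical toy data: 10 compounds, 2 confirmed actives (label 1).
    labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    ann = [0.9, 0.6, 0.2, 0.4, 0.1, 0.3, 0.2, 0.5, 0.1, 0.2]
    svm = [0.7, 0.2, 0.1, 0.8, 0.3, 0.1, 0.4, 0.2, 0.2, 0.1]
    combined = consensus_scores(ann, svm)
    print(enrichment_at_tpr(combined, labels, tpr_cutoff=0.25))  # 5.0
```

Under this definition, enrichment cannot exceed 1 / (fraction of actives). The toy example, with 2 actives in 10 compounds, therefore tops out at 5.0.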