A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics

Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage advocating the combined use of data from mu...

Bibliographic Details
Main Authors: Laura Isigkeit, Apirat Chaikuad, Daniel Merk
Format: Article
Language: English
Published: MDPI AG 2022-04-01
Series: Molecules
Subjects: big data, data curation, medicinal chemistry, machine learning, de novo design
Online Access: https://www.mdpi.com/1420-3049/27/8/2513
collection DOAJ
description Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage that argue for the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This improved the coverage of compound space and targets, and enabled an automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprises more than 1.1 million compounds with over 10.9 million bioactivity data points, annotated with assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics.
first_indexed 2024-03-09T13:14:59Z
format Article
id doaj.art-de1193a8148f441eb40e9108ebc86b5b
institution Directory of Open Access Journals
issn 1420-3049
language English
last_indexed 2024-03-09T13:14:59Z
publishDate 2022-04-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-de1193a8148f441eb40e9108ebc86b5b; 2023-11-30T21:38:06Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2022-04-01; volume 27, issue 8, article 2513; DOI 10.3390/molecules27082513; A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics; Laura Isigkeit, Apirat Chaikuad, Daniel Merk (all: Institute of Pharmaceutical Chemistry, Goethe University Frankfurt, 60438 Frankfurt, Germany); https://www.mdpi.com/1420-3049/27/8/2513; big data, data curation, medicinal chemistry, machine learning, de novo design
topic big data
data curation
medicinal chemistry
machine learning
de novo design