A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics
Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage advocating the combined use of data from mu...
Main Authors: | Laura Isigkeit, Apirat Chaikuad, Daniel Merk |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-04-01 |
Series: | Molecules |
Subjects: | big data; data curation; medicinal chemistry; machine learning; de novo design |
Online Access: | https://www.mdpi.com/1420-3049/27/8/2513 |
_version_ | 1797444604650323968 |
---|---|
author | Laura Isigkeit; Apirat Chaikuad; Daniel Merk |
collection | DOAJ |
description | Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage that advocate the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This allowed improved coverage of compound space and targets, as well as automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprised more than 1.1 million compounds and over 10.9 million bioactivity data points, annotated with assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics. |
first_indexed | 2024-03-09T13:14:59Z |
format | Article |
id | doaj.art-de1193a8148f441eb40e9108ebc86b5b |
institution | Directory of Open Access Journals |
issn | 1420-3049 |
language | English |
last_indexed | 2024-03-09T13:14:59Z |
publishDate | 2022-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-de1193a8148f441eb40e9108ebc86b5b; 2023-11-30T21:38:06Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2022-04-01; vol. 27, no. 8, art. 2513; DOI 10.3390/molecules27082513; Laura Isigkeit, Apirat Chaikuad, Daniel Merk (all: Institute of Pharmaceutical Chemistry, Goethe University Frankfurt, 60438 Frankfurt, Germany); https://www.mdpi.com/1420-3049/27/8/2513; keywords: big data, data curation, medicinal chemistry, machine learning, de novo design |
title | A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics |
topic | big data; data curation; medicinal chemistry; machine learning; de novo design |
url | https://www.mdpi.com/1420-3049/27/8/2513 |