Natural products subsets: Generation and characterization

Natural products are attractive for drug discovery applications because of their distinctive chemical structures, such as an overall large fraction of sp3 carbon atoms, chiral centers (both features associated with structural complexity), large chemical scaffolds, and diversity of functional groups....

Bibliographic Details
Main Authors: Ana L. Chávez-Hernández, José L. Medina-Franco
Format: Article
Language: English
Published: Elsevier 2023-12-01
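Series: Artificial Intelligence in the Life Sciences
Subjects: Artificial intelligence; Chemical space; Chemical multiverse; Chirality; De novo design; Deep learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2667318523000107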
author Ana L. Chávez-Hernández
José L. Medina-Franco
collection DOAJ
description Natural products are attractive for drug discovery applications because of their distinctive chemical structures, such as an overall large fraction of sp3 carbon atoms, chiral centers (both features associated with structural complexity), large chemical scaffolds, and diversity of functional groups. Furthermore, natural products are used in de novo design and have inspired the development of pseudo-natural products using generative models. Public databases such as the Collection of Open NatUral ProdUcTs and the Universal Natural Product database (UNPD) are rich sources of structures to be used in generative models and other applications. In this work, we report the selection and characterization of the most diverse compounds of natural products from the UNPD using the MaxMin algorithm. The subsets generated with 14,994, 7,497, and 4,998 compounds are publicly available at https://github.com/DIFACQUIM/Natural-products-subsets-generation. We anticipate that the subsets will be particularly useful in building generative models based on natural products by research groups, particularly those with limited access to extensive supercomputer resources.
first_indexed 2024-03-13T03:44:00Z
format Article
id doaj.art-d4c3b38eabb54b57bb4e2106619daf63
institution Directory Open Access Journal
issn 2667-3185
language English
last_indexed 2024-03-13T03:44:00Z
publishDate 2023-12-01
publisher Elsevier
record_format Article
series Artificial Intelligence in the Life Sciences
spelling doaj.art-d4c3b38eabb54b57bb4e2106619daf63 | 2023-06-23T04:45:03Z | eng | Elsevier | Artificial Intelligence in the Life Sciences | 2667-3185 | 2023-12-01 | 3 | 100066 | Natural products subsets: Generation and characterization | Ana L. Chávez-Hernández (0); José L. Medina-Franco (1) | (0) DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, México City 04510, Mexico | (1) Correspondence author.; DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, México City 04510, Mexico | http://www.sciencedirect.com/science/article/pii/S2667318523000107 | Artificial intelligence; Chemical space; Chemical multiverse; Chirality; De novo design; Deep learning
title Natural products subsets: Generation and characterization
topic Artificial intelligence
Chemical space
Chemical multiverse
Chirality
De novo design
Deep learning
url http://www.sciencedirect.com/science/article/pii/S2667318523000107
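
The description in this record states that the subsets were selected from the UNPD with the MaxMin algorithm. For readers unfamiliar with it, the following is a minimal sketch of the general MaxMin idea in plain Python, not the authors' code. It assumes that each compound is represented by a fingerprint given as a set of "on" bit indices and that dissimilarity is measured as Tanimoto distance; the function names, the fingerprint representation, and the choice of the first compound as the seed are all assumptions made for illustration. The record does not say which fingerprints, distance measure, or software were actually used.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical representation: a fingerprint is the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - Tanimoto similarity for two bit-set fingerprints."""
    if not a and not b:
        return 0.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return 1.0 - shared / (len(a) + len(b) - shared)


def maxmin_pick(fps: Sequence[Fingerprint], n_pick: int, seed_index: int = 0) -> List[int]:
    """Greedy MaxMin selection (illustrative sketch).

    Starting from one seed compound (an assumption), repeatedly add the
    compound whose distance to its nearest already-picked compound is largest.
    Returns the indices of the picked compounds in selection order.
    """
    if not 0 < n_pick <= len(fps):
        raise ValueError("n_pick must be between 1 and the number of fingerprints")
    if not 0 <= seed_index < len(fps):
        raise ValueError("seed_index is out of range")

    picked = [seed_index]
    picked_set = {seed_index}
    # min_dist[i] = distance from compound i to the closest picked compound
    min_dist = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]

    while len(picked) < n_pick:
        # Candidate with the largest minimum distance; ties go to the lowest index.
        best = max(
            (i for i in range(len(fps)) if i not in picked_set),
            key=lambda i: min_dist[i],
        )
        picked.append(best)
        picked_set.add(best)
        # Update nearest-picked distances with the newly added compound.
        for i, fp in enumerate(fps):
            d = tanimoto_distance(fp, fps[best])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked
```

This sketch costs on the order of (number of compounds) x (number picked) distance evaluations, which is why MaxMin is a common choice for reducing a large collection to a few thousand diverse compounds. For real molecules, a cheminformatics toolkit would normally supply the fingerprints and an optimized picker.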