SMILES-based deep generative scaffold decorator for de-novo drug design


Bibliographic Details
Main Authors: Josep Arús-Pous, Atanas Patronov, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen, Ola Engkvist
Format: Article
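Language: English
Published: BMC 2020-05-01
Series: Journal of Cheminformatics
Subjects: Deep learning; Generative models; SMILES; Randomized SMILES; Recurrent neural networks; Fragment-based drug discovery
Online Access: http://link.springer.com/article/10.1186/s13321-020-00441-8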
collection DOAJ
description Abstract: Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets.
Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.
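The slicing step described in the abstract (cutting every combination of acyclic bonds to obtain scaffolds with their decorations) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the molecule is a hypothetical toy graph of atom indices rather than a SMILES string, the function names (components, acyclic_bonds, slice_molecule) and the max_cuts parameter are assumptions made for illustration, and no RECAP or fragment-size filtering is applied.

```python
from itertools import combinations


def components(n_atoms, bonds):
    """Return the connected components of the graph as frozensets of atom indices."""
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, found = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            atom = stack.pop()
            if atom not in comp:
                comp.add(atom)
                stack.extend(adjacency[atom] - comp)
        seen |= comp
        found.append(frozenset(comp))
    return found


def acyclic_bonds(n_atoms, bonds):
    """A bond is acyclic (not in a ring) if deleting it disconnects the graph."""
    base = len(components(n_atoms, bonds))
    return [b for b in bonds
            if len(components(n_atoms, [x for x in bonds if x != b])) > base]


def slice_molecule(n_atoms, bonds, max_cuts=3):
    """Enumerate (scaffold, decorations) pairs over all combinations of cuts.

    max_cuts is an illustrative assumption, not a value taken from the paper.
    """
    pairs = set()
    cuttable = acyclic_bonds(n_atoms, bonds)
    for k in range(1, max_cuts + 1):
        for cuts in combinations(cuttable, k):
            parts = components(n_atoms, [b for b in bonds if b not in cuts])
            for scaffold in parts:
                # Keep the split only if every cut bond touches the scaffold,
                # so each remaining fragment has exactly one attachment point.
                if all(a in scaffold or b in scaffold for a, b in cuts):
                    decorations = frozenset(p for p in parts if p != scaffold)
                    pairs.add((scaffold, decorations))
    return pairs


# Toy ethylbenzene-like graph: atoms 0-5 form the ring, 6 and 7 the ethyl chain.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
pairs = slice_molecule(8, BONDS)
```

On this toy graph only the two chain bonds are cuttable, and a single molecule already yields several scaffold/decoration pairs, which is the sense in which the slicing acts as data augmentation. In practice a cheminformatics toolkit would handle SMILES parsing, ring perception, and attachment-point tokens.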
first_indexed 2024-12-22T04:06:09Z
id doaj.art-c20116d0411b447eaa88d36151b88006
institution Directory Open Access Journal
issn 1758-2946
last_indexed 2024-12-22T04:06:09Z
spelling doaj.art-c20116d0411b447eaa88d36151b88006
Record timestamp: 2022-12-21T18:39:37Z
Language: eng
Publisher: BMC
Journal: Journal of Cheminformatics
ISSN: 1758-2946
Date: 2020-05-01
Volume 12, Issue 1, Pages 1-18
DOI: 10.1186/s13321-020-00441-8
Title: SMILES-based deep generative scaffold decorator for de-novo drug design
Authors and affiliations:
Josep Arús-Pous (0): Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca
Atanas Patronov (1): Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca
Esben Jannik Bjerrum (2): Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca
Christian Tyrchan (3): Medicinal Chemistry, Respiratory Inflammation, and Autoimmune (RIA), BioPharmaceutical R&D, AstraZeneca
Jean-Louis Reymond (4): Department of Chemistry and Biochemistry, University of Bern
Hongming Chen (5): Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health -Guangdong Laboratory
Ola Engkvist (6): Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca
URL: http://link.springer.com/article/10.1186/s13321-020-00441-8
Keywords: Deep learning; Generative models; SMILES; Randomized SMILES; Recurrent neural networks; Fragment-based drug discovery
topic Deep learning
Generative models
SMILES
Randomized SMILES
Recurrent neural networks
Fragment-based drug discovery