Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natu...

Bibliographic Details
Main Authors: Marwin H. S. Segler, Thierry Kogej, Christian Tyrchan, Mark P. Waller
Format: Article
Language: English
Published: American Chemical Society 2017-12-01
Series: ACS Central Science
Online Access: http://dx.doi.org/10.1021/acscentsci.7b00512
author Marwin H. S. Segler
Thierry Kogej
Christian Tyrchan
Mark P. Waller
collection DOAJ
description In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active toward a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria), it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.
first_indexed 2024-12-10T22:10:48Z
format Article
id doaj.art-7a39575145a141cf9f675a73d24123e3
institution Directory Open Access Journal
issn 2374-7943
2374-7951
language English
last_indexed 2024-12-10T22:10:48Z
publishDate 2017-12-01
publisher American Chemical Society
record_format Article
series ACS Central Science
spelling doaj.art-7a39575145a141cf9f675a73d24123e3
2022-12-22T01:31:36Z
eng
American Chemical Society
ACS Central Science
2374-7943, 2374-7951
2017-12-01
volume 4, issue 1, pages 120-131
10.1021/acscentsci.7b00512
Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks
Marwin H. S. Segler (0), Thierry Kogej (1), Christian Tyrchan (2), Mark P. Waller (3)
0: Institute of Organic Chemistry & Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, Münster, Germany
1: Hit Discovery, Discovery Sciences, AstraZeneca R&D, Gothenburg, Sweden
2: Department of Medicinal Chemistry, IMED RIA, AstraZeneca R&D, Gothenburg, Sweden
3: Department of Physics & International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai, China
http://dx.doi.org/10.1021/acscentsci.7b00512
title Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks
url http://dx.doi.org/10.1021/acscentsci.7b00512