Testing the limits of SMILES-based de novo molecular generation with curriculum and deep reinforcement learning

Deep reinforcement learning methods have been shown to be potentially powerful tools for de novo design. Recurrent-neural-network-based techniques are the most widely used methods in this space. In this work we examine the behaviour of recurrent-neural-network-based methods when there are few (or no...


Bibliographic Details
Main Authors:	Mokaya, M; Imrie, F; van Hoorn, WP; Kalisz, A; Bradley, AR; Deane, CM
Format:	Journal article
Language:	English
Published:	Springer Nature 2023

collection	OXFORD
description	Deep reinforcement learning methods have been shown to be potentially powerful tools for de novo design. Recurrent-neural-network-based techniques are the most widely used methods in this space. In this work we examine the behaviour of recurrent-neural-network-based methods when there are few (or no) examples of molecules with the desired properties in the training data. We find that targeted molecular generation is usually possible, but the diversity of generated molecules is often reduced and it is not possible to control the composition of generated molecular sets. To help overcome these issues, we propose a new curriculum-learning-inspired recurrent iterative optimization procedure that enables the optimization of generated molecules for seen and unseen molecular profiles, and allows the user to control whether a molecular profile is explored or exploited. Using our method, we generate specific and diverse sets of molecules with up to 18 times more scaffolds than standard methods for the same sample size; however, our results also point to substantial limitations of one-dimensional molecular representations, as used in this space. We find that the success or failure of a given molecular optimization problem depends on the choice of simplified molecular-input line-entry system (SMILES).
first_indexed	2024-03-07T07:53:03Z
id	oxford-uuid:6d93cdbb-f3d2-4bb8-b2b8-42467d876fc3
institution	University of Oxford
spelling	oxford-uuid:6d93cdbb-f3d2-4bb8-b2b8-42467d876fc3; 2023-07-27T09:33:36Z; Journal article; http://purl.org/coar/resource_type/c_dcae04bc; English; Symplectic Elements; Springer Nature; 2023


Similar Items