Molecular generation by Fast Assembly of (Deep)SMILES fragments

Abstract. Background: In recent years, in silico molecular design has been regaining interest. To generate molecules with optimized properties on a computer, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile. Results: In this article, a simp...

Bibliographic Details
Main Authors: Francois Berenger, Koji Tsuda
Format: Article
Language: English
Published: BMC 2021-11-01
Series: Journal of Cheminformatics
Subjects: Molecular generation; Molecular fragments; SMILES; DeepSMILES
Online Access: https://doi.org/10.1186/s13321-021-00566-4
author Francois Berenger
Koji Tsuda
collection DOAJ
description Abstract. Background: In recent years, in silico molecular design has been regaining interest. To generate molecules with optimized properties on a computer, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile. Results: In this article, a simple method is described to generate only valid molecules at high frequency (>300,000 molecules/s using a single CPU core), given a molecular training set. The proposed method generates diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity to match the training set distribution. When working with DeepSMILES, the method reaches peak performance (>340,000 molecules/s) because it relies almost exclusively on string operations. The “Fast Assembly of SMILES Fragments” software is released as open source at https://github.com/UnixJunkie/FASMIFRA. Experiments on speed, training set distribution matching, and molecular diversity, as well as a benchmark against several other methods, are also shown.
first_indexed 2024-12-20T23:38:44Z
format Article
id doaj.art-5439dd3838a54210ab3db8f99d806114
institution Directory Open Access Journal
issn 1758-2946
language English
last_indexed 2024-12-20T23:38:44Z
publishDate 2021-11-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj.art-5439dd3838a54210ab3db8f99d806114; 2022-12-21T19:23:09Z; eng; BMC; Journal of Cheminformatics; ISSN 1758-2946; 2021-11-01; volume 13, issue 1, pages 1-10; 10.1186/s13321-021-00566-4; Molecular generation by Fast Assembly of (Deep)SMILES fragments; Francois Berenger (Graduate School of Frontier Sciences, The University of Tokyo); Koji Tsuda (Graduate School of Frontier Sciences, The University of Tokyo); https://doi.org/10.1186/s13321-021-00566-4; Molecular generation; Molecular fragments; SMILES; DeepSMILES
title Molecular generation by Fast Assembly of (Deep)SMILES fragments
topic Molecular generation
Molecular fragments
SMILES
DeepSMILES
url https://doi.org/10.1186/s13321-021-00566-4