STOUT: SMILES to IUPAC names using neural machine translation

Abstract Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the c...

Bibliographic Details
Main Authors: Kohulan Rajan, Achim Zielesny, Christoph Steinbeck
Format: Article
Language: English
Published: BMC 2021-04-01
Series: Journal of Cheminformatics
Subjects: Neural machine translation; Chemical language; IUPAC names; SMILES; DeepSMILES; SELFIES
Online Access: https://doi.org/10.1186/s13321-021-00512-4
author Kohulan Rajan
Achim Zielesny
Christoph Steinbeck
collection DOAJ
description Abstract: Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this ruleset, correct chemical name assignment remains challenging for human beings, and only a few rule-based cheminformatics toolkits support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach that generates the IUPAC name for a given molecule from its SMILES string and also performs the reverse translation, i.e. predicting the SMILES string from the IUPAC name. In both cases, the system is able to predict with an average BLEU score of about 90% and a Tanimoto similarity index of more than 0.9. Even incorrect predictions show a remarkable similarity between true and predicted compounds.
first_indexed 2024-12-19T10:56:44Z
format Article
id doaj.art-047c066be151498aaa46e25b8ed3f0c2
institution Directory of Open Access Journals
issn 1758-2946
language English
last_indexed 2024-12-19T10:56:44Z
publishDate 2021-04-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj.art-047c066be151498aaa46e25b8ed3f0c2
2022-12-21T20:24:47Z
eng
BMC
Journal of Cheminformatics
1758-2946
2021-04-01
Volume 13, issue 1, pages 1-14
10.1186/s13321-021-00512-4
STOUT: SMILES to IUPAC names using neural machine translation
Kohulan Rajan (Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena)
Achim Zielesny (Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences)
Christoph Steinbeck (Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena)
https://doi.org/10.1186/s13321-021-00512-4
Neural machine translation; Chemical language; IUPAC names; SMILES; DeepSMILES; SELFIES
title STOUT: SMILES to IUPAC names using neural machine translation
topic Neural machine translation
Chemical language
IUPAC names
SMILES
DeepSMILES
SELFIES
url https://doi.org/10.1186/s13321-021-00512-4