STOUT: SMILES to IUPAC names using neural machine translation
Main Authors: | Kohulan Rajan, Achim Zielesny, Christoph Steinbeck |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2021-04-01 |
Series: | Journal of Cheminformatics |
Subjects: | Neural machine translation; Chemical language; IUPAC names; SMILES; DeepSMILES; SELFIES |
Online Access: | https://doi.org/10.1186/s13321-021-00512-4 |
author | Kohulan Rajan, Achim Zielesny, Christoph Steinbeck |
---|---|
collection | DOAJ |
description | Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this ruleset, correct chemical name assignment remains challenging for human beings, and only a few rule-based cheminformatics toolkits support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach that generates the IUPAC name for a given molecule from its SMILES string and also performs the reverse translation, i.e. predicting the SMILES string from the IUPAC name. In both cases, the system predicts with an average BLEU score of about 90% and a Tanimoto similarity index of more than 0.9. Even incorrect predictions show remarkable similarity between the true and predicted compounds. |
first_indexed | 2024-12-19T10:56:44Z |
format | Article |
id | doaj.art-047c066be151498aaa46e25b8ed3f0c2 |
institution | Directory Open Access Journal |
issn | 1758-2946 |
language | English |
last_indexed | 2024-12-19T10:56:44Z |
publishDate | 2021-04-01 |
publisher | BMC |
record_format | Article |
series | Journal of Cheminformatics |
affiliation | Kohulan Rajan: Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena; Achim Zielesny: Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences; Christoph Steinbeck: Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena |
title | STOUT: SMILES to IUPAC names using neural machine translation |
topic | Neural machine translation; Chemical language; IUPAC names; SMILES; DeepSMILES; SELFIES |
url | https://doi.org/10.1186/s13321-021-00512-4 |
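The abstract reports prediction quality partly as a Tanimoto similarity of more than 0.9 between true and predicted compounds. As a rough illustration of what that number measures, the sketch below computes the Tanimoto (Jaccard) coefficient for two binary fingerprints stored as sets of switched-on bit positions. It is not the authors' evaluation code: the function name `tanimoto` and the example bit positions are hypothetical, and in practice the fingerprints would be generated by a cheminformatics toolkit (RDKit, for example, offers `DataStructs.TanimotoSimilarity` for its own fingerprint objects).

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A or B| for two binary fingerprints.

    Each fingerprint is given as the set of bit positions that are switched on.
    """
    if not fp_a and not fp_b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)
    # Size of the union = |A| + |B| - |A & B|
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: the "predicted" molecule sets one extra bit.
true_fp = {1, 4, 7, 9, 12, 15, 20, 33, 41, 50}
predicted_fp = true_fp | {52}

# 10 shared bits out of 11 bits in the union
print(round(tanimoto(true_fp, predicted_fp), 3))  # prints 0.909
```

A score of 1.0 means the two fingerprints are identical, which does not by itself guarantee identical structures, since different molecules can produce the same fingerprint.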