Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization

Abstract: Tokenization is an important preprocessing step in natural language processing that may have a significant influence on prediction quality. This research showed that the traditional SMILES tokenization has a certain limitation that results in tokens failing to reflect the true nature of mol...

Bibliographic Details
Main Authors: Umit V. Ucak, Islambek Ashyrmamatov, Juyong Lee
Format: Article
Language: English
Published: BMC 2023-05-01
Series: Journal of Cheminformatics
Online Access: https://doi.org/10.1186/s13321-023-00725-9