Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization
Abstract: Tokenization is an important preprocessing step in natural language processing that may have a significant influence on prediction quality. This research showed that the traditional SMILES tokenization has a certain limitation that results in tokens failing to reflect the true nature of mol...
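The abstract contrasts atom-in-SMILES with "traditional SMILES tokenization". The sketch below shows that conventional baseline. It assumes the traditional scheme is the widely used atom-level regex tokenizer popularized by the Molecular Transformer work (Schwaller et al.). This is not the article's atom-in-SMILES method. `tokenize_smiles` and `SMILES_REGEX` are hypothetical names chosen for illustration.

```python
import re

# Assumption: atom-level regex commonly used for "traditional" SMILES tokenization
# (Molecular Transformer style). NOT the atom-in-SMILES scheme from the article.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens (hypothetical helper)."""
    # The pattern has one capturing group, so findall returns the matched tokens.
    tokens = SMILES_REGEX.findall(smiles)
    # findall silently skips unmatched characters; reject input that does not round-trip.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Acetic acid: the carbonyl and hydroxyl oxygens both become the bare token "O".
    print(tokenize_smiles("CC(=O)O"))  # ['C', 'C', '(', '=', 'O', ')', 'O']
```

In the acetic acid example, the token `O` is identical for the carbonyl and the hydroxyl oxygen. Likewise, every aromatic carbon in a ring becomes `c` regardless of its neighbors. Each token carries only the element symbol and leaves out the atom's chemical environment. This is the kind of context-free token the abstract describes as failing to reflect the nature of the molecule.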
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2023-05-01 |
Series: | Journal of Cheminformatics |
Subjects: | |
Online Access: | https://doi.org/10.1186/s13321-023-00725-9 |