A surrogate model of sigma profile and COSMOSAC activity coefficient predictions of using transformer with SMILES input

COSMOSAC is a model that allows a priori prediction of activity coefficients for characterizing solute-solvent interactions. The method requires as input the sigma profile, the charge distribution on the surface of the molecule, which can be obtained through quantum mechanical calculation. Since Sig...

Bibliographic Details
Main Authors: Jia-Lin Kang, Chen-Tse Chiu, Jau Shiue Huang, David Shan-Hill Wong
Format: Article
Language: English
Published: Elsevier 2022-03-01
Series: Digital Chemical Engineering
Subjects: Transformer; SMILES; K-mer; Sigma profile; COSMOSAC activity coefficient
Online Access: http://www.sciencedirect.com/science/article/pii/S2772508122000072
author Jia-Lin Kang
Chen-Tse Chiu
Jau Shiue Huang
David Shan-Hill Wong
collection DOAJ
description COSMOSAC is a model that allows a priori prediction of activity coefficients for characterizing solute-solvent interactions. The method requires as input the sigma profile, the charge distribution on the surface of the molecule, which can be obtained through quantum mechanical calculation. Since the sigma profile is a unique function of molecular structure, it is desirable to obtain it from a surrogate model of the quantum computation that takes a molecular description as input. Previously, a model called the Universal Digital Chemical Space (UDCS) was developed that calculates sigma profiles using the Simplified Molecular-Input Line-Entry System (SMILES) as input. In this work, an improved version of this approach was developed using a Transformer model to encode the SMILES text string. Successive input elements in the text string, known as K-mers, were also encoded, and the errors of the predicted moments of the sigma profile and of the predicted activity coefficients in reference solvents were included in the loss function. Results showed that while the prediction accuracy of the sigma profile (coefficient of determination R²) was not significantly improved, the prediction accuracy of the first and second moments, especially for the poorer-ranked results, and that of the activity coefficients were significantly improved by including higher K-mers. Further improvement was achieved by including the activity loss, which substantially improved the accuracy of the 5th and 25th percentiles of the moment loss and of the activity coefficient of the species in n-hexane.
first_indexed 2024-12-18T05:56:15Z
format Article
id doaj.art-88af2d7bb6c14cabafcf126e7a2e117d
institution Directory Open Access Journal
issn 2772-5081
language English
last_indexed 2024-12-18T05:56:15Z
publishDate 2022-03-01
publisher Elsevier
record_format Article
series Digital Chemical Engineering
spelling doaj.art-88af2d7bb6c14cabafcf126e7a2e117d; 2022-12-21T21:18:47Z; eng; Elsevier; Digital Chemical Engineering; 2772-5081; 2022-03-01; 2; 100016
A surrogate model of sigma profile and COSMOSAC activity coefficient predictions of using transformer with SMILES input
Jia-Lin Kang (Department of Chemical and Material Engineering, National Yunlin University of Science and Technology, Yunlin 64002, Taiwan; corresponding author)
Chen-Tse Chiu (Department of Chemical and Material Engineering, National Yunlin University of Science and Technology, Yunlin 64002, Taiwan)
Jau Shiue Huang (Department of Chemical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan)
David Shan-Hill Wong (Department of Chemical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan; corresponding author)
http://www.sciencedirect.com/science/article/pii/S2772508122000072
Transformer
SMILES
K-mer
Sigma profile
COSMOSAC activity coefficient
title A surrogate model of sigma profile and COSMOSAC activity coefficient predictions of using transformer with SMILES input
topic Transformer
SMILES
K-mer
Sigma profile
COSMOSAC activity coefficient
url http://www.sciencedirect.com/science/article/pii/S2772508122000072
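
The description in this record mentions two ideas that a short example may make more concrete: K-mers (successive input elements of the SMILES string) and a loss term built on the first and second moments of the sigma profile. The Python sketch below is illustrative only and is not taken from the article. The function names `smiles_kmers`, `profile_moment`, and `moment_loss` are hypothetical, the one-character-per-token split is an assumption (a real tokenizer would probably treat atoms such as `Cl` or `Br` as single tokens), and the unnormalized moment sum and mean-squared-error form of the loss are assumptions as well.

```python
from typing import List, Sequence


def smiles_kmers(smiles: str, k: int) -> List[str]:
    """Return all overlapping length-k substrings (K-mers) of a SMILES string.

    Assumption: one character is one token; the article's tokenizer may differ.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def profile_moment(sigma: Sequence[float], area: Sequence[float], order: int) -> float:
    """Raw moment of a discretized sigma profile: sum(area_i * sigma_i ** order).

    Assumption: an unnormalized sum over the bins of the profile.
    """
    if len(sigma) != len(area):
        raise ValueError("sigma and area must have the same length")
    return sum(a * s ** order for s, a in zip(sigma, area))


def moment_loss(
    sigma: Sequence[float],
    predicted: Sequence[float],
    reference: Sequence[float],
    orders: Sequence[int] = (1, 2),
) -> float:
    """Mean squared error between predicted and reference moments.

    Assumption: the article's actual loss weighting is not reproduced here.
    """
    errors = [
        (profile_moment(sigma, predicted, n) - profile_moment(sigma, reference, n)) ** 2
        for n in orders
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Ethanol as SMILES; 2-mers are the overlapping character pairs.
    print(smiles_kmers("CCO", 2))  # ['CC', 'CO']
```

Keeping the K-mer split and the moment loss as separate functions means either assumption can be swapped out (for example, an atom-aware tokenizer) without touching the other.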