A surrogate model of sigma profile and COSMOSAC activity coefficient predictions of using transformer with SMILES input
Main Authors: | Jia-Lin Kang, Chen-Tse Chiu, Jau Shiue Huang, David Shan-Hill Wong |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-03-01 |
Series: | Digital Chemical Engineering |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772508122000072 |
author | Jia-Lin Kang, Chen-Tse Chiu, Jau Shiue Huang, David Shan-Hill Wong |
---|---|
collection | DOAJ |
description | COSMOSAC is a model that allows a priori prediction of activity coefficients for characterizing solute-solvent interactions. The method requires as input the sigma profile, i.e., the charge distribution on the molecular surface, which can be obtained through quantum mechanical calculation. Since the sigma profile is a unique function of molecular structure, it is desirable to obtain it from a surrogate model of the quantum computation that takes a molecular description as input. A previously developed model, the Universal Digital Chemical Space (UDCS), calculated sigma profiles using the Simplified Molecular-Input Line-Entry System (SMILES) as input. In this work, an improved version of this approach was developed using a Transformer model to encode the SMILES text string. Successive input elements in the text string, known as K-mers, were also encoded, and the errors of the predicted moments of the sigma profiles and of the predicted activity coefficients in reference solvents were also included in the loss function. Results showed that while the prediction accuracy of the sigma profile (coefficient of determination R²) was not significantly improved, the prediction accuracy of the first and second moments, especially for the poorer-ranked results, as well as of the activity coefficients, can be significantly improved by including higher K-mers. Further improvement can be achieved by including the activity loss, which substantially improved the accuracy of the 5th and 25th percentiles of the moment loss and of the activity coefficients of the species in n-hexane. |
first_indexed | 2024-12-18T05:56:15Z |
format | Article |
id | doaj.art-88af2d7bb6c14cabafcf126e7a2e117d |
institution | Directory of Open Access Journals |
issn | 2772-5081 |
language | English |
last_indexed | 2024-12-18T05:56:15Z |
publishDate | 2022-03-01 |
publisher | Elsevier |
record_format | Article |
series | Digital Chemical Engineering |
affiliations | Jia-Lin Kang (corresponding author) and Chen-Tse Chiu: Department of Chemical and Material Engineering, National Yunlin University of Science and Technology, Yunlin 64002, Taiwan; Jau Shiue Huang and David Shan-Hill Wong (corresponding author): Department of Chemical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan |
title | A surrogate model of sigma profile and COSMOSAC activity coefficient predictions of using transformer with SMILES input |
topic | Transformer; SMILES; K-mer; Sigma profile; COSMOSAC activity coefficient |
url | http://www.sciencedirect.com/science/article/pii/S2772508122000072 |
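Two ingredients named in the description, K-mers (runs of successive SMILES tokens) and a loss that also penalizes errors in the moments of the sigma profile, can be illustrated with a short Python sketch. This is not the authors' implementation: the regular-expression tokenizer, the helper names `tokenize`, `kmers`, `sigma_moments` and `composite_loss`, the use of plain mean squared error, and the weight `w_moment` are all hypothetical choices made for illustration. The moments are assumed to follow the usual definition M_i = Σ p(σ) σ^i over a discretized profile, and the activity-coefficient loss is left out because it would require a full COSMOSAC calculation.

```python
import re

import numpy as np

# Hypothetical SMILES tokenizer (an assumption, not the paper's): bracket atoms
# such as [NH3+], the two-letter halogens Br and Cl, two-digit ring labels
# such as %12, and otherwise one character per token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens using the regex above."""
    return SMILES_TOKEN.findall(smiles)


def kmers(tokens: list[str], k: int) -> list[str]:
    """Return every run of k successive tokens (sliding window, stride 1)."""
    return ["".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def sigma_moments(sigma: np.ndarray, p: np.ndarray, orders=(1, 2)) -> np.ndarray:
    """Moments M_i = sum_j p(sigma_j) * sigma_j**i of a discretized sigma profile."""
    return np.array([np.sum(p * sigma**i) for i in orders])


def composite_loss(p_pred: np.ndarray, p_true: np.ndarray, sigma: np.ndarray,
                   w_moment: float = 1.0) -> float:
    """Profile MSE plus a weighted MSE on the first and second moments.

    Hypothetical form: the exact loss terms and weights used in the paper are
    not reproduced here, and its activity-coefficient term is omitted.
    """
    profile_mse = np.mean((p_pred - p_true) ** 2)
    moment_err = sigma_moments(sigma, p_pred) - sigma_moments(sigma, p_true)
    moment_mse = np.mean(moment_err ** 2)
    return float(profile_mse + w_moment * moment_mse)


if __name__ == "__main__":
    # 2-mers of ethanol, CCO
    print(kmers(tokenize("CCO"), 2))  # ['CC', 'CO']
```

Because the moments of a real sigma profile differ from one another by orders of magnitude, a practical loss would likely normalize or weight each moment separately; that choice is left open in this sketch.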