A stacked generalisation with gradient boosting for highly accurate predictions of polymer bandgap

The bandgap (Egap) is the energy difference between the highest valence band and the lowest conduction band. Generally, the conductivity of a solid material increases as its Egap decreases. As the amount of experimental data stored in online databases continues to increase over the years, it has a...

Bibliographic Details
Main Author: Goh, Kai Leong
Other Authors: Lu Yunpeng
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2022
Subjects: Science::Chemistry
Online Access: https://hdl.handle.net/10356/157102
collection NTU
description The bandgap (Egap) is the energy difference between the highest valence band and the lowest conduction band. Generally, the conductivity of a solid material increases as its Egap decreases. As the amount of experimental data stored in online databases continues to grow, it has become possible to use quantitative structure–property relationship (QSPR) modelling to predict the physical properties of synthetic materials. A recent paper by the Ramprasad Group reported a highly accurate QSPR model for predicting the Egap values of a dataset of 4209 polymers. This paper presents an alternative QSPR model, named LGB-Stack, which achieves even higher accuracy scores on the same dataset. LGB-Stack performs a two-level stacked generalisation using the LightGBM (Light Gradient Boosting Machine) algorithm: multiple weak models are trained first and are then combined into a stronger final model. This paper also presents an extremely fast and efficient method of geometry optimisation that employs the Merck Molecular Force Field (MMFF). Prior to model training, the Simplified Molecular Input Line Entry System (SMILES) notations of the polymers in the dataset were converted into 3D molecular objects and optimised using the MMFF method. Four different molecular fingerprints were then generated from the 3D molecular objects and used as the initial input features for training the weak models. The outputs of the weak models were used as the new input features for training the final model, which completes the LGB-Stack training process.
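The two-level stacking described above can be illustrated with a minimal, standard-library-only sketch. It is not the LGB-Stack implementation: plain least-squares learners stand in for LightGBM, four synthetic "feature views" stand in for the four molecular fingerprints, and the out-of-fold scheme for building the second-level features is an assumption (a common choice, not something the abstract confirms). All function and variable names are hypothetical.

```python
import random

def fit_linear(X, y, ridge=1e-6):
    """Least-squares fit with intercept (stand-in for a LightGBM learner)."""
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Augmented normal equations: (A^T A + ridge*I | A^T y).
    M = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(n)] + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        for k in range(n):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def out_of_fold(X, y, k=5):
    """Predict every sample with a model that did not see it in training."""
    oof = [0.0] * len(y)
    for fold in range(k):
        test = [i for i in range(len(y)) if i % k == fold]
        train = [i for i in range(len(y)) if i % k != fold]
        w = fit_linear([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(w, [X[i] for i in test])):
            oof[i] = p
    return oof

def fit_stack(views, y, k=5):
    # Level 1: one weak model per feature view; its out-of-fold
    # predictions become the input features of the final model.
    meta_X = list(zip(*(out_of_fold(v, y, k) for v in views)))
    weak = [fit_linear(v, y) for v in views]  # refit on all training data
    final = fit_linear(meta_X, y)             # Level 2: final model
    return weak, final

def predict_stack(model, views):
    weak, final = model
    meta_X = list(zip(*(predict(w, v) for w, v in zip(weak, views))))
    return predict(final, meta_X)

def rmse(a, b):
    return (sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)) ** 0.5

def make_data(n, seed=0):
    """Synthetic target that depends on all four views (illustrative only)."""
    rng = random.Random(seed)
    views = [[[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
             for _ in range(4)]
    y = [sum(v[i][0] + 0.5 * v[i][1] for v in views) + rng.gauss(0, 0.1)
         for i in range(n)]
    return views, y

views, y = make_data(300)
train_views = [v[:200] for v in views]
test_views = [v[200:] for v in views]
model = fit_stack(train_views, y[:200])
stack_rmse = rmse(predict_stack(model, test_views), y[200:])
weak_rmse = [rmse(predict(w, v), y[200:])
             for w, v in zip(model[0], test_views)]
print("weak model RMSEs:", weak_rmse)
print("stacked RMSE:", stack_rmse)
```

Each weak model here sees only one view, so it can explain only part of the target; the final model, trained on the weak models' outputs, combines them. In a setting closer to the abstract, the views would be fingerprint vectors and the learners gradient-boosted trees.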
first_indexed 2024-10-01T06:25:41Z
format Final Year Project (FYP)
id ntu-10356/157102
institution Nanyang Technological University
language English
last_indexed 2024-10-01T06:25:41Z
publishDate 2022
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/157102 2023-02-28T23:16:18Z
School of Physical and Mathematical Sciences
YPLu@ntu.edu.sg
Bachelor of Science in Chemistry and Biological Chemistry
2022-05-04T08:35:44Z
Goh, K. L. (2022). A stacked generalisation with gradient boosting for highly accurate predictions of polymer bandgap. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157102
en
CHEM/21/095
application/pdf
topic Science::Chemistry