A Convolutional Neural Network-based gradient boosting framework for prediction of the band gap of photo-active catalysts

Bibliographic Details
Main Authors: Avan Kumar, Sreedevi Upadhyayula, Hariprasad Kodamana
Format: Article
Language: English
Published: Elsevier 2023-09-01
Series: Digital Chemical Engineering
Subjects: Photo-active catalyst; Band gap; Deep learning models; CNN; VGG; Gradient boosting
Online Access: http://www.sciencedirect.com/science/article/pii/S2772508123000273
collection DOAJ
description Photo-catalysis, a recent trend in chemical synthesis, relies on photo-active catalysts, which are semiconductor materials. A key electronic property of semiconductors is the band gap, and the desired band gap of a photo-catalyst lies between 1.5 eV and 6.2 eV. Rational design and synthesis of photo-active catalysts therefore require the band gap as an initial screening parameter. Herein, we propose an integrated deep learning-based framework to classify photo-active catalysts and predict their band gap using compositional features. To this end, we use a dataset extracted from the “catalyst hub” site by web scraping with a Python script. Extensive data cleaning and pre-processing make the input data amenable to model training. Additional features are generated by two methods: (a) one-hot encoding and (b) the mean of the catalyst embeddings computed by Mat2Vec, a pre-trained transformer-based model. With this feature set, we propose a two-stage deep-learning framework for the classification and regression tasks. In the first stage, a 2D-Convolutional Neural Net (CNN)-based classifier decides whether a catalyst belongs to the photo-active catalyst class. In the second stage, after this screening, a 1D-VGG-based gradient boosting framework predicts the band gap of the photo-active catalysts using only compositional features as inputs. The 2D-CNN classifier reaches an accuracy of 0.903 on the train dataset and 0.886 on the test dataset. Further, the proposed integrated model, which uses the 1D-convolutional layers of VGG followed by the XGBoostRegressor, has a test R² of 0.750, much higher than that of the baseline models reported in the literature.
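The featurization and screening steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' code. The element vocabulary, the three-dimensional vectors in `TOY_EMBEDDINGS` (stand-ins for pre-trained embeddings), the per-element unweighted averaging, and the use of the 1.5 eV to 6.2 eV window as a labelling rule are all assumptions made for illustration; the record does not specify these details.

```python
import re
from collections import Counter

# Assumption: a tiny element vocabulary and made-up 3-d vectors stand in for
# the full element set and the pre-trained embeddings used in the article.
ELEMENTS = ["Ti", "O", "Zn", "S", "Cd"]
TOY_EMBEDDINGS = {
    "Ti": [0.2, 0.1, 0.7],
    "O": [0.9, 0.3, 0.1],
    "Zn": [0.4, 0.5, 0.2],
    "S": [0.6, 0.2, 0.3],
    "Cd": [0.1, 0.8, 0.4],
}


def parse_formula(formula):
    """Split a simple formula such as 'TiO2' into element counts (no brackets)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


def one_hot_features(formula):
    """Mark each vocabulary element as present (1) or absent (0) in the formula."""
    present = parse_formula(formula)
    return [1 if element in present else 0 for element in ELEMENTS]


def mean_embedding(formula):
    """Average the embedding vectors of the distinct elements in the formula."""
    vectors = [TOY_EMBEDDINGS[element] for element in parse_formula(formula)]
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def is_photo_active(band_gap_ev, low=1.5, high=6.2):
    """Hypothetical labelling rule: band gap inside the desired window (in eV)."""
    return low <= band_gap_ev <= high


if __name__ == "__main__":
    print(one_hot_features("TiO2"))
    print(mean_embedding("TiO2"))
    print(is_photo_active(3.2))
```

In a two-stage setup like the one described, vectors of this kind would feed the classifier first, and only the compositions it accepts would be passed on to the band gap regressor.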
first_indexed 2024-03-13T04:05:44Z
format Article
id doaj.art-e0d4fd3a2dd740bf9bc0f060270f78bf
institution Directory Open Access Journal
issn 2772-5081
language English
last_indexed 2024-03-13T04:05:44Z
publishDate 2023-09-01
publisher Elsevier
record_format Article
series Digital Chemical Engineering
spelling doaj.art-e0d4fd3a2dd740bf9bc0f060270f78bf; 2023-06-21T07:01:35Z; eng; Elsevier; Digital Chemical Engineering; 2772-5081; 2023-09-01; 8; 100109
A Convolutional Neural Network-based gradient boosting framework for prediction of the band gap of photo-active catalysts
Avan Kumar (0), Sreedevi Upadhyayula (1), Hariprasad Kodamana (2)
(0) Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India
(1) Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India
(2) Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, 110016, India; Corresponding author at: Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India.
http://www.sciencedirect.com/science/article/pii/S2772508123000273
Photo-active catalyst; Band gap; Deep learning models; CNN; VGG; Gradient boosting
title A Convolutional Neural Network-based gradient boosting framework for prediction of the band gap of photo-active catalysts
topic Photo-active catalyst
Band gap
Deep learning models
CNN
VGG
Gradient boosting
url http://www.sciencedirect.com/science/article/pii/S2772508123000273