Enzyme catalytic residue prediction using deep learning methods

Bibliographic Details
Main Author: Guan, Jia Sheng
Other Authors: Mu Yuguang
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2023
Subjects: Science::Biological sciences
Online Access: https://hdl.handle.net/10356/171862
School: School of Biological Sciences
Degree: Bachelor of Science in Biological Sciences
Citation: Guan, J. S. (2023). Enzyme catalytic residue prediction using deep learning methods. Final Year Project (FYP), Nanyang Technological University, Singapore.

Description
Identification of catalytic residues in enzymes has important applications ranging from drug discovery to protein engineering. However, locating catalytic residues in the laboratory is time-consuming and costly. High-throughput computational methods can instead suggest potential catalytic residues. Although many models trained to predict catalytic residues have been published, some combinations of model features and data preparation methods remain unexplored. In this project, graph neural network (GNN) and multi-layer perceptron (MLP) models were constructed to predict catalytic residues. The choice of edge weight equation was found to have a large impact on GNN model performance. Embeddings from a large protein language model, Evolutionary Scale Modeling 2 (ESM-2), were tested and found suitable as features for both MLP and GNN models, which rivaled many published models in performance. Atchley factors were also investigated as features, but the results hinted that their information might already be captured by the ESM-2 embeddings. To address a knowledge gap, structural information from the entire protein complex was considered as a GNN model feature, but it gave no benefit compared with using only monomer structures, as published models do. To address class imbalance, non-catalytic residues were down-sampled to a 10:1 ratio of non-catalytic to catalytic residues, but this did not improve model performance.
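
The description reports that the edge weight equation strongly affected GNN performance and that graphs built from whole protein complexes gave no benefit over monomer graphs, but it does not state which equations, distance cutoff, or atoms were used. The following is a minimal Python sketch under stated assumptions: residues are nodes at hypothetical C-alpha coordinates, edges join residues within an assumed 8 Å cutoff, and `binary_weight`, `inverse_weight` and `gaussian_weight` are illustrative edge weight equations, not necessarily the ones tested in the project. The hypothetical `chains` argument switches between a monomer graph (one chain) and a complex graph (all chains).

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# A residue is (chain_id, residue_number, (x, y, z)); coordinates are assumed to be
# C-alpha positions from an already parsed structure file (parsing not shown).
Residue = Tuple[str, int, Tuple[float, float, float]]
Edges = Dict[Tuple[int, int], float]


# Illustrative edge weight equations (assumptions, not the project's own).
def binary_weight(d: float) -> float:
    """Every contact within the cutoff gets the same weight."""
    return 1.0


def inverse_weight(d: float) -> float:
    """Closer residues get larger weights."""
    return 1.0 / d


def gaussian_weight(d: float, sigma: float = 4.0) -> float:
    """Weight decays smoothly with distance; sigma (in Å) is a hypothetical choice."""
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


def build_contact_graph(
    residues: Sequence[Residue],
    cutoff: float = 8.0,
    weight_fn: Callable[[float], float] = inverse_weight,
    chains: Optional[Set[str]] = None,
) -> Tuple[List[Residue], Edges]:
    """Return the kept residues and a symmetric {(i, j): weight} edge map."""
    # chains={"A"} keeps one chain (monomer graph); None keeps all chains (complex graph).
    nodes = [r for r in residues if chains is None or r[0] in chains]
    edges: Edges = {}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i][2], nodes[j][2])
            if 0.0 < d <= cutoff:
                w = weight_fn(d)
                edges[(i, j)] = w
                edges[(j, i)] = w
    return nodes, edges


# Toy structure: three residues of chain A in a line, one residue of chain B nearby.
toy = [
    ("A", 1, (0.0, 0.0, 0.0)),
    ("A", 2, (3.8, 0.0, 0.0)),
    ("A", 3, (7.6, 0.0, 0.0)),
    ("B", 1, (0.0, 4.0, 0.0)),
]
monomer_nodes, monomer_edges = build_contact_graph(toy, chains={"A"})
complex_nodes, complex_edges = build_contact_graph(toy)
print(len(monomer_edges) // 2, len(complex_edges) // 2)  # prints: 3 5
```

In this toy case the complex graph gains two inter-chain edges that the monomer graph lacks, while swapping `weight_fn` changes only the edge values, which is where the choice of edge weight equation enters a GNN.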
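
To see why the edge weight equation can matter so much, it helps to look at how weights enter message passing. The sketch below implements one graph convolution layer in the style of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the weighted adjacency matrix. The project's actual GNN architecture is not described in this record, so this layer and the helper names `matmul` and `weighted_gcn_layer` are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Edges = Dict[Tuple[int, int], float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def weighted_gcn_layer(n: int, edges: Edges, h: Matrix, w: Matrix) -> Matrix:
    """One layer H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) on a weighted graph.

    n: number of residues (nodes); edges: symmetric {(i, j): weight} map;
    h: n x f node features (e.g. per-residue embeddings); w: f x f' weights.
    """
    # Weighted adjacency matrix A, plus self-loops of weight 1 (A + I).
    a = [[0.0] * n for _ in range(n)]
    for (i, j), weight in edges.items():
        a[i][j] = weight
    for i in range(n):
        a[i][i] += 1.0
    # Weighted degree of each node: row sums of A + I.
    deg = [sum(row) for row in a]
    # Symmetric normalization by the degrees of both endpoints.
    a_norm = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, apply the linear transform W, then ReLU.
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]


features = [[1.0], [3.0]]  # two residues, one toy feature each
identity = [[1.0]]  # W = I so the output shows the aggregation alone
print(weighted_gcn_layer(2, {(0, 1): 1.0, (1, 0): 1.0}, features, identity))  # [[2.0], [2.0]]
print(weighted_gcn_layer(2, {(0, 1): 3.0, (1, 0): 3.0}, features, identity))  # [[2.5], [1.5]]
```

With identical node features, raising the edge weight from 1 to 3 pulls each residue's output toward its neighbor's feature, so different edge weight equations yield different aggregated representations for the classifier that follows.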
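
The 10:1 down-sampling can be sketched as follows: every catalytic residue (label 1) is kept, and a random subset of non-catalytic residues (label 0) is drawn so that at most ten remain per catalytic residue. The function name `downsample_indices`, the fixed seed, and operating on a single flat label list are assumptions; such sampling would normally be applied to the training split only, leaving evaluation data at its natural imbalance.

```python
import random
from typing import List, Sequence


def downsample_indices(labels: Sequence[int], ratio: int = 10, seed: int = 0) -> List[int]:
    """Keep every catalytic residue (label 1) and at most `ratio` randomly chosen
    non-catalytic residues (label 0) per catalytic residue; return sorted indices."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # If there are already fewer negatives than ratio * positives, keep them all.
    n_keep = min(len(neg), ratio * len(pos))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(pos + rng.sample(neg, n_keep))


labels = [1] * 3 + [0] * 100  # toy data: 3 catalytic, 100 non-catalytic residues
kept = downsample_indices(labels)
print(len(kept), sum(labels[i] for i in kept))  # prints: 33 3
```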