Structurally Motivated Deep Learning for Genome Scale Protein Interaction Prediction
Protein-protein interaction (PPI) networks have proven to be a valuable tool in systems biology to facilitate the discovery and understanding of protein function. However, experimental PPI data remains sparse in most model organisms and even more so in other species. Existing methods for computation...
Main Author: | Sledzieski, Samuel |
---|---|
Other Authors: | Berger, Bonnie |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2022 |
Online Access: | https://hdl.handle.net/1721.1/139467 https://orcid.org/0000-0002-0170-3029 |
_version_ | 1826192582602915840 |
---|---|
author | Sledzieski, Samuel |
author2 | Berger, Bonnie |
author_facet | Berger, Bonnie Sledzieski, Samuel |
author_sort | Sledzieski, Samuel |
collection | MIT |
description | Protein-protein interaction (PPI) networks have proven to be a valuable tool in systems biology to facilitate the discovery and understanding of protein function. However, experimental PPI data remains sparse in most model organisms and even more so in other species. Existing methods for computational prediction of PPIs seek to address this limitation, and while they perform well when sufficient within-species training data is available, they generalize poorly when specific types and sizes of training data are not available in the species of interest. Here, we predict physical interactions between two proteins using only their primary sequence, and maintain high accuracy with limited training data and across species. We combine advances in neural language modeling and structurally-motivated design to develop D-SCRIPT, a deep learning model which is interpretable and generalizable to species with limited training data. We show that a D-SCRIPT model trained on 38,345 human PPIs enables significantly improved functional characterization of fly proteins compared to the state-of-the-art approach. Evaluating the same D-SCRIPT model on protein complexes with known 3-D structure, we find that the inter-protein contact map output by D-SCRIPT has significant overlap with the ground truth. We apply this work for functional discovery in several non-model species and explore the viability of the D-SCRIPT framework for protein binding pocket classification. Our work suggests that recent advances in deep learning language modeling of protein structure can be leveraged for protein interaction prediction from sequence, even in species where little data is available. |
first_indexed | 2024-09-23T09:22:51Z |
format | Thesis |
id | mit-1721.1/139467 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T09:22:51Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/139467 2022-01-15T04:00:40Z Structurally Motivated Deep Learning for Genome Scale Protein Interaction Prediction Sledzieski, Samuel Berger, Bonnie Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science S.M. 2022-01-14T15:13:08Z 2022-01-14T15:13:08Z 2021-06 2021-06-24T19:40:27.483Z Thesis https://hdl.handle.net/1721.1/139467 https://orcid.org/0000-0002-0170-3029 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Sledzieski, Samuel Structurally Motivated Deep Learning for Genome Scale Protein Interaction Prediction |
title | Structurally Motivated Deep Learning for Genome Scale Protein Interaction Prediction |
title_full | Structurally Motivated Deep Learning for Genome Scale Protein Interaction Prediction |
title_fullStr | Structurally Motivated Deep Learning for Genome Scale Protein Interaction Prediction |
title_full_unstemmed | Structurally Motivated Deep Learning for Genome Scale Protein Interaction Prediction |
title_short | Structurally Motivated Deep Learning for Genome Scale Protein Interaction Prediction |
title_sort | structurally motivated deep learning for genome scale protein interaction prediction |
url | https://hdl.handle.net/1721.1/139467 https://orcid.org/0000-0002-0170-3029 |
work_keys_str_mv | AT sledzieskisamuel structurallymotivateddeeplearningforgenomescaleproteininteractionprediction |