Learning hierarchical motif embeddings for protein engineering

Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2017.

Bibliographic Details
Main Author: Karydis, Thrasyvoulos
Other Authors: Joseph M. Jacobson.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2017
Subjects:
Online Access: http://hdl.handle.net/1721.1/109659
Record ID: mit-1721.1/109659 (2019-04-11T03:26:07Z)
Department: Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 75-79).
Abstract: This thesis lays the foundation for an integrated machine learning framework for the evolutionary analysis, search, and design of proteins, based on a hierarchical decomposition of proteins into a set of functional motif embeddings. We introduce CoMET (Convolutional Motif Embeddings Tool), a machine learning framework that allows the automated extraction of nonlinear motif representations from large sets of protein sequences. At the core of CoMET lies a deep convolutional neural network, trained to learn a basis set of motif embeddings by minimizing any desired objective function. CoMET is successfully trained to extract all known motifs across transcription factors and CRISPR-associated proteins, without requiring any prior knowledge about the nature of the motifs or their distribution. We demonstrate that motif embeddings can efficiently model inter- and intra-family relationships. Furthermore, we provide novel protein meta-family clusters, formed by taking into account a hierarchical conserved motif phylogeny for each protein instead of a single ultra-conserved region. Lastly, we investigate the generative ability of CoMET and develop computational methods that allow the directed evolution of proteins towards altered or novel functions. We trained a highly accurate predictive model on the DNA recognition code of the Type II restriction enzymes. Based on the promising prediction results, we used the trained models to generate de novo restriction enzymes and paved the way towards the computational design of a restriction enzyme that will cut a given arbitrary DNA sequence with high precision.
Statement of responsibility: by Thrasyvoulos Karydis.
Degree: S.M.
Dates: 2017-06-06T19:23:55Z; 2017
Identifiers: http://hdl.handle.net/1721.1/109659; 987250344
Rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Extent: 79 pages, application/pdf