Diffusion Probabilistic Modeling of Protein Backbones in 3D for the Motif-Scaffolding problem

Bibliographic Details
Main Author: Yim, Jason
Other Authors: Jaakkola, Tommi S.
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/150230
_version_ 1826194922363944960
author Yim, Jason
author2 Jaakkola, Tommi S.
author_facet Jaakkola, Tommi S.; Yim, Jason
author_sort Yim, Jason
collection MIT
description Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.
first_indexed 2024-09-23T10:04:15Z
format Thesis
id mit-1721.1/150230
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T10:04:15Z
publishDate 2023
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/150230 2023-04-01T03:49:32Z Diffusion Probabilistic Modeling of Protein Backbones in 3D for the Motif-Scaffolding problem Yim, Jason Jaakkola, Tommi S. Barzilay, Regina Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif. S.M. 2023-03-31T14:41:12Z 2023-03-31T14:41:12Z 2023-02 2023-02-28T14:36:10.371Z Thesis https://hdl.handle.net/1721.1/150230 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title Diffusion Probabilistic Modeling of Protein Backbones in 3D for the Motif-Scaffolding problem
url https://hdl.handle.net/1721.1/150230