Learning the Language of Antibody Hypervariability Through Biological Property Prediction

Machine learning-based protein language models (PLMs) have proven successful in a variety of structure- and function-prediction contexts. However, foundational PLMs (those trained on the corpus of all proteins) rely on evolutionary co-conservation of protein sub-sequences, but this distribution...

Bibliographic Details
Main Author: Im, Chiho
Other Authors: Berger, Bonnie
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151427
collection MIT
description Machine learning-based protein language models (PLMs) have proven successful in a variety of structure- and function-prediction contexts. However, foundational PLMs (those trained on the corpus of all proteins) rely on evolutionary co-conservation of protein sub-sequences, a distributional hypothesis that does not hold for antibody hypervariable regions. Consequently, methods like AlphaFold 2 perform relatively weakly on antibody sequences. In this work, we propose AbMAP (Antibody Mutagenesis-Augmented Processing), a new transfer learning framework that fine-tunes foundational models specifically for antibody-sequence inputs by supervising on examples of antibody structure and binding specificity. We demonstrate how our feature representations can be applied to the accurate prediction of an antibody’s local and global 3D structures, the effects of mutations on its antigen binding specificity, and the identification of its paratope. The scalability of AbMAP newly enables large-scale analysis of human antibody repertoires. We find that the AbMAP representations of individual repertoires overlap remarkably, more so than can be discerned by sequence analysis. Our findings provide robust evidence in support of the hypothesis that antibody repertoires across individuals converge towards similar structural and functional coverage. We anticipate that AbMAP will accelerate efficient and effective design and modeling of antibodies and expedite antibody-based therapeutics discovery.
first_indexed 2024-09-23T14:17:06Z
format Thesis
id mit-1721.1/151427
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T14:17:06Z
publishDate 2023
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/151427 2023-08-01T03:24:46Z Learning the Language of Antibody Hypervariability Through Biological Property Prediction Im, Chiho Berger, Bonnie Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng.
2023-07-31T19:38:55Z 2023-07-31T19:38:55Z 2023-06 2023-06-06T16:35:25.769Z Thesis https://hdl.handle.net/1721.1/151427 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title Learning the Language of Antibody Hypervariability Through Biological Property Prediction
url https://hdl.handle.net/1721.1/151427