Learning the Language of Antibody Hypervariability Through Biological Property Prediction
Machine learning-based protein language models (PLMs) have proven to be successful in a variety of structure and function-prediction contexts. However, foundational PLMs (those trained on the corpus of all proteins) rely on evolutionary co-conservation of protein sub-sequences, but this distribution...
Main Author: | Im, Chiho |
---|---|
Other Authors: | Berger, Bonnie |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/151427 |
_version_ | 1811089306311196672 |
---|---|
author | Im, Chiho |
author2 | Berger, Bonnie |
author_facet | Berger, Bonnie Im, Chiho |
author_sort | Im, Chiho |
collection | MIT |
description | Machine learning-based protein language models (PLMs) have proven successful in a variety of structure- and function-prediction contexts. However, foundational PLMs (those trained on the corpus of all proteins) rely on evolutionary co-conservation of protein sub-sequences, a distributional hypothesis that does not hold for antibody hypervariable regions. Consequently, methods like AlphaFold 2 have relatively weak performance on antibody sequences. In this work, we propose AbMAP (Antibody Mutagenesis-Augmented Processing), a new transfer learning framework that fine-tunes foundational models specifically for antibody-sequence inputs by supervising on examples of antibody structure and binding specificity. We demonstrate how our feature representations can be applied to accurately predict an antibody’s local and global 3D structures and the effects of mutations on antigen binding specificity, and to identify its paratope. The scalability of AbMAP newly enables large-scale analysis of human antibody repertoires. We find that the AbMAP representations of individual repertoires have remarkable overlap, more so than can be discerned by sequence analysis. Our findings provide robust evidence in support of the hypothesis that antibody repertoires across individuals converge towards similar structural and functional coverage. We anticipate AbMAP will accelerate efficient and effective design and modeling of antibodies and expedite antibody-based therapeutics discovery. |
first_indexed | 2024-09-23T14:17:06Z |
format | Thesis |
id | mit-1721.1/151427 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T14:17:06Z |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/151427 2023-08-01T03:24:46Z Learning the Language of Antibody Hypervariability Through Biological Property Prediction Im, Chiho Berger, Bonnie Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2023-07-31T19:38:55Z 2023-07-31T19:38:55Z 2023-06 2023-06-06T16:35:25.769Z Thesis https://hdl.handle.net/1721.1/151427 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Im, Chiho Learning the Language of Antibody Hypervariability Through Biological Property Prediction |
title | Learning the Language of Antibody Hypervariability Through Biological Property Prediction |
title_sort | learning the language of antibody hypervariability through biological property prediction |
url | https://hdl.handle.net/1721.1/151427 |
work_keys_str_mv | AT imchiho learningthelanguageofantibodyhypervariabilitythroughbiologicalpropertyprediction |