Leveraging BCR data to improve computational design of therapeutic antibodies

Bibliographic Details
Main Author: Robinson, S
Other Authors: Deane, C
Format: Thesis
Language: English
Published: 2023
Description
Summary: Antibodies are highly useful therapeutics, but their development can be costly and resource intensive. As increasing quantities of high-quality B cell receptor (BCR) data become available, the challenge of how best to use these data to improve the antibody development process remains. In the results chapters of this thesis, we begin by describing our novel approach to numbering antibody sequences, ANARCII. Our model, trained using deep learning-based sequence embeddings derived from BCR data, can correctly number antibody sequences that existing tools cannot. Next, we describe our further development of the existing antibody comparison tools Ab-Ligity and paratyping to create computational pipelines that identify antibodies likely to bind a target antigen, with or without a known binding antibody. We compare antibodies based on sequence, paratope and structural similarity to predict similarly binding antibodies from an antibody model library. In our final results chapter, we expand on the approach of identifying structurally and functionally similar antibodies, describing our novel computational epitope binning method, SPACE. Our method of clustering antibody models based on predicted structural similarity can group sequence-dissimilar antibodies into epitope bins with high accuracy, as demonstrated on coronavirus-binding and malaria-binding antibodies. We conclude with potential future research directions, including how the latest developments in antibody structure prediction and protein language models might be incorporated into our work.