Summary: | Antibodies are highly useful therapeutics, but their development can be costly and
resource-intensive. As increasing quantities of high-quality B cell receptor (BCR)
data become available, the challenge of how to best utilise the data to improve the
antibody development process remains. In the results chapters of this thesis, we begin
by describing our novel approach to numbering antibody sequences, ANARCII. Our
model, trained using deep learning-based sequence embeddings derived from BCR
data, can correctly number antibody sequences that existing tools cannot. Next,
we describe how we extend the existing antibody comparison tools Ab-Ligity and
paratyping to create computational pipelines for identifying antibodies likely to
bind to a target antigen, with or without a known binding antibody. We compare
antibodies based on sequence, paratope and structural similarity to predict similarly binding
antibodies from an antibody model library. We expand on the approach of
identifying structurally and functionally similar antibodies in our final results
chapter where we describe our novel computational epitope binning method, SPACE.
Our method of clustering antibody models based on predicted structural similarity
can group sequence-dissimilar antibodies into epitope bins with high accuracy, as
demonstrated on coronavirus-binding and malaria-binding antibodies. We conclude
with potential future research directions, including how the latest developments in antibody
structure prediction and protein language models might be incorporated into
our work.
|