Fast clonal family inference from large-scale B cell repertoire sequencing data

Bibliographic Details
Main Authors: Kaixuan Wang, Xihao Hu, Jian Zhang
Format: Article
Language: English
Published: Elsevier 2023-10-01
Series: Cell Reports: Methods
Online Access: http://www.sciencedirect.com/science/article/pii/S2667237523002539
Description
Summary: Advances in high-throughput sequencing technologies have facilitated the large-scale characterization of B cell receptor (BCR) repertoires. However, the vast amount and high diversity of the BCR sequences pose challenges for efficient and biologically meaningful analysis. Here, we introduce fastBCR, an efficient computational approach for inferring B cell clonal families from massive BCR heavy chain sequences. We demonstrate that fastBCR substantially reduces the running time while ensuring high accuracy on simulated datasets with diverse numbers of B cell lineages and varying mutation rates. We apply fastBCR to real BCR sequencing data from peripheral blood samples of COVID-19 patients, showing that the inferred clonal families display disease-associated features, as well as corresponding antigen-binding specificity and affinity. Overall, our results demonstrate the advantages of fastBCR for analyzing BCR repertoire data, which will facilitate the identification of disease-associated antibodies and improve our understanding of the B cell immune response.
Motivation: The study of antibody repertoires and B cell activation is essential to understanding immune system function and developing effective treatments for various diseases. One important aspect of this research is identifying clonal families, which are groups of B cells that arise from a common ancestor and diversify through proliferation and somatic hypermutation. However, accurately and quickly clustering highly diverse clonally related sequences from large datasets remains challenging. To address this issue, we propose a heuristic method to rapidly and accurately infer clonal families from massive and diverse BCR sequences. Our method has been rigorously tested and shown to provide efficient and biologically meaningful results, contributing to a deeper understanding of B cell activation and antibody-related research.
ISSN: 2667-2375