Summary: | Summary: Advances in high-throughput sequencing technologies have facilitated the large-scale characterization of B cell receptor (BCR) repertoires. However, the sheer volume and high diversity of BCR sequences pose challenges for efficient and biologically meaningful analysis. Here, we introduce fastBCR, an efficient computational approach for inferring B cell clonal families from massive sets of BCR heavy chain sequences. We demonstrate that fastBCR substantially reduces running time while maintaining high accuracy on simulated datasets with diverse numbers of B cell lineages and varying mutation rates. We apply fastBCR to real BCR sequencing data from peripheral blood samples of COVID-19 patients, showing that the inferred clonal families display disease-associated features, as well as corresponding antigen-binding specificity and affinity. Overall, our results demonstrate the advantages of fastBCR for analyzing BCR repertoire data, which will facilitate the identification of disease-associated antibodies and improve our understanding of the B cell immune response. Motivation: The study of antibody repertoires and B cell activation is essential to understanding immune system function and developing effective treatments for various diseases. One important aspect of this research is identifying clonal families, which are groups of B cells that arise from a common ancestor and diversify through proliferation and somatic hypermutation. However, clustering highly diverse, clonally related sequences from large datasets both accurately and quickly remains challenging. To address this issue, we propose a heuristic method to rapidly and accurately infer clonal families from massive and diverse BCR sequences. We tested the method on simulated and real datasets and found that it provides efficient and biologically meaningful results, contributing to a deeper understanding of B cell activation and to antibody-related research.
|