Fast clonal family inference from large-scale B cell repertoire sequencing data

Summary: Advances in high-throughput sequencing technologies have facilitated the large-scale characterization of B cell receptor (BCR) repertoires. However, the vast amount and high diversity of the BCR sequences pose challenges for efficient and biologically meaningful analysis. Here, we introduce...

Bibliographic Details
Main Authors: Kaixuan Wang, Xihao Hu, Jian Zhang
Format: Article
Language: English
Published: Elsevier 2023-10-01
Series: Cell Reports: Methods
Subjects: CP: Immunology, CP: Systems biology
Online Access: http://www.sciencedirect.com/science/article/pii/S2667237523002539
author Kaixuan Wang
Xihao Hu
Jian Zhang
collection DOAJ
description Summary: Advances in high-throughput sequencing technologies have facilitated the large-scale characterization of B cell receptor (BCR) repertoires. However, the vast amount and high diversity of the BCR sequences pose challenges for efficient and biologically meaningful analysis. Here, we introduce fastBCR, an efficient computational approach for inferring B cell clonal families from massive BCR heavy chain sequences. We demonstrate that fastBCR substantially reduces the running time while ensuring high accuracy on simulated datasets with diverse numbers of B cell lineages and varying mutation rates. We apply fastBCR to real BCR sequencing data from peripheral blood samples of COVID-19 patients, showing that the inferred clonal families display disease-associated features, as well as corresponding antigen-binding specificity and affinity. Overall, our results demonstrate the advantages of fastBCR for analyzing BCR repertoire data, which will facilitate the identification of disease-associated antibodies and improve our understanding of the B cell immune response.
Motivation: The study of antibody repertoires and B cell activation is essential to understanding immune system function and developing effective treatments for various diseases. One important aspect of this research is identifying clonal families, which are groups of B cells that arise from a common ancestor and diversify through proliferation and somatic hypermutation. However, accurately and quickly clustering highly diverse clonally related sequences from large datasets remains challenging. To address this issue, we propose a heuristic method to rapidly and accurately infer clonal families from massive and diverse BCR sequences. Our method has been rigorously tested and shown to provide efficient and biologically meaningful results, contributing to a deeper understanding of B cell activation and antibody-related research.
first_indexed 2024-03-11T15:49:52Z
id doaj.art-5615d258e44b4fdeac9ab41ce2b4c951
institution Directory Open Access Journal
issn 2667-2375
last_indexed 2024-03-11T15:49:52Z
spelling doaj.art-5615d258e44b4fdeac9ab41ce2b4c951, 2023-10-26T04:18:32Z, eng, Elsevier, Cell Reports: Methods, 2667-2375, 2023-10-01, volume 3, issue 10, article 100601
Fast clonal family inference from large-scale B cell repertoire sequencing data
Kaixuan Wang (Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China)
Xihao Hu (GV20 Therapeutics, Cambridge, MA, USA)
Jian Zhang (Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China; corresponding author)
http://www.sciencedirect.com/science/article/pii/S2667237523002539
CP: Immunology
CP: Systems biology
title Fast clonal family inference from large-scale B cell repertoire sequencing data
topic CP: Immunology
CP: Systems biology
url http://www.sciencedirect.com/science/article/pii/S2667237523002539