Fast clonal family inference from large-scale B cell repertoire sequencing data
Summary: Advances in high-throughput sequencing technologies have facilitated the large-scale characterization of B cell receptor (BCR) repertoires. However, the vast amount and high diversity of the BCR sequences pose challenges for efficient and biologically meaningful analysis. Here, we introduce...
Main Authors: | Kaixuan Wang, Xihao Hu, Jian Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-10-01 |
Series: | Cell Reports: Methods |
Subjects: | CP: Immunology, CP: Systems biology |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2667237523002539 |
_version_ | 1797649654378135552 |
---|---|
author | Kaixuan Wang Xihao Hu Jian Zhang |
author_facet | Kaixuan Wang Xihao Hu Jian Zhang |
author_sort | Kaixuan Wang |
collection | DOAJ |
description | Summary: Advances in high-throughput sequencing technologies have facilitated the large-scale characterization of B cell receptor (BCR) repertoires. However, the vast amount and high diversity of the BCR sequences pose challenges for efficient and biologically meaningful analysis. Here, we introduce fastBCR, an efficient computational approach for inferring B cell clonal families from massive BCR heavy chain sequences. We demonstrate that fastBCR substantially reduces the running time while ensuring high accuracy on simulated datasets with diverse numbers of B cell lineages and varying mutation rates. We apply fastBCR to real BCR sequencing data from peripheral blood samples of COVID-19 patients, showing that the inferred clonal families display disease-associated features, as well as corresponding antigen-binding specificity and affinity. Overall, our results demonstrate the advantages of fastBCR for analyzing BCR repertoire data, which will facilitate the identification of disease-associated antibodies and improve our understanding of the B cell immune response. Motivation: The study of antibody repertoires and B cell activation is essential to understanding immune system function and developing effective treatments for various diseases. One important aspect of this research is identifying clonal families, which are groups of B cells that arise from a common ancestor and diversify through proliferation and somatic hypermutation. However, accurately and quickly clustering highly diverse clonally related sequences from large datasets remains challenging. To address this issue, we propose a heuristic method to rapidly and accurately infer clonal families from massive and diverse BCR sequences. Our method has been rigorously tested and shown to provide efficient and biologically meaningful results, contributing to a deeper understanding of B cell activation and antibody-related research. |
first_indexed | 2024-03-11T15:49:52Z |
format | Article |
id | doaj.art-5615d258e44b4fdeac9ab41ce2b4c951 |
institution | Directory Open Access Journal |
issn | 2667-2375 |
language | English |
last_indexed | 2024-03-11T15:49:52Z |
publishDate | 2023-10-01 |
publisher | Elsevier |
record_format | Article |
series | Cell Reports: Methods |
spelling | doaj.art-5615d258e44b4fdeac9ab41ce2b4c951; 2023-10-26T04:18:32Z; eng; Elsevier; Cell Reports: Methods; 2667-2375; 2023-10-01; vol. 3, no. 10, art. 100601; Fast clonal family inference from large-scale B cell repertoire sequencing data; Kaixuan Wang (Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China); Xihao Hu (GV20 Therapeutics, Cambridge, MA, USA); Jian Zhang (Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China, corresponding author); http://www.sciencedirect.com/science/article/pii/S2667237523002539; CP: Immunology; CP: Systems biology |
title | Fast clonal family inference from large-scale B cell repertoire sequencing data |
topic | CP: Immunology CP: Systems biology |
url | http://www.sciencedirect.com/science/article/pii/S2667237523002539 |