BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU


Bibliographic Details
Main Authors: Ruibang Luo, Yiu-Lun Wong, Wai-Chun Law, Lap-Kei Lee, Jeanno Cheung, Chi-Man Liu, Tak-Wah Lam
Format: Article
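Language: English
Published: PeerJ Inc. 2014-06-01
Series: PeerJ
Subjects: Secondary analysis; Whole-genome sequencing; Whole-exome sequencing; GPU; Variant calling; Genomics
Online Access: https://peerj.com/articles/421.pdf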
collection DOAJ
description This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of GPUs and careful memory management to deliver fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole-genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole-exome sequencing. BALSA's speed comes from parallel algorithms that exploit a GPU to accelerate steps such as alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels, and achieves competitive variant-calling accuracy and sensitivity compared with an ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and is more sensitive in detecting somatic Indels. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, which maintains the same fidelity as BAM in variant calling, enables efficient storage and indexing and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.
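The headline figures in the description can be sanity-checked with a little arithmetic. The sketch below is only an illustration, not part of BALSA: it assumes that "750 million paired-end reads" means 750 million read pairs and that the human genome is roughly 3.1 Gbp (both are assumptions, and the function name is hypothetical).

```python
# Back-of-the-envelope check of the figures quoted in the description.
# Assumptions (not stated in the record): the 750 million figure counts
# read PAIRS, and the human genome is about 3.1e9 bp.

HUMAN_GENOME_BP = 3.1e9  # assumed approximate genome size


def fold_coverage(read_pairs, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    total_bases = read_pairs * 2 * read_length_bp  # two mates per pair
    return total_bases / genome_size_bp


# Whole-genome run: ~750 million pairs of 100 bp reads.
wgs_coverage = fold_coverage(750e6, 100, HUMAN_GENOME_BP)

# Implied end-to-end throughput for the reported 5.5 h run time.
pairs_per_second = 750e6 / (5.5 * 3600)

print(f"estimated coverage: {wgs_coverage:.1f}-fold")
print(f"implied throughput: {pairs_per_second:.0f} read pairs per second")
```

Under these assumptions the estimate lands close to the 50-fold depth quoted in the description, which suggests the read count refers to pairs rather than individual reads.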
id doaj.art-287c37fc02b4449bba4143991d4626e6
institution Directory of Open Access Journals
issn 2167-8359
doi 10.7717/peerj.421
volume 2
article_number e421
affiliation HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Hong Kong (all authors)
topic Secondary analysis
Whole-genome sequencing
Whole-exome sequencing
GPU
Variant calling
Genomics