BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU
This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, u...
Main Authors: | Ruibang Luo, Yiu-Lun Wong, Wai-Chun Law, Lap-Kei Lee, Jeanno Cheung, Chi-Man Liu, Tak-Wah Lam |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc. 2014-06-01 |
Series: | PeerJ |
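Subjects: | Secondary analysis, Whole-genome sequencing, Whole-exome sequencing, GPU, Variant calling, Genomics |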
Online Access: | https://peerj.com/articles/421.pdf |
_version_ | 1827606488312446976 |
---|---|
author | Ruibang Luo Yiu-Lun Wong Wai-Chun Law Lap-Kei Lee Jeanno Cheung Chi-Man Liu Tak-Wah Lam |
author_facet | Ruibang Luo Yiu-Lun Wong Wai-Chun Law Lap-Kei Lee Jeanno Cheung Chi-Man Liu Tak-Wah Lam |
author_sort | Ruibang Luo |
collection | DOAJ |
description | This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of a GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted in parallel algorithms that effectively exploit a GPU to accelerate processes such as alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa. |
first_indexed | 2024-03-09T06:38:59Z |
format | Article |
id | doaj.art-287c37fc02b4449bba4143991d4626e6 |
institution | Directory Open Access Journal |
issn | 2167-8359 |
language | English |
last_indexed | 2024-03-09T06:38:59Z |
publishDate | 2014-06-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ |
spelling | doaj.art-287c37fc02b4449bba4143991d4626e6; 2023-12-03T10:54:07Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2014-06-01; 2; e421; 10.7717/peerj.421; 421; BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU; Ruibang Luo (0); Yiu-Lun Wong (1); Wai-Chun Law (2); Lap-Kei Lee (3); Jeanno Cheung (4); Chi-Man Liu (5); Tak-Wah Lam (6); affiliation for authors 0 to 6: HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Hong Kong; This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of a GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted in parallel algorithms that effectively exploit a GPU to accelerate processes such as alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.; https://peerj.com/articles/421.pdf; Secondary analysis; Whole-genome sequencing; Whole-exome sequencing; GPU; Variant calling; Genomics |
spellingShingle | Ruibang Luo Yiu-Lun Wong Wai-Chun Law Lap-Kei Lee Jeanno Cheung Chi-Man Liu Tak-Wah Lam BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU PeerJ Secondary analysis Whole-genome sequencing Whole-exome sequencing GPU Variant calling Genomics |
title | BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU |
title_full | BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU |
title_fullStr | BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU |
title_full_unstemmed | BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU |
title_short | BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU |
title_sort | balsa integrated secondary analysis for whole genome and whole exome sequencing accelerated by gpu |
topic | Secondary analysis Whole-genome sequencing Whole-exome sequencing GPU Variant calling Genomics |
url | https://peerj.com/articles/421.pdf |
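
Note on the description: the whole-genome figures it quotes (about 750 million 100 bp paired-end reads at 50-fold depth) can be sanity-checked with a short calculation. The sketch below is illustrative only and is not part of BALSA. It assumes a human genome of roughly 3.1 Gbp, which the record does not state, and the function name `fold_coverage` is hypothetical. Under that assumption, the 50-fold figure is consistent with 750 million read pairs rather than 750 million individual reads.

```python
# Back-of-the-envelope depth check for the figures quoted in the description.
# Assumptions (not stated in the record): a human genome of about 3.1 Gbp;
# fold_coverage is a hypothetical helper written for this illustration.

def fold_coverage(n_fragments, read_length_bp, genome_size_bp, paired=True):
    """Average depth = total sequenced bases / genome size."""
    reads_per_fragment = 2 if paired else 1
    total_bases = n_fragments * reads_per_fragment * read_length_bp
    return total_bases / genome_size_bp

HUMAN_GENOME_BP = 3.1e9  # assumed approximate genome size

# Interpretation 1: 750 million read pairs, each read 100 bp.
as_pairs = fold_coverage(750e6, 100, HUMAN_GENOME_BP, paired=True)
# Interpretation 2: 750 million individual 100 bp reads.
as_single = fold_coverage(750e6, 100, HUMAN_GENOME_BP, paired=False)

print(f"750M read pairs: about {as_pairs:.1f}-fold")
print(f"750M single reads: about {as_single:.1f}-fold")
```

Only the pair interpretation lands near the quoted 50-fold depth; a different assumed genome size shifts both numbers proportionally.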