Fast and SNP-aware short read alignment with SALT


Bibliographic Details
Main Authors: Wei Quan, Bo Liu, Yadong Wang
Format: Article
Language: English
Published: BMC 2021-08-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-021-04088-6
Collection: DOAJ (Directory of Open Access Journals)

Abstract

Background: DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies. The accuracy of sequence alignments directly affects the accuracy of downstream analyses, such as variant calling and quantitative transcriptome analysis; therefore, rapidly and accurately mapping reads to a reference genome is a significant topic in bioinformatics. Conventional DNA read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, such a linear reference genome represents the genome of only one or a few individuals and thus lacks information on variation in the population. This limitation can introduce bias and reduce the sensitivity and accuracy of mapping. Recently, a number of aligners have begun to map reads to populations of genomes, which can be represented by a reference genome plus a large number of genetic variants. However, compared with linear reference aligners, an aligner that stores and indexes all genetic variants incurs a high memory (RAM) cost and extremely long run times. Aligning reads to a graph-model-based index that includes all types of variants is ultimately an NP-hard problem in theory. By contrast, considering only single nucleotide polymorphism (SNP) information reduces the complexity of the index and improves the speed of sequence alignment.

Results: The SNP-aware alignment tool (SALT) is a fast, memory-efficient, and SNP-aware short read alignment tool. SALT uses 5.8 GB of RAM to index a human reference genome (GRCh38) that incorporates 12.8M UCSC common SNPs. Compared with a state-of-the-art aligner, SALT has a similar speed but higher accuracy.

Conclusions: Herein, we present an SNP-aware alignment tool (SALT) that aligns reads to a reference genome that incorporates an SNP database. We benchmarked SALT using simulated and real datasets. The results demonstrate that SALT can efficiently map reads to the reference genome with significantly improved accuracy. Incorporating SNP information can improve the accuracy of read alignment and can reveal novel variants. The source code is freely available at https://github.com/weiquan/SALT.
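
The idea of aligning against a reference that incorporates known SNPs can be illustrated with a small sketch. This is not SALT's algorithm or index (the record does not describe either); it is a brute-force toy, and the function names `snp_aware_mismatches` and `best_offset`, the example sequences, and the SNP table are all hypothetical. It only shows why a read carrying a known alternate allele scores better, and maps less ambiguously, when SNP information is available.

```python
from typing import Dict, Optional, Set, Tuple


def snp_aware_mismatches(read: str, reference: str, offset: int,
                         snps: Dict[int, Set[str]]) -> int:
    """Count mismatches of `read` placed at `reference[offset:]`.

    A read base that differs from the reference is not counted as a
    mismatch when it equals a known alternate allele at that position.
    `snps` maps 0-based reference positions to sets of alternate alleles
    (a hypothetical in-memory stand-in for an SNP database).
    """
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit in the reference at this offset")
    mismatches = 0
    for i, base in enumerate(read):
        pos = offset + i
        if base == reference[pos]:
            continue  # matches the reference allele
        if base in snps.get(pos, set()):
            continue  # matches a known alternate allele
        mismatches += 1
    return mismatches


def best_offset(read: str, reference: str,
                snps: Dict[int, Set[str]]) -> Optional[Tuple[int, int]]:
    """Scan every offset and return (offset, mismatches) with the fewest
    mismatches; the leftmost offset wins ties. Returns None if the read
    is longer than the reference."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        mm = snp_aware_mismatches(read, reference, offset, snps)
        if best is None or mm < best[1]:
            best = (offset, mm)
    return best


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    read = "ATGT"                  # carries T where the reference has C
    known_snps = {5: {"T"}}        # hypothetical C>T SNP at position 5

    print(best_offset(read, reference, {}))          # linear reference only
    print(best_offset(read, reference, known_snps))  # SNP-aware
```

With the linear reference alone, the read has two equally good one-mismatch placements (offsets 0 and 4), and the leftmost is reported. With the SNP table, offset 4 becomes an exact match, which is the kind of bias reduction the abstract attributes to SNP-aware alignment. A practical aligner would replace the linear scan with an index.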
ISSN: 1471-2105
Author Affiliations: Wei Quan, Bo Liu, and Yadong Wang are all at the School of Computer Science and Technology, Harbin Institute of Technology.
Citation: BMC Bioinformatics, vol. 22, no. S9, pp. 1-13, 2021-08-01
Keywords: NGS, Alignment, SNP-aware