kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph

Bibliographic Details
Main Authors: Ze-Gang Wei, Xing-Guo Fan, Hao Zhang, Xiao-Dan Zhang, Fei Liu, Yu Qian, Shao-Wu Zhang
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-05-01
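Series: Frontiers in Genetics
Subjects: sequence alignment; sequence mapping; single molecular sequencing; third-generation sequencing; long noisy reads
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2022.890651/full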
collection DOAJ
description With the rapid development of single-molecule sequencing (SMS) technologies such as PacBio single-molecule real-time and Oxford Nanopore sequencing, output read lengths are continuously increasing, which has great potential for cutting-edge genomic applications. Mapping these reads to a reference genome is often the most fundamental and computing-intensive step for downstream analysis. However, these long reads have higher sequencing error rates and more frequently span the breakpoints of structural variants (SVs) than shorter reads, so most state-of-the-art mappers leave many reads unaligned or only partially aligned. As a result, these methods usually focus on producing local mapping results for the query read rather than obtaining the whole end-to-end alignment. We introduce kngMap, a novel k-mer neighborhood graph-based mapper that is specifically designed to align long noisy SMS reads to a reference sequence. In extensive benchmarks on both simulated and real SMS datasets, comparing kngMap with ten other popular SMS mapping tools (e.g., BLASR, BWA-MEM, and minimap2), we demonstrate that kngMap has higher sensitivity, aligning more reads and bases to the reference genome; kngMap can also produce consecutive alignments for the whole read and span different categories of SVs in the reads. kngMap is implemented in C++ and supports multi-threading; the source code is freely available for academic use at https://github.com/zhang134/kngMap.
first_indexed 2024-12-12T04:26:02Z
format Article
id doaj.art-9f0ae66459654f31aef4ed87d2e624b7
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-12-12T04:26:02Z
publishDate 2022-05-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-9f0ae66459654f31aef4ed87d2e624b7
2022-12-22T00:38:12Z
eng
Frontiers Media S.A.
Frontiers in Genetics
ISSN 1664-8021
2022-05-01
Volume 13
DOI 10.3389/fgene.2022.890651
Article 890651
kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph
Ze-Gang Wei, Xing-Guo Fan, Hao Zhang, Xiao-Dan Zhang, Fei Liu, Yu Qian: Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
Shao-Wu Zhang: Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
title kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph
topic sequence alignment
sequence mapping
single molecular sequencing
third-generation sequencing
long noisy reads