kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph
Main Authors: | Ze-Gang Wei, Xing-Guo Fan, Hao Zhang, Xiao-Dan Zhang, Fei Liu, Yu Qian, Shao-Wu Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2022-05-01 |
Series: | Frontiers in Genetics |
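Subjects: | sequence alignment; sequence mapping; single molecular sequencing; third-generation sequencing; long noisy reads |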
Online Access: | https://www.frontiersin.org/articles/10.3389/fgene.2022.890651/full |
_version_ | 1828796628088127488 |
---|---|
author | Ze-Gang Wei; Xing-Guo Fan; Hao Zhang; Xiao-Dan Zhang; Fei Liu; Yu Qian; Shao-Wu Zhang |
collection | DOAJ |
description | With the rapid development of single molecular sequencing (SMS) technologies such as PacBio single-molecule real-time and Oxford Nanopore sequencing, output read lengths are continuously increasing, which offers great potential for cutting-edge genomic applications. Mapping these reads to a reference genome is often the most fundamental and computationally intensive step in downstream analysis. However, long reads have higher sequencing error rates and span the breakpoints of structural variants (SVs) more frequently than shorter reads, so most state-of-the-art mappers leave many reads unaligned or only partially aligned. As a result, these methods usually focus on producing local mapping results for the query read rather than obtaining the whole end-to-end alignment. We introduce kngMap, a novel k-mer neighborhood graph-based mapper that is specifically designed to align long noisy SMS reads to a reference sequence. In extensive benchmarks on both simulated and real SMS datasets comparing kngMap with ten other popular SMS mapping tools (e.g., BLASR, BWA-MEM, and minimap2), we demonstrated that kngMap has higher sensitivity, aligning more reads and bases to the reference genome; kngMap can also produce consecutive alignments for the whole read and span different categories of SVs in the reads. kngMap is implemented in C++ and supports multi-threading; the source code is freely available for academic use at https://github.com/zhang134/kngMap. |
first_indexed | 2024-12-12T04:26:02Z |
format | Article |
id | doaj.art-9f0ae66459654f31aef4ed87d2e624b7 |
institution | Directory of Open Access Journals |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-12T04:26:02Z |
publishDate | 2022-05-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-9f0ae66459654f31aef4ed87d2e624b7; 2022-12-22T00:38:12Z; eng; Frontiers Media S.A.; Frontiers in Genetics; ISSN 1664-8021; 2022-05-01; vol. 13; doi:10.3389/fgene.2022.890651; article 890651. Affiliations: Ze-Gang Wei, Xing-Guo Fan, Hao Zhang, Xiao-Dan Zhang, Fei Liu, Yu Qian (Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China); Shao-Wu Zhang (Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China) |
title | kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph |
topic | sequence alignment; sequence mapping; single molecular sequencing; third-generation sequencing; long noisy reads |
url | https://www.frontiersin.org/articles/10.3389/fgene.2022.890651/full |