Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data

Abstract Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computationa...

Bibliographic Details
Main Authors: Yichen Henry Liu, Can Luo, Staunton G. Golding, Jacob B. Ioffe, Xin Maizie Zhou
Format: Article
Language: English
Published: Nature Portfolio 2024-03-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-024-46614-z
collection DOAJ
description Abstract Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
first_indexed 2024-04-24T19:54:03Z
format Article
id doaj.art-99142cf3d0274c468305226d7de086e6
institution Directory Open Access Journal
issn 2041-1723
language English
last_indexed 2024-04-24T19:54:03Z
publishDate 2024-03-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj.art-99142cf3d0274c468305226d7de086e6
2024-03-24T12:26:15Z
eng
Nature Portfolio
Nature Communications
2041-1723
2024-03-01
151122
10.1038/s41467-024-46614-z
Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data
Yichen Henry Liu (Department of Computer Science, Vanderbilt University)
Can Luo (Department of Biomedical Engineering, Vanderbilt University)
Staunton G. Golding (Department of Biomedical Engineering, Vanderbilt University)
Jacob B. Ioffe (Department of Computer Science, Vanderbilt University)
Xin Maizie Zhou (Department of Computer Science, Vanderbilt University)
https://doi.org/10.1038/s41467-024-46614-z
title Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data
url https://doi.org/10.1038/s41467-024-46614-z