NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads

Abstract. Background: Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have ma...


Bibliographic Details
Main Authors: Eleni Adam, Desh Ranjan, Harold Riethman
Format: Article
Language: English
Published: BMC 2022-12-01
Series: BMC Bioinformatics
Subjects: Telomeres; Subtelomeres; Segmental duplications; Tandem repeats; Hybrid assembly; Nanopore
Online Access: https://doi.org/10.1186/s12859-022-05081-3
author Eleni Adam
Desh Ranjan
Harold Riethman
collection DOAJ
description Abstract. Background: Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult to impossible for many loci, complicating or precluding a wide range of genetic analyses to investigate their function. Results: We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with mapped ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets of DNA sequences, identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications, are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Mapped telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs, and to identify and correct any misassemblies. Our method was tested on a subset of representative subtelomeres with ultralong nanopore read coverage in the haploid human cell line CHM13. A 10X Linked-Read dataset from CHM13 was combined with ultralong nanopore reads from the same genome to provide improved subtelomere assemblies. Comparison of nanopore-only assemblies using SHASTA with our NPGREAT assemblies in the distal-most subtelomere regions showed that NPGREAT produced higher-quality and more complete assemblies than SHASTA alone when these regions had low ultralong nanopore coverage (such as cases where large segmental duplications were immediately adjacent to (TTAGGG) tracts).
Conclusion: In genomic regions with large segmental duplications adjacent to telomeres, NPGREAT offers an alternative, economical approach to improving assembly accuracy and coverage using linked-read datasets when more expensive HiFi datasets of 10–20 kb reads are unavailable.
first_indexed 2024-04-13T04:31:35Z
format Article
id doaj.art-a2d89157d1a84158887587a323459a60
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T04:31:35Z
publishDate 2022-12-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-a2d89157d1a84158887587a323459a60; 2022-12-22T03:02:18Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2022-12-01; vol. 23, no. 1, pp. 1-16; 10.1186/s12859-022-05081-3
Eleni Adam (Department of Computer Science, Old Dominion University)
Desh Ranjan (Department of Computer Science, Old Dominion University)
Harold Riethman (Medical Diagnostic and Translational Sciences, Old Dominion University)
title NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads
topic Telomeres
Subtelomeres
Segmental duplications
Tandem repeats
Hybrid assembly
Nanopore
url https://doi.org/10.1186/s12859-022-05081-3