NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads
Abstract Background Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have ma...
Main Authors: | Eleni Adam, Desh Ranjan, Harold Riethman |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2022-12-01 |
Series: | BMC Bioinformatics |
Subjects: | Telomeres; Subtelomeres; Segmental duplications; Tandem repeats; Hybrid assembly; Nanopore |
Online Access: | https://doi.org/10.1186/s12859-022-05081-3 |
_version_ | 1811291583342968832 |
---|---|
author | Eleni Adam Desh Ranjan Harold Riethman |
author_facet | Eleni Adam Desh Ranjan Harold Riethman |
author_sort | Eleni Adam |
collection | DOAJ |
description | Abstract Background Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult to impossible for many loci, complicating or precluding a wide range of genetic analyses to investigate their function. Results We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with mapped ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets of DNA sequences identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Mapped telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs as well as identification/correction of any misassemblies. Our method was tested on a subset of representative subtelomeres with ultralong nanopore read coverage in the haploid human cell line CHM13. A 10X Linked-Read dataset from CHM13 was combined with ultralong nanopore reads from the same genome to provide improved subtelomere assemblies. Comparison of nanopore-only assemblies using SHASTA with our NPGREAT assemblies in the distal-most subtelomere regions showed that NPGREAT produced higher-quality and more complete assemblies than SHASTA alone when these regions had low ultralong nanopore coverage (such as cases where large segmental duplications were immediately adjacent to (TTAGGG)n tracts). Conclusion In genomic regions with large segmental duplications adjacent to telomeres, NPGREAT offers an alternative economical approach to improving assembly accuracy and coverage using linked-read datasets when more expensive HiFi datasets of 10–20 kb reads are unavailable. |
first_indexed | 2024-04-13T04:31:35Z |
format | Article |
id | doaj.art-a2d89157d1a84158887587a323459a60 |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-13T04:31:35Z |
publishDate | 2022-12-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-a2d89157d1a84158887587a323459a60 2022-12-22T03:02:18Z eng BMC BMC Bioinformatics 1471-2105 2022-12-01 23 1 1 16 10.1186/s12859-022-05081-3 NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads Eleni Adam (0), Desh Ranjan (1), Harold Riethman (2) Department of Computer Science, Old Dominion University; Department of Computer Science, Old Dominion University; Medical Diagnostic and Translational Sciences, Old Dominion University Abstract Background Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult to impossible for many loci, complicating or precluding a wide range of genetic analyses to investigate their function. Results We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with mapped ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets of DNA sequences identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Mapped telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs as well as identification/correction of any misassemblies. Our method was tested on a subset of representative subtelomeres with ultralong nanopore read coverage in the haploid human cell line CHM13. A 10X Linked-Read dataset from CHM13 was combined with ultralong nanopore reads from the same genome to provide improved subtelomere assemblies. Comparison of nanopore-only assemblies using SHASTA with our NPGREAT assemblies in the distal-most subtelomere regions showed that NPGREAT produced higher-quality and more complete assemblies than SHASTA alone when these regions had low ultralong nanopore coverage (such as cases where large segmental duplications were immediately adjacent to (TTAGGG)n tracts). Conclusion In genomic regions with large segmental duplications adjacent to telomeres, NPGREAT offers an alternative economical approach to improving assembly accuracy and coverage using linked-read datasets when more expensive HiFi datasets of 10–20 kb reads are unavailable. https://doi.org/10.1186/s12859-022-05081-3 Telomeres; Subtelomeres; Segmental duplications; Tandem repeats; Hybrid assembly; Nanopore |
spellingShingle | Eleni Adam Desh Ranjan Harold Riethman NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads BMC Bioinformatics Telomeres Subtelomeres Segmental duplications Tandem repeats Hybrid assembly Nanopore |
title | NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads |
title_full | NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads |
title_fullStr | NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads |
title_full_unstemmed | NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads |
title_short | NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads |
title_sort | npgreat assembly of human subtelomere regions with the use of ultralong nanopore reads and linked reads |
topic | Telomeres Subtelomeres Segmental duplications Tandem repeats Hybrid assembly Nanopore |
url | https://doi.org/10.1186/s12859-022-05081-3 |
work_keys_str_mv | AT eleniadam npgreatassemblyofhumansubtelomereregionswiththeuseofultralongnanoporereadsandlinkedreads AT deshranjan npgreatassemblyofhumansubtelomereregionswiththeuseofultralongnanoporereadsandlinkedreads AT haroldriethman npgreatassemblyofhumansubtelomereregionswiththeuseofultralongnanoporereadsandlinkedreads |