FMLRC: Hybrid long read error correction using an FM-index

Bibliographic Details
Main Authors: Jeremy R. Wang, James Holt, Leonard McMillan, Corbin D. Jones
Format: Article
Language: English
Published: BMC 2018-02-01
Series: BMC Bioinformatics
Subjects:
Online Access: http://link.springer.com/article/10.1186/s12859-018-2051-3
collection DOAJ
description Abstract
Background: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy.
Results: We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high-quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read-only de novo assembly methods.
Conclusion: Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Greater throughput and computational efficiency than existing methods will help make more economical use of emerging long read sequencing technologies.
first_indexed 2024-04-12T20:20:38Z
id doaj.art-66b022e366b14f87b35a8b95e1fc3194
institution Directory Open Access Journal
issn 1471-2105
last_indexed 2024-04-12T20:20:38Z
spelling doaj.art-66b022e366b14f87b35a8b95e1fc3194 (2022-12-22T03:18:00Z)
Citation: BMC Bioinformatics (BMC, ISSN 1471-2105), 2018-02-01, vol. 19, no. 1, pp. 1-11
DOI: 10.1186/s12859-018-2051-3
Author affiliations:
Jeremy R. Wang: Department of Genetics, University of North Carolina at Chapel Hill
James Holt: Department of Computer Science, University of North Carolina at Chapel Hill
Leonard McMillan: Department of Computer Science, University of North Carolina at Chapel Hill
Corbin D. Jones: Department of Biology and Integrative Program for Biological and Genome Sciences, University of North Carolina at Chapel Hill
topic de novo assembly
Hybrid error correction
Long read
Pacbio
BWT
FM-Index