FMLRC: Hybrid long read error correction using an FM-index

Abstract. Background: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and thr...

Bibliographic Details
Main Authors:	Jeremy R. Wang, James Holt, Leonard McMillan, Corbin D. Jones
Format:	Article
Language:	English
Published:	BMC 2018-02-01
Series:	BMC Bioinformatics
Subjects:	de novo assembly; Hybrid error correction; Long read; Pacbio; BWT; FM-Index
Online Access:	http://link.springer.com/article/10.1186/s12859-018-2051-3

_version_	1811265313778434048
author	Jeremy R. Wang, James Holt, Leonard McMillan, Corbin D. Jones
collection	DOAJ
description	Abstract. Background: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. Results: We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high-quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read-only de novo assembly methods. Conclusion: Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
first_indexed	2024-04-12T20:20:38Z
format	Article
id	doaj.art-66b022e366b14f87b35a8b95e1fc3194
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-04-12T20:20:38Z
publishDate	2018-02-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-66b022e366b14f87b35a8b95e1fc3194; 2022-12-22T03:18:00Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2018-02-01; 19(1):1-11; doi:10.1186/s12859-018-2051-3; FMLRC: Hybrid long read error correction using an FM-index; Jeremy R. Wang (Department of Genetics, University of North Carolina at Chapel Hill); James Holt (Department of Computer Science, University of North Carolina at Chapel Hill); Leonard McMillan (Department of Computer Science, University of North Carolina at Chapel Hill); Corbin D. Jones (Department of Biology and Integrative Program for Biological and Genome Sciences, University of North Carolina at Chapel Hill); http://link.springer.com/article/10.1186/s12859-018-2051-3; de novo assembly; Hybrid error correction; Long read; Pacbio; BWT; FM-Index
title	FMLRC: Hybrid long read error correction using an FM-index
topic	de novo assembly; Hybrid error correction; Long read; Pacbio; BWT; FM-Index
url	http://link.springer.com/article/10.1186/s12859-018-2051-3
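
The abstract names a multi-string Burrows-Wheeler Transform with an auxiliary FM-index as the structure used to query the short reads. The following is a minimal sketch of that kind of query, not the FMLRC implementation: it builds a BWT over a few short reads by naive rotation sorting and counts k-mer occurrences with FM-index backward search. All function names, the toy reads, and the `min_count` threshold are assumptions made for illustration; a real tool would use a compressed index and a far more scalable construction.

```python
from collections import Counter


def build_bwt(reads):
    """Naive multi-string BWT: append '$' to each read, sort all rotations,
    and keep the last character of each sorted rotation."""
    rotations = []
    for read in reads:
        s = read + "$"
        rotations.extend(s[i:] + s[:i] for i in range(len(s)))
    rotations.sort()
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt):
    """Return (first, occ): first[c] counts symbols smaller than c,
    occ[c][i] counts occurrences of c in bwt[:i]."""
    counts = Counter(bwt)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, sym in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if sym == c else 0)
    return first, occ


def count_kmer(kmer, first, occ, n):
    """Backward search: number of occurrences of kmer across all indexed reads."""
    lo, hi = 0, n
    for c in reversed(kmer):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


def weak_kmer_positions(long_read, k, min_count, first, occ, n):
    """Start positions in long_read whose k-mer has fewer than min_count
    occurrences in the short reads (hypothetical threshold rule)."""
    return [
        i
        for i in range(len(long_read) - k + 1)
        if count_kmer(long_read[i:i + k], first, occ, n) < min_count
    ]


if __name__ == "__main__":
    short_reads = ["ACGTAC", "CGTACG", "GTACGT"]  # toy data
    bwt = build_bwt(short_reads)
    first, occ = build_fm_index(bwt)
    print(count_kmer("GTAC", first, occ, len(bwt)))
    print(weak_kmer_positions("ACGTTACG", 4, 1, first, occ, len(bwt)))
```

The sketch only shows how short-read support for a k-mer can be looked up; how poorly supported regions of a long read are then replaced is described in the article itself and is not reproduced here.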
