Kermit: linkage map guided long read assembly

Bibliographic Details
Main Authors: Riku Walve, Pasi Rastas, Leena Salmela
Format: Article
Language: English
Published: BMC 2019-03-01
Series: Algorithms for Molecular Biology
Subjects: Genome assembly; Linkage maps; Coloured overlap graph
Online Access: http://link.springer.com/article/10.1186/s13015-019-0143-x
collection DOAJ
description Abstract. Background: With long reads getting even longer and cheaper, large-scale sequencing projects can be accomplished at an affordable cost without short reads. Because of high error rates and less mature tools, de novo assembly of long reads remains challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose locations on the genome are approximately known. They therefore provide long-range information that has the potential to greatly aid de novo assembly. Linkage maps have previously been used to detect misassemblies and to order contigs manually. However, no fully automated tools exist for incorporating linkage maps into assembly; instead, a large amount of manual labour is needed to order the contigs into chromosomes. Results: We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and fewer misassemblies than de novo assembly. Conclusions: We present the first method to integrate linkage maps directly into genome assembly. With a modest increase in runtime, our method improves the contiguity and correctness of genome assembly.
id doaj.art-2c59a6c371f1469498d217042c874b6f
institution Directory Open Access Journal
issn 1748-7188
spelling Riku Walve (Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki); Pasi Rastas (Institute of Biotechnology, University of Helsinki); Leena Salmela (Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki). Kermit: linkage map guided long read assembly. Algorithms for Molecular Biology, vol. 14, no. 1, pp. 1-10. BMC, 2019-03-01. ISSN 1748-7188. DOI 10.1186/s13015-019-0143-x.
title Kermit: linkage map guided long read assembly
topic Genome assembly
Linkage maps
Coloured overlap graph