Optimizing information in Next-Generation-Sequencing (NGS) reads for improving de novo genome assembly.

Bibliographic Details
Main Authors: Tsunglin Liu, Cheng-Hung Tsai, Wen-Bin Lee, Jung-Hsien Chiang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3726674?pdf=render
collection DOAJ
description Next-Generation Sequencing (NGS) offers much higher data throughput at much lower cost than the traditional Sanger method. However, NGS reads are shorter than Sanger reads, which makes de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires long reads or long-distance information. To improve de novo genome assembly, we developed a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes Illumina paired-end (PE) reads as input and recovers the original DNA fragments from whose two ends the paired reads were obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved >98% perfect DNA fragment recovery. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of the recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. The assembly accuracies dropped in these cases but remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity, but future error correction is needed for it to also increase assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/.
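The description reports assembly contiguity as N50 lengths. As a reminder of what that number means, here is a minimal sketch of the usual N50 definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. The function name `n50` and the example lengths are illustrative assumptions; they are not taken from ARF-PE or from the paper's data.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Illustrative helper only; not part of ARF-PE.
    """
    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the assembly defines the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Under this definition, merging contigs into fewer, longer ones raises the N50, which is why the description uses it as a measure of contiguity.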
id doaj.art-8792505374bc424bbe5f9dc46009e5ff
institution Directory Open Access Journal
issn 1932-6203
citation PLoS ONE 8(7): e69503 (2013-01-01)
doi 10.1371/journal.pone.0069503