Effects of GC bias in next-generation-sequencing data on de novo genome assembly.

Next-generation-sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among the challenges, GC bias in NG...

Bibliographic Details
Main Authors: Yen-Chun Chen, Tsunglin Liu, Chun-Hui Yu, Tzen-Yuh Chiang, Chi-Chuan Hwang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3639258?pdf=render
Collection: DOAJ
Description: Next-generation sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among these challenges, GC bias in NGS data is known to complicate genome assembly. However, it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis of the effects of GC bias on genome assembly. Our analyses reveal that GC bias lowers assembly completeness only when the degree of GC bias is above a threshold. Under strong GC bias, the resulting assembly fragmentation can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation caused by GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC content. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These findings provide guidance toward a better de novo genome assembly in the presence of GC bias.
ID: doaj.art-5ded69ec4f504964a4c1e80db31ad877
Institution: Directory of Open Access Journals
ISSN: 1932-6203
Citation: PLoS ONE, volume 8, issue 4, article e62856 (2013-01-01)
DOI: 10.1371/journal.pone.0062856