Reanalyze unassigned reads in Sanger based metagenomic data using conserved gene adjacency

Abstract. Background: Investigation of metagenomes provides greater insight into uncultured microbial communities. The improvement in sequencing technology, which yields a large amount of sequence data, has led to major breakthroughs in the field. However...

Bibliographic Details
Main Authors: Hsu Ming-Tsung, Su Chien-Hao, Weng Francis C, Wang Tse-Yi, Tsai Huai-Kuang, Wang Daryi
Format: Article
Language: English
Published: BMC 2010-11-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/11/565
_version_ 1818149692391817216
author Hsu Ming-Tsung
Su Chien-Hao
Weng Francis C
Wang Tse-Yi
Tsai Huai-Kuang
Wang Daryi
collection DOAJ
description Abstract. Background: Investigation of metagenomes provides greater insight into uncultured microbial communities. The improvement in sequencing technology, which yields a large amount of sequence data, has led to major breakthroughs in the field. However, at present, taxonomic binning tools for metagenomes discard 30-40% of Sanger sequencing data due to the stringency of BLAST cut-offs. In an attempt to provide a comprehensive overview of metagenomic data, we re-analyzed the discarded metagenomes by using less stringent cut-offs. Additionally, we introduced a new criterion, namely, the evolutionary conservation of adjacency between neighboring genes. To evaluate the feasibility of our approach, we re-analyzed discarded contigs and singletons from several environments with different levels of complexity. We also compared the consistency between our taxonomic binning and those reported in the original studies. Results: Among the discarded data, we found that 23.7 ± 3.9% of singletons and 14.1 ± 1.0% of contigs were assigned to taxa. The recovery rates for singletons were higher than those for contigs. The Pearson correlation coefficient revealed a high degree of similarity (0.94 ± 0.03 at the phylum rank and 0.80 ± 0.11 at the family rank) between the proposed taxonomic binning approach and those reported in original studies. In addition, an evaluation using simulated data demonstrated the reliability of the proposed approach. Conclusions: Our findings suggest that taking account of conserved neighboring gene adjacency improves taxonomic assignment when analyzing metagenomes using Sanger sequencing. In other words, utilizing the conserved gene order as a criterion will reduce the amount of data discarded when analyzing metagenomes.
first_indexed 2024-12-11T13:11:04Z
format Article
id doaj.art-db9e3ad2642f4a4d8e5fea1baad94981
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-11T13:11:04Z
publishDate 2010-11-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-db9e3ad2642f4a4d8e5fea1baad94981 | 2022-12-22T01:06:10Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2010-11-01 | 11 | 1 | 565 | 10.1186/1471-2105-11-565 | Reanalyze unassigned reads in Sanger based metagenomic data using conserved gene adjacency | Hsu Ming-Tsung; Su Chien-Hao; Weng Francis C; Wang Tse-Yi; Tsai Huai-Kuang; Wang Daryi | http://www.biomedcentral.com/1471-2105/11/565
title Reanalyze unassigned reads in Sanger based metagenomic data using conserved gene adjacency
url http://www.biomedcentral.com/1471-2105/11/565