Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

Bibliographic Details
Main Authors: Colin F Davenport, Jens Neugebauer, Nils Beckmann, Benedikt Friedrich, Burim Kameri, Svea Kokott, Malte Paetow, Björn Siekmann, Matthias Wieding-Drewes, Markus Wienhöfer, Stefan Wolf, Burkhard Tümmler, Volker Ahlers, Frauke Sprengel
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2012-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3424124?pdf=render
author Colin F Davenport
Jens Neugebauer
Nils Beckmann
Benedikt Friedrich
Burim Kameri
Svea Kokott
Malte Paetow
Björn Siekmann
Matthias Wieding-Drewes
Markus Wienhöfer
Stefan Wolf
Burkhard Tümmler
Volker Ahlers
Frauke Sprengel
collection DOAJ
description Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher-level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource-efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step-by-step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.
first_indexed 2024-12-21T16:54:53Z
format Article
id doaj.art-5c7ca8188f014259879a9befb094ac5d
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-21T16:54:53Z
publishDate 2012-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-5c7ca8188f014259879a9befb094ac5d; 2022-12-21T18:56:47Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2012-01-01; vol. 7, no. 8; e41224; doi:10.1371/journal.pone.0041224
title Genometa--a fast and accurate classifier for short metagenomic shotgun reads.
url http://europepmc.org/articles/PMC3424124?pdf=render