MetaGen: reference-free learning with multiple metagenomic samples

Abstract: A major goal of metagenomics is to identify and study the entire collection of microbial species in a set of targeted samples. We describe a statistical metagenomic algorithm that simultaneously identifies microbial species and estimates their abundances without using reference genomes. As...

Bibliographic Details
Main Authors: Xin Xing, Jun S. Liu, Wenxuan Zhong
Format: Article
Language: English
Published: BMC 2017-10-01
Series: Genome Biology
Subjects: Metagenomics; Binning; Mixture model; Multinomial; Unsupervised learning
Online Access: http://link.springer.com/article/10.1186/s13059-017-1323-y
_version_ 1819083816823685120
author Xin Xing
Jun S. Liu
Wenxuan Zhong
collection DOAJ
description Abstract: A major goal of metagenomics is to identify and study the entire collection of microbial species in a set of targeted samples. We describe a statistical metagenomic algorithm that simultaneously identifies microbial species and estimates their abundances without using reference genomes. As a trade-off, we require multiple metagenomic samples, usually ≥10 samples, to get highly accurate binning results. Compared to reference-free methods based primarily on k-mer distributions or coverage information, the proposed approach achieves a higher species binning accuracy and is particularly powerful when sequencing coverage is low. We demonstrated the performance of this new method through both simulation and real metagenomic studies. The MetaGen software is available at https://github.com/BioAlgs/MetaGen.
first_indexed 2024-12-21T20:38:35Z
format Article
id doaj.art-f0473b5ef87f44c594224ca35cc641dd
institution Directory Open Access Journal
issn 1474-760X
language English
last_indexed 2024-12-21T20:38:35Z
publishDate 2017-10-01
publisher BMC
record_format Article
series Genome Biology
spelling doaj.art-f0473b5ef87f44c594224ca35cc641dd; 2022-12-21T18:51:02Z; eng; BMC; Genome Biology; 1474-760X; 2017-10-01; 18; 1; 1; 15; 10.1186/s13059-017-1323-y; MetaGen: reference-free learning with multiple metagenomic samples; Xin Xing (Department of Statistics, University of Georgia); Jun S. Liu (Department of Statistics, Harvard University); Wenxuan Zhong (Department of Statistics, University of Georgia); http://link.springer.com/article/10.1186/s13059-017-1323-y; Metagenomics; Binning; Mixture model; Multinomial; Unsupervised learning
title MetaGen: reference-free learning with multiple metagenomic samples
topic Metagenomics
Binning
Mixture model
Multinomial
Unsupervised learning
url http://link.springer.com/article/10.1186/s13059-017-1323-y