CAMISIM: simulating metagenomes and microbial communities

Bibliographic Details
Main Authors: Adrian Fritz, Peter Hofmann, Stephan Majda, Eik Dahms, Johannes Dröge, Jessika Fiedler, Till R. Lesker, Peter Belmann, Matthew Z. DeMaere, Aaron E. Darling, Alexander Sczyrba, Andreas Bremges, Alice C. McHardy
Format: Article
Language: English
Published: BMC 2019-02-01
Series: Microbiome
Online Access: http://link.springer.com/article/10.1186/s40168-019-0633-6
Collection: DOAJ (Directory of Open Access Journals)

Abstract
Background: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets is required.
Results: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, using several thousand small data sets generated with CAMISIM.
Conclusions: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM

DOAJ record ID: doaj.art-691fffe684ce4951892d70988eef06af
ISSN: 2049-2618
Affiliations:
Computational Biology of Infection Research, Helmholtz Centre for Infection Research (Adrian Fritz, Peter Hofmann, Stephan Majda, Eik Dahms, Johannes Dröge, Jessika Fiedler, Till R. Lesker, Peter Belmann, Andreas Bremges, Alice C. McHardy)
The ithree institute, University of Technology Sydney (Matthew Z. DeMaere, Aaron E. Darling)
Center for Biotechnology and Faculty of Technology, Bielefeld University (Alexander Sczyrba)
DOI: 10.1186/s40168-019-0633-6
Keywords: Metagenomics software; Microbial community; Benchmarking; Simulation; Metagenome assembly; Genome binning