SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes [version 2; peer review: 2 approved]



Bibliographic Details
Main Authors: Vincent Hervé, Nachida Tadrent, Franck Dedeine
Format: Article
Language: English
Published: F1000 Research Ltd 2023-02-01
Series: F1000Research
Subjects: Snakemake, metagenomics, microbiology, genomics, bioinformatics, microbial ecology
Online Access: https://f1000research.com/articles/11-1522/v2
author Vincent Hervé
Nachida Tadrent
Franck Dedeine
collection DOAJ
description Background: Over the last decade, we have observed in microbial ecology a transition from gene-centric to genome-centric analyses. Indeed, the advent of metagenomics combined with binning methods, single-cell genome sequencing as well as high-throughput cultivation methods have contributed to the continuing and exponential increase of available prokaryotic genomes, which in turn has favored the exploration of microbial metabolisms. In the case of metagenomics, data processing, from raw reads to genome reconstruction, involves various steps and software, which can represent a major technical obstacle. Methods: To overcome this challenge, we developed SnakeMAGs, a simple workflow that can process Illumina data, from raw reads to the classification and relative abundance estimation of metagenome-assembled genomes (MAGs). It integrates state-of-the-art bioinformatic tools to sequentially perform: quality control of the reads (illumina-utils, Trimmomatic), host sequence removal (optional step, using Bowtie2), assembly (MEGAHIT), binning (MetaBAT2), quality filtering of the bins (CheckM, GUNC), classification of the MAGs (GTDB-Tk) and estimation of their relative abundance (CoverM). Developed with the popular Snakemake workflow management system, it can be deployed on various architectures, from single-core to multicore and from workstations to computer clusters and grids. It is also flexible, since users can easily change parameters and/or add new rules. Results: Using termite gut metagenomic datasets, we showed that SnakeMAGs was slower than ATLAS, a similar workflow, but recovered more MAGs spanning more diverse phyla. Importantly, these additional MAGs did not differ significantly from the other MAGs in completeness, contamination, genome size or relative abundance. Conclusions: Overall, SnakeMAGs should make the reconstruction of MAGs more accessible to microbiologists.
SnakeMAGs as well as test files and an extended tutorial are available at https://github.com/Nachida08/SnakeMAGs.
first_indexed 2024-04-10T06:23:05Z
format Article
id doaj.art-b06bb4cbcd0049bd95caef8cdd1f3e12
institution Directory Open Access Journal
issn 2046-1402
language English
last_indexed 2024-04-10T06:23:05Z
publishDate 2023-02-01
publisher F1000 Research Ltd
record_format Article
series F1000Research
spelling doaj.art-b06bb4cbcd0049bd95caef8cdd1f3e12
2023-03-02T01:00:01Z
eng
F1000 Research Ltd
F1000Research
2046-1402
2023-02-01
11144793
SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes [version 2; peer review: 2 approved]
Vincent Hervé (https://orcid.org/0000-0003-2450-8514), Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
Nachida Tadrent (https://orcid.org/0000-0003-2450-8514), Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
Franck Dedeine (https://orcid.org/0000-0002-0646-4725), Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
https://f1000research.com/articles/11-1522/v2
Snakemake
metagenomics
microbiology
genomics
bioinformatics
microbial ecology
eng
title SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes [version 2; peer review: 2 approved]
topic Snakemake
metagenomics
microbiology
genomics
bioinformatics
microbial ecology
eng
url https://f1000research.com/articles/11-1522/v2