MetaLAFFA: a flexible, end-to-end, distributed computing-compatible metagenomic functional annotation pipeline

Abstract. Background: Microbial communities have become an important subject of research across multiple disciplines in recent years. These communities are often examined via shotgun metagenomic sequencing, a technology which can offer unique insights into the genomic content of a microbial community....

Bibliographic Details
Main Authors: Alexander Eng, Adrian J. Verster, Elhanan Borenstein
Format: Article
Language: English
Published: BMC 2020-10-01
Series: BMC Bioinformatics
Subjects: Metagenomics, Functional annotation, Pipeline, Distributed computing
Online Access: http://link.springer.com/article/10.1186/s12859-020-03815-9
_version_ 1828543416975228928
author Alexander Eng
Adrian J. Verster
Elhanan Borenstein
collection DOAJ
description Abstract. Background: Microbial communities have become an important subject of research across multiple disciplines in recent years. These communities are often examined via shotgun metagenomic sequencing, a technology which can offer unique insights into the genomic content of a microbial community. Functional annotation of shotgun metagenomic data has become an increasingly popular method for identifying the aggregate functional capacities encoded by the community’s constituent microbes. Currently available metagenomic functional annotation pipelines, however, suffer from several shortcomings, including limited pipeline customization options, lack of standard raw sequence data pre-processing, and insufficient capabilities for integration with distributed computing systems. Results: Here we introduce MetaLAFFA, a functional annotation pipeline designed to take unfiltered shotgun metagenomic data as input and generate functional profiles. MetaLAFFA is implemented as a Snakemake pipeline, which enables convenient integration with distributed computing clusters, allowing users to take full advantage of available computing resources. Default pipeline settings allow new users to run MetaLAFFA according to common practices, while a Python module-based configuration system provides advanced users with a flexible interface for pipeline customization. MetaLAFFA also generates summary statistics for each step in the pipeline so that users can better understand pre-processing and annotation quality. Conclusions: MetaLAFFA is a new end-to-end metagenomic functional annotation pipeline with distributed computing compatibility and flexible customization options. MetaLAFFA source code is available at https://github.com/borenstein-lab/MetaLAFFA and can be installed via Conda as described in the accompanying documentation.
first_indexed 2024-12-12T02:13:31Z
format Article
id doaj.art-c1239bd2f6e04e94856fcf733dbadeb9
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-12T02:13:31Z
publishDate 2020-10-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-c1239bd2f6e04e94856fcf733dbadeb9
2022-12-22T00:41:51Z
eng
BMC
BMC Bioinformatics
1471-2105
2020-10-01
21119
10.1186/s12859-020-03815-9
MetaLAFFA: a flexible, end-to-end, distributed computing-compatible metagenomic functional annotation pipeline
Alexander Eng (Department of Genome Sciences, University of Washington)
Adrian J. Verster (Department of Genome Sciences, University of Washington)
Elhanan Borenstein (Blavatnik School of Computer Science, Tel Aviv University)
http://link.springer.com/article/10.1186/s12859-020-03815-9
Metagenomics
Functional annotation
Pipeline
Distributed computing
title MetaLAFFA: a flexible, end-to-end, distributed computing-compatible metagenomic functional annotation pipeline
topic Metagenomics
Functional annotation
Pipeline
Distributed computing
url http://link.springer.com/article/10.1186/s12859-020-03815-9