UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis

Abstract Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow,...


Bibliographic Details
Main Authors: Eftychia E. Kontou, Axel Walter, Oliver Alka, Julianus Pfeuffer, Timo Sachsenberg, Omkar S. Mohite, Matin Nuhamunada, Oliver Kohlbacher, Tilmann Weber
Format: Article
Language: English
Published: BMC 2023-05-01
Series: Journal of Cheminformatics
Subjects: Untargeted metabolomics, Processing, Analysis, High-throughput workflow, Software
Online Access: https://doi.org/10.1186/s13321-023-00724-w
author Eftychia E. Kontou
Axel Walter
Oliver Alka
Julianus Pfeuffer
Timo Sachsenberg
Omkar S. Mohite
Matin Nuhamunada
Oliver Kohlbacher
Tilmann Weber
author_sort Eftychia E. Kontou
collection DOAJ
description Metabolomics experiments generate highly complex datasets, which are time- and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, molecular formula and structural predictions, and an integration to the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, as well as development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based Graphical User Interface for parameter optimization and processing of smaller-sized datasets. UmetaFlow was validated with in-house LC–MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards, and it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking, and UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets.
first_indexed 2024-04-09T12:47:07Z
format Article
id doaj.art-3eac90bcea8046f0b2c75985a30b6984
institution Directory of Open Access Journals
issn 1758-2946
language English
last_indexed 2024-04-09T12:47:07Z
publishDate 2023-05-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj.art-3eac90bcea8046f0b2c75985a30b6984
2023-05-14T11:25:57Z
eng
BMC
Journal of Cheminformatics
1758-2946
2023-05-01
151112
10.1186/s13321-023-00724-w
UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis
Eftychia E. Kontou (0), Axel Walter (1), Oliver Alka (2), Julianus Pfeuffer (3), Timo Sachsenberg (4), Omkar S. Mohite (5), Matin Nuhamunada (6), Oliver Kohlbacher (7), Tilmann Weber (8)
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark
Applied Bioinformatics, Department of Computer Science, Eberhard Karls University Tübingen
Applied Bioinformatics, Department of Computer Science, Eberhard Karls University Tübingen
Visual and Data-Centric Computing, Zuse Institute Berlin
Applied Bioinformatics, Department of Computer Science, Eberhard Karls University Tübingen
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark
Applied Bioinformatics, Department of Computer Science, Eberhard Karls University Tübingen
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark
https://doi.org/10.1186/s13321-023-00724-w
Untargeted metabolomics
Processing
Analysis
High-throughput workflow
Software
title UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis
title_sort umetaflow an untargeted metabolomics workflow for high throughput data processing and analysis
topic Untargeted metabolomics
Processing
Analysis
High-throughput workflow
Software
url https://doi.org/10.1186/s13321-023-00724-w