UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis
Abstract: Metabolomics experiments generate highly complex datasets whose manual inspection is time- and labor-intensive and sometimes error-prone. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow,...
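Main Authors: | Eftychia E. Kontou, Axel Walter, Oliver Alka, Julianus Pfeuffer, Timo Sachsenberg, Omkar S. Mohite, Matin Nuhamunada, Oliver Kohlbacher, Tilmann Weber |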
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2023-05-01 |
Series: | Journal of Cheminformatics |
Subjects: | Untargeted metabolomics; Processing; Analysis; High-throughput workflow; Software |
Online Access: | https://doi.org/10.1186/s13321-023-00724-w |
_version_ | 1827947973597724672 |
---|---|
author | Eftychia E. Kontou; Axel Walter; Oliver Alka; Julianus Pfeuffer; Timo Sachsenberg; Omkar S. Mohite; Matin Nuhamunada; Oliver Kohlbacher; Tilmann Weber |
author_sort | Eftychia E. Kontou |
collection | DOAJ |
description | Abstract: Metabolomics experiments generate highly complex datasets whose manual inspection is time- and labor-intensive and sometimes error-prone. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, and molecular formula and structural prediction with integration into the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, and development, the workflow is also implemented in Jupyter notebooks using Python and pyOpenMS, a set of Python bindings to the OpenMS algorithms. Finally, UmetaFlow is also offered as a web-based graphical user interface for parameter optimization and processing of smaller datasets. UmetaFlow was validated with in-house LC–MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards; it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking: UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets. |
first_indexed | 2024-04-09T12:47:07Z |
format | Article |
id | doaj.art-3eac90bcea8046f0b2c75985a30b6984 |
institution | Directory of Open Access Journals |
issn | 1758-2946 |
language | English |
last_indexed | 2024-04-09T12:47:07Z |
publishDate | 2023-05-01 |
publisher | BMC |
record_format | Article |
series | Journal of Cheminformatics |
spelling | doaj.art-3eac90bcea8046f0b2c75985a30b6984; 2023-05-14T11:25:57Z; eng; BMC; Journal of Cheminformatics; ISSN 1758-2946; 2023-05-01; vol. 15, no. 1, pp. 1-12; DOI 10.1186/s13321-023-00724-w. Authors and affiliations: Eftychia E. Kontou (The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark); Axel Walter (Applied Bioinformatics, Department of Computer Science, Eberhard Karls University Tübingen); Oliver Alka (Applied Bioinformatics, Department of Computer Science, Eberhard Karls University Tübingen); Julianus Pfeuffer (Visual and Data-Centric Computing, Zuse Institute Berlin); Timo Sachsenberg (Applied Bioinformatics, Department of Computer Science, Eberhard Karls University Tübingen); Omkar S. Mohite (The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark); Matin Nuhamunada (The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark); Oliver Kohlbacher (Applied Bioinformatics, Department of Computer Science, Eberhard Karls University Tübingen); Tilmann Weber (The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark) |
title | UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis |
topic | Untargeted metabolomics; Processing; Analysis; High-throughput workflow; Software |
url | https://doi.org/10.1186/s13321-023-00724-w |