Reproducibility of mass spectrometry based metabolomics data

Bibliographic Details
Main Authors: Tusharkanti Ghosh, Daisy Philtron, Weiming Zhang, Katerina Kechris, Debashis Ghosh
Format: Article
Language: English
Published: BMC 2021-09-01
Series: BMC Bioinformatics
Subjects: Reproducibility; Mass spectrometry; Metabolomics
Online Access: https://doi.org/10.1186/s12859-021-04336-9
author Tusharkanti Ghosh
Daisy Philtron
Weiming Zhang
Katerina Kechris
Debashis Ghosh
collection DOAJ
description Abstract Background: Assessing the reproducibility of measurements is an important first step for improving the reliability of downstream analyses of high-throughput metabolomics experiments. We define a metabolite to be reproducible when it demonstrates consistency across replicate experiments. Similarly, metabolites that are not consistent across replicates can be labeled as irreproducible. In this work, we introduce and evaluate the use of (Ma)ximum (R)ank (R)eproducibility (MaRR) to examine reproducibility in mass spectrometry-based metabolomics experiments. We examine reproducibility across technical or biological samples in three different mass spectrometry metabolomics (MS-Metabolomics) data sets. Results: We apply MaRR, a nonparametric approach that detects the change from reproducible to irreproducible signals using a maximal rank statistic. The advantage of using MaRR over model-based methods is that it does not make parametric assumptions about the underlying distributions or dependence structures of reproducible metabolites. Using three MS-Metabolomics data sets generated in the multi-center Genetic Epidemiology of Chronic Obstructive Pulmonary Disease (COPD) study, we applied the MaRR procedure after data processing to explore reproducibility across technical or biological samples. Under realistic settings of MS-Metabolomics data, the MaRR procedure effectively controls the False Discovery Rate (FDR) when there is a gradual reduction in correlation between replicate pairs for less highly ranked signals. Simulation studies also show that the MaRR procedure tends to have high power for detecting reproducible metabolites in most situations, except when the proportion of reproducible metabolites is small. Bias values in the simulations (i.e., the difference between the estimated and the true proportion of reproducible signals) are also close to zero.
The results from the real data show a higher level of reproducibility for technical replicates than for biological replicates across all three data sets. In summary, we demonstrate that the MaRR procedure can be adapted to various experimental designs, and that the nonparametric approach performs consistently well. Conclusions: This research was motivated by the challenge of reproducibility, which has proven to be a major obstacle in the use of genomic findings to advance clinical practice. In this paper, we developed a data-driven approach to assess the reproducibility of MS-Metabolomics data sets. The methods described in this paper are implemented in the open-source R package marr, which is freely available from Bioconductor at http://bioconductor.org/packages/marr .
first_indexed 2024-12-19T22:35:27Z
format Article
id doaj.art-fbd6f05e7d29414b9ff245a1633d4f3f
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-19T22:35:27Z
publishDate 2021-09-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-fbd6f05e7d29414b9ff245a1633d4f3f 2022-12-21T20:03:13Z eng BMC BMC Bioinformatics 1471-2105 2021-09-01 221125 10.1186/s12859-021-04336-9 Reproducibility of mass spectrometry based metabolomics data
Tusharkanti Ghosh (0), Daisy Philtron (1), Weiming Zhang (2), Katerina Kechris (3), Debashis Ghosh (4)
(0) Colorado School of Public Health, University of Colorado, Anschutz Medical Campus
(1) Eberly College of Science, Penn State University
(2) Syneos Health
(3) Colorado School of Public Health, University of Colorado, Anschutz Medical Campus
(4) Colorado School of Public Health, University of Colorado, Anschutz Medical Campus
https://doi.org/10.1186/s12859-021-04336-9
Reproducibility
Mass spectrometry
Metabolomics
title Reproducibility of mass spectrometry based metabolomics data
topic Reproducibility
Mass spectrometry
Metabolomics
url https://doi.org/10.1186/s12859-021-04336-9