Optimization of Imputation Strategies for High-Resolution Gas Chromatography–Mass Spectrometry (HR GC–MS) Metabolomics Data

Bibliographic Details
Main Authors: Isaac Ampong, Kip D. Zimmerman, Peter W. Nathanielsz, Laura A. Cox, Michael Olivier
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Metabolites
Subjects: metabolomics, HR GC–MS, imputation missing values
Online Access: https://www.mdpi.com/2218-1989/12/5/429
collection DOAJ
description Gas chromatography–coupled mass spectrometry (GC–MS) has been used in biomedical research to analyze volatile, non-polar, and polar metabolites in a wide array of sample types. Despite advances in technology, missing values are still common in metabolomics datasets and must be properly handled. We evaluated the performance of ten commonly used missing value imputation methods with metabolites analyzed on an HR GC–MS instrument. By introducing missing values into the complete (i.e., data without any missing values) National Institute of Standards and Technology (NIST) plasma dataset, we demonstrate that random forest (RF), glmnet ridge regression (GRR), and Bayesian principal component analysis (BPCA) shared the lowest root mean squared error (RMSE) in technical replicate data. Further examination of these three methods in data from baboon plasma and liver samples demonstrated they all maintained high accuracy. Overall, our analysis suggests that any of the three imputation methods can be applied effectively to untargeted metabolomics datasets with high accuracy. However, it is important to note that imputation will alter the correlation structure of the dataset and bias downstream regression coefficients and p-values.
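The evaluation design described in the abstract (hide known values in a complete dataset, impute them, and score the result by RMSE) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. The record does not state how values were removed, so the sketch assumes they are removed completely at random. The synthetic data, the function names, and the 10% masking fraction are all hypothetical. Column-mean imputation is used only as a simple stand-in; it is not RF, GRR, or BPCA.

```python
import math
import random


def mask_at_random(matrix, fraction, seed=0):
    """Return a copy with `fraction` of cells set to None, plus the hidden (row, col) positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_column_mean(masked):
    """Fill each None with the mean of the observed values in its column (stand-in method)."""
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        # Arbitrary fallback of 0.0 if a column has no observed values at all.
        mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = mean
    return filled


def rmse_on_hidden(truth, imputed, hidden):
    """Root mean squared error computed only over the cells that were hidden."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical complete dataset: 20 replicate samples x 5 metabolites.
data_rng = random.Random(42)
complete = [[data_rng.gauss(10.0 + j, 1.0) for j in range(5)] for _ in range(20)]

masked, hidden = mask_at_random(complete, fraction=0.10, seed=1)
imputed = impute_column_mean(masked)
score = rmse_on_hidden(complete, imputed, hidden)
print(f"hidden cells: {len(hidden)}, RMSE: {score:.3f}")
```

Swapping `impute_column_mean` for another imputer while keeping the same `hidden` positions gives a like-for-like RMSE comparison between methods, which is the kind of ranking the abstract reports.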
first_indexed 2024-03-10T03:26:20Z
id doaj.art-b5fe8aba01df4dbbafea541a6945ec3b
institution Directory Open Access Journal
issn 2218-1989
last_indexed 2024-03-10T03:26:20Z
doi 10.3390/metabo12050429
citation Metabolites 12(5):429 (2022-05-01)
timestamp 2023-11-23T12:07:18Z
affiliations Isaac Ampong: Center for Precision Medicine, Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
Kip D. Zimmerman: Center for Precision Medicine, Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
Peter W. Nathanielsz: Center for the Study of Fetal Programming, University of Wyoming, Laramie, WY 82071, USA
Laura A. Cox: Center for Precision Medicine, Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
Michael Olivier: Center for Precision Medicine, Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
topic metabolomics
HR GC–MS
imputation missing values