NMF-based approach for missing values imputation of mass spectrometry metabolomics data

Bibliographic Details
Main Authors: Xu, Jingjing, Wang, Yuanshan, Xu, Xiangnan, Cheng, Kian Kai, Raftery, Daniel, Dong, Jiyang
Format: Article
Language: English
Published: MDPI 2021
Subjects: Q Science (General)
Online Access: http://eprints.utm.my/94205/1/ChengKianKai2021_NMFBasedApproachforMissing.pdf
author Xu, Jingjing
Wang, Yuanshan
Xu, Xiangnan
Cheng, Kian Kai
Raftery, Daniel
Dong, Jiyang
collection ePrints
description In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data, which makes use of both global and local information of the data. The proposed method was compared with three commonly used methods: k-nearest neighbors (kNN), random forest (RF), and outlier-robust (ORI) missing values imputation. These methods were evaluated from the perspectives of accuracy of imputation, retrieval of data structures, and rank of imputation superiority. The experimental results showed that the NMF-based method is well-adapted to various cases of data missingness and the presence of outliers in MS-based metabolic profiles. It outperformed kNN and ORI and showed results comparable with the RF method. Furthermore, the NMF method is more robust and less susceptible to outliers as compared with the RF method. The proposed NMF-based scheme may serve as an alternative NA imputation method which may facilitate biological interpretations of metabolomics data.
first_indexed 2024-03-05T21:02:13Z
format Article
id utm.eprints-94205
institution Universiti Teknologi Malaysia - ePrints
language English
last_indexed 2024-03-05T21:02:13Z
publishDate 2021
publisher MDPI
record_format dspace
spelling utm.eprints-94205 2022-05-31T12:37:26Z http://eprints.utm.my/94205/
Xu, Jingjing and Wang, Yuanshan and Xu, Xiangnan and Cheng, Kian Kai and Raftery, Daniel and Dong, Jiyang (2021) NMF-based approach for missing values imputation of mass spectrometry metabolomics data. Molecules, 26 (19). pp. 1-14. ISSN 1420-3049
Q Science (General)
MDPI 2021-10-01 Article PeerReviewed application/pdf en
http://eprints.utm.my/94205/1/ChengKianKai2021_NMFBasedApproachforMissing.pdf
http://dx.doi.org/10.3390/molecules26195787 DOI:10.3390/molecules26195787
title NMF-based approach for missing values imputation of mass spectrometry metabolomics data
topic Q Science (General)
url http://eprints.utm.my/94205/1/ChengKianKai2021_NMFBasedApproachforMissing.pdf
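
The description in this record names NMF-based imputation but does not spell out the algorithm. As a rough illustration of the general idea only, the sketch below fits a plain masked NMF with multiplicative updates and fills missing cells from the low-rank reconstruction. It is not the authors' method, which according to the abstract also uses local information; the function name `nmf_impute` and the parameters `rank`, `n_iter`, `eps`, and `seed` are hypothetical choices made for this example.

```python
import numpy as np

def nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill NaNs in a non-negative matrix with a masked NMF reconstruction.

    Generic sketch, not the published method: only observed cells
    contribute to the fit, and missing cells are read off W @ H.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)            # True where a value was observed
    X0 = np.where(mask, X, 0.0)    # equals mask * X; missing cells become 0
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the masked squared error
        # || mask * (X - W @ H) ||^2; factors stay non-negative.
        WH = W @ H
        W *= (X0 @ H.T) / ((mask * WH) @ H.T + eps)
        WH = W @ H
        H *= (W.T @ X0) / (W.T @ (mask * WH) + eps)
    # Keep observed values as they are; replace only the missing ones.
    return np.where(mask, X, W @ H)

# Usage on a small synthetic intensity table (samples x features).
X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0])
X[1, 2] = np.nan
filled = nmf_impute(X, rank=1)
```

In practice the rank would need to be chosen for the data at hand, for example by masking some observed values and checking how well they are recovered.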