NMF-based approach for missing values imputation of mass spectrometry metabolomics data
In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem.
Main Authors: | Xu, Jingjing; Wang, Yuanshan; Xu, Xiangnan; Cheng, Kian Kai; Raftery, Daniel; Dong, Jiyang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI 2021 |
Subjects: | Q Science (General) |
Online Access: | http://eprints.utm.my/94205/1/ChengKianKai2021_NMFBasedApproachforMissing.pdf |
_version_ | 1796865776400990208 |
---|---|
author | Xu, Jingjing Wang, Yuanshan Xu, Xiangnan Cheng, Kian Kai Raftery, Daniel Dong, Jiyang |
author_sort | Xu, Jingjing |
collection | ePrints |
description | In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data, which makes use of both global and local information of the data. The proposed method was compared with three commonly used methods: k-nearest neighbors (kNN), random forest (RF), and outlier-robust (ORI) missing values imputation. These methods were evaluated from the perspectives of accuracy of imputation, retrieval of data structures, and rank of imputation superiority. The experimental results showed that the NMF-based method is well-adapted to various cases of data missingness and the presence of outliers in MS-based metabolic profiles. It outperformed kNN and ORI and showed results comparable with the RF method. Furthermore, the NMF method is more robust and less susceptible to outliers as compared with the RF method. The proposed NMF-based scheme may serve as an alternative NA imputation method which may facilitate biological interpretations of metabolomics data. |
first_indexed | 2024-03-05T21:02:13Z |
format | Article |
id | utm.eprints-94205 |
institution | Universiti Teknologi Malaysia - ePrints |
language | English |
last_indexed | 2024-03-05T21:02:13Z |
publishDate | 2021 |
publisher | MDPI |
record_format | dspace |
spelling | utm.eprints-942052022-05-31T12:37:26Z http://eprints.utm.my/94205/ Q Science (General) MDPI 2021-10-01 Article PeerReviewed application/pdf en http://eprints.utm.my/94205/1/ChengKianKai2021_NMFBasedApproachforMissing.pdf Xu, Jingjing and Wang, Yuanshan and Xu, Xiangnan and Cheng, Kian Kai and Raftery, Daniel and Dong, Jiyang (2021) NMF-based approach for missing values imputation of mass spectrometry metabolomics data. Molecules, 26 (19). pp. 1-14. ISSN 1420-3049 http://dx.doi.org/10.3390/molecules26195787 DOI:10.3390/molecules26195787 |
title | NMF-based approach for missing values imputation of mass spectrometry metabolomics data |
topic | Q Science (General) |
url | http://eprints.utm.my/94205/1/ChengKianKai2021_NMFBasedApproachforMissing.pdf |
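
The description above outlines NMF-based imputation of missing values only in general terms. As a rough illustration of the underlying idea, the sketch below fits a non-negative factorization to the observed entries only and fills the missing ones from the low-rank reconstruction. It is a generic masked multiplicative-update NMF, not the authors' method, which additionally combines global and local information in a way this record does not detail. The function name `nmf_impute` and the parameters `rank`, `n_iter`, `eps`, and `seed` are assumptions made for this example.

```python
import numpy as np

def nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill NaNs in a non-negative matrix via masked multiplicative-update NMF.

    Hypothetical helper for illustration; not the published algorithm.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)              # True where a value was observed
    X0 = np.where(mask, X, 0.0)      # placeholder zeros; the mask excludes them from the fit
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps  # non-negative initial factors
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Updates minimise the squared error over observed entries only.
        H *= (W.T @ X0) / (W.T @ (mask * (W @ H)) + eps)
        W *= (X0 @ H.T) / ((mask * (W @ H)) @ H.T + eps)
    # Keep observed values as they are; take missing ones from the reconstruction.
    return np.where(mask, X, W @ H)
```

A call such as `nmf_impute(intensities, rank=3)` on a samples-by-metabolites array with `np.nan` at the missing positions returns an array of the same shape with the gaps filled. The rank is a tuning choice and would need to be selected for each dataset.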