NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data

In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputat...


Bibliographic Details
Main Authors: Jingjing Xu, Yuanshan Wang, Xiangnan Xu, Kian-Kai Cheng, Daniel Raftery, Jiyang Dong
Format: Article
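Language: English
Published: MDPI AG 2021-09-01
Series: Molecules
Subjects: non-negative matrix factorization; missing values imputation; mass spectrometry; metabolomics data; missing pattern; outliers
Online Access: https://www.mdpi.com/1420-3049/26/19/5787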
author Jingjing Xu
Yuanshan Wang
Xiangnan Xu
Kian-Kai Cheng
Daniel Raftery
Jiyang Dong
collection DOAJ
description In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data, which makes use of both global and local information of the data. The proposed method was compared with three commonly used methods: k-nearest neighbors (kNN), random forest (RF), and outlier-robust (ORI) missing values imputation. These methods were evaluated from the perspectives of accuracy of imputation, retrieval of data structures, and rank of imputation superiority. The experimental results showed that the NMF-based method is well-adapted to various cases of data missingness and the presence of outliers in MS-based metabolic profiles. It outperformed kNN and ORI and showed results comparable with the RF method. Furthermore, the NMF method is more robust and less susceptible to outliers as compared with the RF method. The proposed NMF-based scheme may serve as an alternative NA imputation method which may facilitate biological interpretations of metabolomics data.
first_indexed 2024-03-10T06:55:20Z
format Article
id doaj.art-bb026e9f709d43dc9dbf781b597b5388
institution Directory Open Access Journal
issn 1420-3049
language English
last_indexed 2024-03-10T06:55:20Z
publishDate 2021-09-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-bb026e9f709d43dc9dbf781b597b5388
2023-11-22T16:32:37Z
eng
MDPI AG
Molecules
1420-3049
2021-09-01
26
19
5787
10.3390/molecules26195787
NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data
Jingjing Xu, Department of Electronic Science, Xiamen University, Xiamen 361005, China
Yuanshan Wang, Department of Electronic Science, Xiamen University, Xiamen 361005, China
Xiangnan Xu, School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
Kian-Kai Cheng, Innovation Centre in Agritechnology, Universiti Teknologi Malaysia, Johor, Muar 84600, Malaysia
Daniel Raftery, Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, Seattle, WA 98109, USA
Jiyang Dong, Department of Electronic Science, Xiamen University, Xiamen 361005, China
title NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data
topic non-negative matrix factorization
missing values imputation
mass spectrometry
metabolomics data
missing pattern
outliers
url https://www.mdpi.com/1420-3049/26/19/5787
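
The description above refers to NA imputation by non-negative matrix factorization. As a rough illustration of the general idea only, the sketch below fits a masked (weighted) NMF with multiplicative updates and fills the missing entries from the low-rank reconstruction. It is not the authors' scheme: the record does not specify how the article combines global and local information, and the function name `nmf_impute` and the parameters `rank`, `n_iter`, `eps`, and `seed` are hypothetical choices made for this example.

```python
import numpy as np


def nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill NaN entries of a non-negative matrix with a masked NMF fit.

    Generic weighted NMF, shown for illustration only; all names and
    defaults here are assumptions, not taken from the article.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    observed = ~np.isnan(X)            # True where a value was measured
    M = observed.astype(float)         # 0/1 weights: NAs do not enter the loss
    X0 = np.where(observed, X, 0.0)    # placeholder zeros at NA positions

    n, p = X.shape
    W = rng.random((n, rank)) + 0.1    # strictly positive starting factors
    H = rng.random((rank, p)) + 0.1

    for _ in range(n_iter):
        # Multiplicative updates for the masked loss ||M * (X - W H)||_F^2
        W *= ((M * X0) @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ (M * X0)) / (W.T @ (M * (W @ H)) + eps)

    # Keep measured values as they are; take NAs from the reconstruction
    return np.where(observed, X, W @ H)


if __name__ == "__main__":
    # Toy intensity table (samples x metabolites) with two missing values
    truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
    data = truth.copy()
    data[1, 2] = np.nan
    data[3, 0] = np.nan
    print(nmf_impute(data, rank=1))
```

Masking the loss, rather than substituting zeros for NAs, keeps the missing cells from pulling the factors toward zero. In practice the rank would need to be chosen for the data at hand, for example by hiding some measured values and checking how well they are recovered.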