MSFC: a new feature construction method for accurate diagnosis of mass spectrometry data

Bibliographic Details
Main Authors: Xin Feng, Zheyuan Dong, Yingrui Li, Qian Cheng, Yongxian Xin, Qiaolin Lu, Ruihao Xin
Format: Article
Language: English
Published: Nature Portfolio 2023-09-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-42395-5
Collection: DOAJ
Abstract: Mass spectrometry can detect many complex-matrix samples dynamically in a simple, rapid, sensitive, precise, and high-throughput manner, and it has become an indispensable tool for accurate diagnosis. Mass spectrometry data analysis mainly aims to quantify all metabolites in an organism and to relate them to physiological and pathological changes. A mass spectrometry feature construction (MSFC) method is proposed that constructs features from the raw mass spectrometry data in order to reduce noise, reduce redundancy, and increase the information content of the data. A chi-square test is then used to select the optimal non-redundant feature subset from the high-dimensional features, and this subset is visualized and mapped back to the corresponding intervals of the original mass spectrum. Ten supervised learning models are trained, and their classification performance is assessed with several evaluation metrics. The feasibility of the method is verified on two public mass spectrometry datasets. On the coronary heart disease dataset, test-set classification accuracy reached 1.000 when identifying mixed-batch samples, and 0.979 during the recognition process. On the colorectal liver metastases dataset, test-set classification accuracy reached 1.000. The paper uses a new preprocessing method to align the raw mass spectrometry data, which significantly improves classification accuracy and offers a new approach to mass spectrometry data analysis. Compared with MetaboAnalyst and previously reported experimental results, the proposed method achieves better classification results.
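
The abstract names two concrete steps: a chi-square test picks a non-redundant subset from the constructed high-dimensional features, and the chosen features are mapped back to intervals of the original spectrum. The sketch below illustrates those two ideas under stated assumptions. Fixed-width m/z binning (`bin_spectrum`, a hypothetical helper) stands in for the MSFC construction, which this record does not describe. The score follows the "features as counts" chi-square formulation used by scikit-learn's `chi2`. All function names here are hypothetical, and this is not the authors' implementation.

```python
import math


def bin_spectrum(peaks, mz_min, mz_max, width):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins.

    Hypothetical stand-in for feature construction: each bin is one feature,
    so every feature maps back to a known m/z interval.
    """
    n_bins = math.ceil((mz_max - mz_min) / width)
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Clamp guards against float rounding at the upper edge.
            idx = min(int((mz - mz_min) / width), n_bins - 1)
            bins[idx] += intensity
    return bins


def bin_interval(j, mz_min, width):
    """Return the [low, high) m/z interval covered by feature j."""
    return (mz_min + j * width, mz_min + (j + 1) * width)


def chi2_scores(X, y):
    """Chi-square score per feature, treating non-negative values as counts.

    Observed = total feature value within each class; expected = class
    frequency times the feature's overall total.
    """
    n_samples, n_features = len(X), len(X[0])
    classes = sorted(set(y))
    observed = {c: [0.0] * n_features for c in classes}
    for row, label in zip(X, y):
        if any(v < 0 for v in row):
            raise ValueError("chi-square scoring needs non-negative features")
        for j, v in enumerate(row):
            observed[label][j] += v
    totals = [sum(observed[c][j] for c in classes) for j in range(n_features)]
    class_prob = {c: list(y).count(c) / n_samples for c in classes}
    scores = []
    for j in range(n_features):
        score = 0.0
        for c in classes:
            expected = class_prob[c] * totals[j]
            if expected > 0:  # an all-zero feature simply scores 0 here
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Keep the k highest-scoring features; return their indices and reduced rows."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


if __name__ == "__main__":
    # Toy data: the two classes carry their signal in different m/z bins.
    spectra = [
        [(100.2, 5.0), (101.5, 1.0)],
        [(100.4, 6.0), (101.5, 1.0), (102.3, 1.0)],
        [(101.5, 1.0), (102.6, 5.0)],
        [(100.9, 1.0), (101.5, 1.0), (102.1, 6.0)],
    ]
    labels = [0, 0, 1, 1]
    X = [bin_spectrum(s, 100.0, 103.0, 1.0) for s in spectra]
    keep, X_sel = select_k_best(X, labels, k=2)
    for j in keep:
        # prints: 0 (100.0, 101.0) then 2 (102.0, 103.0)
        print(j, bin_interval(j, 100.0, 1.0))
```

A contingency-table chi-square test on discretized intensities is another reasonable reading of "chi-square test". The record does not say which variant the authors used.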
Record ID: doaj.art-3c9cd269baf941f38eba631eb0c3f6f6
Institution: Directory of Open Access Journals
ISSN: 2045-2322
Affiliations: Xin Feng (School of Science, Jilin Institute of Chemical Technology); Zheyuan Dong, Yingrui Li, Qian Cheng, and Ruihao Xin (College of Information and Control Engineering, Jilin Institute of Chemical Technology); Yongxian Xin (College of Business and Economics, Australian National University); Qiaolin Lu (School of Artificial Intelligence, Jilin University)