Multiomics-Based Feature Extraction and Selection for the Prediction of Lung Cancer Survival

Lung cancer is a global health challenge, hindered by delayed diagnosis and the disease’s complex molecular landscape. Accurate patient survival prediction is critical, motivating the exploration of various -omics datasets using machine learning methods. Leveraging multi-omics data, this study seeks to enhance the accuracy of survival prediction by proposing new feature extraction techniques combined with unbiased feature selection. Two lung adenocarcinoma multi-omics datasets, originating from the TCGA and CPTAC-3 projects, were employed for this purpose, emphasizing gene expression, methylation, and mutations as the most relevant data sources that provide features for the survival prediction models. Additionally, gene set aggregation was shown to be the most effective feature extraction method for mutation and copy number variation data. Using the TCGA dataset, we identified 32 molecular features that allowed the construction of a 2-year survival prediction model with an AUC of 0.839. The selected features were additionally tested on an independent CPTAC-3 dataset, achieving an AUC of 0.815 in nested cross-validation, which confirmed the robustness of the identified features.

Bibliographic Details
Main Authors:	Roman Jaksik, Kamila Szumała, Khanh Ngoc Dinh, Jarosław Śmieja
Affiliations:	Roman Jaksik: Department of Systems Biology and Engineering, Silesian University of Technology, 44-100 Gliwice, Poland
	Kamila Szumała: Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, 44-100 Gliwice, Poland
	Khanh Ngoc Dinh: Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY 10027, USA
	Jarosław Śmieja: Department of Systems Biology and Engineering, Silesian University of Technology, 44-100 Gliwice, Poland
Format:	Article
Language:	English
Published:	MDPI AG 2024-03-01
Series:	International Journal of Molecular Sciences
Volume/Issue:	25(7), 3661
ISSN:	1661-6596, 1422-0067
DOI:	10.3390/ijms25073661
Subjects:	multiomics data, feature selection, feature extraction, machine learning, next-generation sequencing, lung cancer
Online Access:	https://www.mdpi.com/1422-0067/25/7/3661
Collection:	DOAJ (Directory of Open Access Journals)
Record ID:	doaj.art-e8bac40c774440739c5a5c3e998f6aa0


Similar Items