A two-step feature selection procedure for relevant markers of Squamous Cell Lung Carcinoma using different survival models

The number of candidate gene expression markers for Lung Squamous Cell Carcinoma is potentially unlimited, which yields high-dimensional data with a large number of features. Selecting the relevant markers for analysis is therefore of utmost importance. In our study, we have aimed to select a subset of prominent...

Bibliographic Details
Main Authors: Atanu Bhattacharjee, Samudranil Basak, Pragya Kumari
Format: Article
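Language: English
Published: Elsevier 2023-11-01
Series: Healthcare Analytics
Subjects: Lung Cancer; Feature selection; High-dimensional; Lasso Cox Model; Cox Proportional Hazard Model; Accelerated Failure Time Model
Online Access: http://www.sciencedirect.com/science/article/pii/S2772442523000357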
author Atanu Bhattacharjee
Samudranil Basak
Pragya Kumari
collection DOAJ
description The number of candidate gene expression markers for Lung Squamous Cell Carcinoma is potentially unlimited, which yields high-dimensional data with a large number of features. Selecting the relevant markers for analysis is therefore of utmost importance. In this study, we aim to select a subset of prominent and significant features from 31,918 gene expression features. The selected features are then analyzed with the Cox Proportional Hazards (Cox PH) model to determine how each marker affects a patient's survival estimates. We employ a two-step process to select the subset of markers. In the first step, markers are selected by an L1-regularized Cox PH model. In the second step, the selected markers are screened again by fitting a univariate Cox PH model to each biomarker and checking its p-value via Wald inference (p<0.05). Once the final selection is made, we estimate hazard ratios and confidence intervals using maximum likelihood estimation (MLE) and a Bayesian approach, with the Cox Proportional Hazards model (CPH) and, as an alternative, the Accelerated Failure Time model (AFT). A forest plot is also generated to give a graphical representation of the meta-analysis done in the study. With the proposed selection procedure, we find a suitable subset from the large number of available variables. The selected features have been analyzed, and their validity has been confirmed using survival models.
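The second screening step described in the abstract (a univariate Cox PH fit per marker, kept only if the Wald p-value is below 0.05, followed by a hazard ratio with a confidence interval) can be illustrated with a minimal sketch. This is not the authors' code: the function names `univariate_cox` and `wald_screen`, the Breslow handling of tied event times, the 95% interval level, and the toy data are all assumptions made for illustration. The first step (L1-regularized Cox PH) is not shown.

```python
import math


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox PH model by Newton-Raphson.

    Hypothetical helper; ties are handled Breslow-style (an assumption).
    Returns (beta, se, p) where p is the two-sided Wald p-value.
    """
    n = len(times)
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects contribute only to risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:  # subject j is still at risk
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean              # first derivative term
            info += s2 / s0 - mean * mean     # observed information term
        if info <= 0.0:
            raise ValueError("covariate carries no information")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, p


def wald_screen(times, events, markers, alpha=0.05):
    """Keep markers whose univariate Wald p-value is below alpha.

    Returns {name: (hazard_ratio, (ci_low, ci_high), p)} with a 95% CI.
    """
    kept = {}
    for name, x in markers.items():
        beta, se, p = univariate_cox(times, events, x)
        if p < alpha:
            ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
            kept[name] = (math.exp(beta), ci, p)
    return kept


# Toy data, invented for illustration only (not from the study).
toy_times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
toy_markers = {"marker_a": [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]}
print(wald_screen(toy_times, toy_events, toy_markers, alpha=0.05))
```

In practice an established survival-analysis library would normally replace the hand-written Newton loop; the sketch only makes the Wald screening rule and the hazard ratio interval explicit.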
first_indexed 2024-03-13T03:26:55Z
format Article
id doaj.art-0d9a3deb3ce7423aa15d25faac8cb3c4
institution Directory Open Access Journal
issn 2772-4425
language English
last_indexed 2024-03-13T03:26:55Z
publishDate 2023-11-01
publisher Elsevier
record_format Article
series Healthcare Analytics
spelling doaj.art-0d9a3deb3ce7423aa15d25faac8cb3c4; 2023-06-25T04:44:15Z; eng; Elsevier; Healthcare Analytics; 2772-4425; 2023-11-01; vol. 3, art. 100168
Atanu Bhattacharjee: Leicester Real World Evidence Unit, University of Leicester, Leicester, LE1 7RH, United Kingdom
Samudranil Basak: Department of Statistics, Pondicherry University, Kalapet, 605014, Pondicherry, India; corresponding author
Pragya Kumari: Department of Mathematics and Computing, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, 826004, Jharkhand, India
title A two-step feature selection procedure for relevant markers of Squamous Cell Lung Carcinoma using different survival models
topic Lung Cancer
Feature selection
High-dimensional
Lasso Cox Model
Cox Proportional Hazard Model
Accelerated Failure Time Model
url http://www.sciencedirect.com/science/article/pii/S2772442523000357