A distributed feature selection pipeline for survival analysis using radiomics in non-small cell lung cancer patients
Abstract Predictive modelling of cancer outcomes using radiomics faces dimensionality problems and data limitations, as radiomics features often number in the hundreds, and multi-institutional data sharing is often unfeasible. Federated learning (FL) and feature selection (FS) techniques combined...
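Main Authors: | Benedetta Gottardelli, Varsha Gouthamchand, Carlotta Masciocchi, Luca Boldrini, Antonella Martino, Ciro Mazzarella, Mariangela Massaccesi, René Monshouwer, Jeroen Findhammer, Leonard Wee, Andre Dekker, Maria Antonietta Gambacorta, Andrea Damiani |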
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-04-01 |
Series: | Scientific Reports |
Subjects: | Distributed learning, Feature selection, Radiomics, NSCLC |
Online Access: | https://doi.org/10.1038/s41598-024-58241-1 |
_version_ | 1797219968563019776 |
---|---|
author | Benedetta Gottardelli Varsha Gouthamchand Carlotta Masciocchi Luca Boldrini Antonella Martino Ciro Mazzarella Mariangela Massaccesi René Monshouwer Jeroen Findhammer Leonard Wee Andre Dekker Maria Antonietta Gambacorta Andrea Damiani |
author_sort | Benedetta Gottardelli |
collection | DOAJ |
description | Abstract Predictive modelling of cancer outcomes using radiomics faces dimensionality problems and data limitations, as radiomics features often number in the hundreds, and multi-institutional data sharing is often unfeasible. Federated learning (FL) and feature selection (FS) techniques combined can help overcome these issues, as one provides the means of training models without exchanging sensitive data, while the other identifies the most informative features, reduces overfitting, and improves model interpretability. Our proposed FS pipeline based on FL principles targets data-driven radiomics FS in a multivariate survival study of non-small cell lung cancer patients. The pipeline was run across datasets from three institutions without patient-level data exchange. It includes two FS techniques, Correlation-based Feature Selection and LASSO regularization, and Cox Proportional-Hazard regression with Overall Survival as endpoint. Trained and validated on 828 patients overall, our pipeline yielded a radiomic signature comprising "intensity-based energy" and "mean discretised intensity". Validation resulted in a mean Harrell C-index of 0.59, showcasing fair efficacy in risk stratification. In conclusion, we suggest a distributed radiomics approach that incorporates preliminary feature selection to systematically decrease the feature set based on data-driven considerations. This aims to address dimensionality challenges beyond those associated with data constraints and interpretability concerns. |
first_indexed | 2024-04-24T12:42:04Z |
format | Article |
id | doaj.art-f46a4dfb4f9e4a9782e2fb980168130c |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-04-24T12:42:04Z |
publishDate | 2024-04-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-f46a4dfb4f9e4a9782e2fb980168130c; 2024-04-07T11:13:38Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-04-01; 14; 1; 1; 12; 10.1038/s41598-024-58241-1; A distributed feature selection pipeline for survival analysis using radiomics in non-small cell lung cancer patients; Benedetta Gottardelli (0), Varsha Gouthamchand (1), Carlotta Masciocchi (2), Luca Boldrini (3), Antonella Martino (4), Ciro Mazzarella (5), Mariangela Massaccesi (6), René Monshouwer (7), Jeroen Findhammer (8), Leonard Wee (9), Andre Dekker (10), Maria Antonietta Gambacorta (11), Andrea Damiani (12); (0) Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Università Cattolica del Sacro Cuore; (1) Clinical Data Science, GROW School of Oncology and Reproduction, Maastricht University; (2) Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; (3) Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; (4) Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; (5) Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; (6) Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; (7) Department of Radiation Oncology, Radboud University Medical Center; (8) Department of Radiation Oncology, Radboud University Medical Center; (9) Department of Radiation Oncology (Maastro), GROW-School for Oncology and Reproduction, Maastricht University Medical Centre+; (10) Department of Radiation Oncology (Maastro), GROW-School for Oncology and Reproduction, Maastricht University Medical Centre+; (11) Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; (12) Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; https://doi.org/10.1038/s41598-024-58241-1; Distributed learning; Feature selection; Radiomics; NSCLC |
title | A distributed feature selection pipeline for survival analysis using radiomics in non-small cell lung cancer patients |
title_sort | distributed feature selection pipeline for survival analysis using radiomics in non small cell lung cancer patients |
topic | Distributed learning Feature selection Radiomics NSCLC |
url | https://doi.org/10.1038/s41598-024-58241-1 |