A distributed feature selection pipeline for survival analysis using radiomics in non-small cell lung cancer patients

Abstract Predictive modelling of cancer outcomes using radiomics faces dimensionality problems and data limitations, as radiomics features often number in the hundreds, and multi-institutional data sharing is often unfeasible. Federated learning (FL) and feature selection (FS) techniques combined...


Bibliographic Details
Main Authors: Benedetta Gottardelli, Varsha Gouthamchand, Carlotta Masciocchi, Luca Boldrini, Antonella Martino, Ciro Mazzarella, Mariangela Massaccesi, René Monshouwer, Jeroen Findhammer, Leonard Wee, Andre Dekker, Maria Antonietta Gambacorta, Andrea Damiani
Format: Article
Language: English
Published: Nature Portfolio 2024-04-01
Series: Scientific Reports
Subjects: Distributed learning, Feature selection, Radiomics, NSCLC
Online Access: https://doi.org/10.1038/s41598-024-58241-1
collection DOAJ
description Abstract Predictive modelling of cancer outcomes using radiomics faces dimensionality problems and data limitations, as radiomics features often number in the hundreds, and multi-institutional data sharing is often unfeasible. Federated learning (FL) and feature selection (FS) techniques combined can help overcome these issues, as one provides the means of training models without exchanging sensitive data, while the other identifies the most informative features, reduces overfitting, and improves model interpretability. Our proposed FS pipeline based on FL principles targets data-driven radiomics FS in a multivariate survival study of non-small cell lung cancer patients. The pipeline was run across datasets from three institutions without patient-level data exchange. It includes two FS techniques, Correlation-based Feature Selection and LASSO regularization, and Cox Proportional-Hazard regression with Overall Survival as endpoint. Trained and validated on 828 patients overall, our pipeline yielded a radiomic signature comprising "intensity-based energy" and "mean discretised intensity". Validation resulted in a mean Harrell C-index of 0.59, showcasing fair efficacy in risk stratification. In conclusion, we suggest a distributed radiomics approach that incorporates preliminary feature selection to systematically decrease the feature set based on data-driven considerations. This aims to address dimensionality challenges beyond those associated with data constraints and interpretability concerns.
first_indexed 2024-04-24T12:42:04Z
id doaj.art-f46a4dfb4f9e4a9782e2fb980168130c
institution Directory Open Access Journal
issn 2045-2322
last_indexed 2024-04-24T12:42:04Z
spelling doaj.art-f46a4dfb4f9e4a9782e2fb980168130c 2024-04-07T11:13:38Z eng
Nature Portfolio, Scientific Reports, ISSN 2045-2322, 2024-04-01, vol. 14, no. 1, pp. 1-12, doi 10.1038/s41598-024-58241-1
Author affiliations:
Benedetta Gottardelli: Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Università Cattolica del Sacro Cuore
Varsha Gouthamchand: Clinical Data Science, GROW School of Oncology and Reproduction, Maastricht University
Carlotta Masciocchi: Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario Agostino Gemelli IRCCS
Luca Boldrini: Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS
Antonella Martino: Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS
Ciro Mazzarella: Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS
Mariangela Massaccesi: Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS
René Monshouwer: Department of Radiation Oncology, Radboud University Medical Center
Jeroen Findhammer: Department of Radiation Oncology, Radboud University Medical Center
Leonard Wee: Department of Radiation Oncology (Maastro), GROW-School for Oncology and Reproduction, Maastricht University Medical Centre+
Andre Dekker: Department of Radiation Oncology (Maastro), GROW-School for Oncology and Reproduction, Maastricht University Medical Centre+
Maria Antonietta Gambacorta: Department of Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS
Andrea Damiani: Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario Agostino Gemelli IRCCS
topic Distributed learning
Feature selection
Radiomics
NSCLC
url https://doi.org/10.1038/s41598-024-58241-1
work_keys_str_mv AT benedettagottardelli adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT varshagouthamchand adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT carlottamasciocchi adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT lucaboldrini adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT antonellamartino adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT ciromazzarella adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT mariangelamassaccesi adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT renemonshouwer adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT jeroenfindhammer adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT leonardwee adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT andredekker adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT mariaantoniettagambacorta adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT andreadamiani adistributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT benedettagottardelli distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT varshagouthamchand distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT carlottamasciocchi distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT lucaboldrini distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT antonellamartino distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT ciromazzarella distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT mariangelamassaccesi distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT renemonshouwer distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT jeroenfindhammer distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT leonardwee distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT andredekker distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT mariaantoniettagambacorta distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
AT andreadamiani distributedfeatureselectionpipelineforsurvivalanalysisusingradiomicsinnonsmallcelllungcancerpatients
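The abstract reports validation performance as a mean Harrell C-index of 0.59. As a reading aid, the sketch below shows how such a concordance index can be computed for right-censored survival data. It is not the authors' implementation: the function name `harrell_c_index`, its arguments, and the toy data are assumptions made for illustration, and pairs with tied follow-up times are simply skipped, which is a simplification of the usual tie handling.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell concordance index for right-censored data.

    times: observed follow-up time per patient
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed event.
        # Pairs with tied times are skipped (simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up toy data, not taken from the study.
toy_times = [5, 8, 12, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.2, 0.4, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_scores))
```

A value of 0.5 corresponds to random ordering of patients and 1.0 to perfect ordering, which is why a value of 0.59 is described in the abstract as fair rather than strong risk stratification.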