Feature Selection is Critical for 2-Year Prognosis in Advanced Stage High Grade Serous Ovarian Cancer by Using Machine Learning

Introduction Accurate prediction of patient prognosis can be especially useful for the selection of best treatment protocols. Machine Learning can serve this purpose by making predictions based upon generalizable clinical patterns embedded within learning datasets. We designed a study to support the...

Bibliographic Details
Main Authors: Alexandros Laios MD, PhD, Angeliki Katsenou PhD, Yong Sheng Tan MBBS, Racheal Johnson MBChB, MRCOG, Mohamed Otify MBBCh, MRCOG, MSc, Angelika Kaufmann MBBS, MRCOG, PhD, Sarika Munot MBBS, Amudha Thangavelu MBChB, MRCOG, MD, Richard Hutson MBBS, MRCOG, MD, Tim Broadhead MBChB, MRCOG, Georgios Theophilou MBBS, MRCOG, MD, David Nugent MBChB, MRCOG, PhD, Diederick De Jong MBBCh, PhD, MSc
Format: Article
Language: English
Published: SAGE Publishing 2021-10-01
Series: Cancer Control
Online Access: https://doi.org/10.1177/10732748211044678
collection DOAJ
description Introduction: Accurate prediction of patient prognosis can be especially useful for selecting the best treatment protocols. Machine Learning can serve this purpose by making predictions based upon generalizable clinical patterns embedded within learning datasets. We designed a study to support feature selection for the 2-year prognostic period and compared the performance of several Machine Learning prediction algorithms for accurate 2-year prognosis estimation in patients with advanced-stage high grade serous ovarian cancer (HGSOC). Methods: The prognosis estimation was formulated as a binary classification problem. The dataset was split into training and test cohorts with repeated random sampling until there was no significant difference (p = 0.20) between the two cohorts. Ten-fold cross-validation was applied. Various state-of-the-art supervised classifiers were used. For feature selection, in addition to an exhaustive search for the best combination of features, we used the chi-square test of independence and the minimum redundancy maximum relevance (MRMR) method. Results: Two hundred nine patients were identified. The model's mean prediction accuracy reached 73%. We demonstrated that Support-Vector-Machine and Ensemble Subspace Discriminant algorithms outperformed Logistic Regression in accuracy indices. The probability of achieving a cancer-free state was maximised with a combination of primary cytoreduction, good performance status and maximal surgical effort (AUC 0.63). Standard chemotherapy, performance status, tumour load and residual disease were consistently predictive of mid-term overall survival (AUC 0.63–0.66). The model's recall and precision were greater than 80%. Conclusion: Machine Learning appears to be promising for accurate prognosis estimation. Appropriate feature selection is required when building an HGSOC model for 2-year prognosis prediction. We provide evidence as to which combination of prognosticators has the largest impact on HGSOC 2-year prognosis.
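The chi-square test of independence named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the feature names and the eight-patient toy cohort are all hypothetical, and the sketch only ranks categorical features by their Pearson chi-square statistic against a binary 2-year outcome. It does not compute p-values, and it does not implement the MRMR method or the exhaustive search.

```python
from collections import Counter


def chi_square_statistic(feature, outcome):
    """Pearson chi-square statistic for the contingency table of two categorical variables."""
    n = len(feature)
    observed = Counter(zip(feature, outcome))  # joint counts per (feature, outcome) cell
    row_totals = Counter(feature)              # marginal counts of the feature
    col_totals = Counter(outcome)              # marginal counts of the outcome
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n   # expected count under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def rank_features(features, outcome):
    """Return feature names sorted from strongest to weakest association, plus the scores."""
    scores = {name: chi_square_statistic(values, outcome)
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy cohort (NOT data from the study): 1 = favourable 2-year outcome.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "complete_cytoreduction": [1, 1, 1, 0, 0, 0, 0, 1],
    "good_performance_status": [1, 1, 1, 1, 0, 0, 0, 0],
    "uninformative_flag": [1, 0, 1, 0, 1, 0, 1, 0],
}

ranking, scores = rank_features(features, outcome)
for name in ranking:
    print(f"{name}: chi-square = {scores[name]:.2f}")
```

A larger statistic indicates a stronger departure from independence between the feature and the outcome; in a filter-style selection, the top-ranked features would then be passed to the classifiers.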
first_indexed 2024-12-18T01:34:26Z
format Article
id doaj.art-ccc639714bb1408c8eb74fdc4be154b5
institution Directory Open Access Journal
issn 1073-2748
language English
last_indexed 2024-12-18T01:34:26Z
publishDate 2021-10-01
publisher SAGE Publishing
record_format Article
series Cancer Control
title Feature Selection is Critical for 2-Year Prognosis in Advanced Stage High Grade Serous Ovarian Cancer by Using Machine Learning
url https://doi.org/10.1177/10732748211044678