Multiclass risk models for ovarian malignancy: an illustration of prediction uncertainty due to the choice of algorithm

Abstract Background Assessing malignancy risk is important to choose appropriate management of ovarian tumors. We compared six algorithms to estimate the probabilities that an ovarian tumor is benign, borderline malignant, stage I primary invasive, stage II-IV primary invasive, or secondary metastat...

Bibliographic Details
Main Authors: Ashleigh Ledger, Jolien Ceusters, Lil Valentin, Antonia Testa, Caroline Van Holsbeke, Dorella Franchi, Tom Bourne, Wouter Froyman, Dirk Timmerman, Ben Van Calster
Format: Article
Language: English
Published: BMC 2023-11-01
Series: BMC Medical Research Methodology
Subjects: Ovarian Neoplasms; Prediction models; Machine learning; Calibration; Multiclass models
Online Access: https://doi.org/10.1186/s12874-023-02103-3
author Ashleigh Ledger
Jolien Ceusters
Lil Valentin
Antonia Testa
Caroline Van Holsbeke
Dorella Franchi
Tom Bourne
Wouter Froyman
Dirk Timmerman
Ben Van Calster
collection DOAJ
description Abstract
Background: Assessing malignancy risk is important to choose appropriate management of ovarian tumors. We compared six algorithms to estimate the probabilities that an ovarian tumor is benign, borderline malignant, stage I primary invasive, stage II-IV primary invasive, or secondary metastatic.
Methods: This retrospective cohort study used 5909 patients recruited from 1999 to 2012 for model development, and 3199 patients recruited from 2012 to 2015 for model validation. Patients were recruited at oncology referral or general centers and underwent an ultrasound examination and surgery ≤ 120 days later. We developed models using standard multinomial logistic regression (MLR), Ridge MLR, random forest (RF), XGBoost, neural networks (NN), and support vector machines (SVM). We used nine clinical and ultrasound predictors but developed models with or without CA125.
Results: Most tumors were benign (3980 in development and 1688 in validation data); secondary metastatic tumors were least common (246 and 172). The c-statistic (AUROC) to discriminate benign from any type of malignant tumor ranged from 0.89 to 0.92 for models with CA125, and from 0.89 to 0.91 for models without. The multiclass c-statistic ranged from 0.41 (SVM) to 0.55 (XGBoost) for models with CA125, and from 0.42 (SVM) to 0.51 (standard MLR) for models without. Multiclass calibration was best for RF and XGBoost. Estimated probabilities for a benign tumor in the same patient often differed by more than 0.2 (20 percentage points) depending on the model. Net Benefit for diagnosing malignancy was similar for algorithms at the commonly used 10% risk threshold, but was slightly higher for RF at higher thresholds. Comparing models, between 3% (XGBoost vs. NN, with CA125) and 30% (NN vs. SVM, without CA125) of patients fell on opposite sides of the 10% threshold.
Conclusion: Although several models had similarly good performance, individual probability estimates varied substantially.
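The Net Benefit and threshold-disagreement results in the abstract can be illustrated with a minimal sketch. It assumes the standard decision-curve definition of Net Benefit at risk threshold t, namely TP/n - (FP/n) * t/(1 - t), and treats the malignancy risk as one number per patient (for a multiclass model this would be 1 minus the estimated probability of a benign tumor). The function names and the toy risks and outcomes are hypothetical; they are not taken from the study, and the abstract does not describe how the authors implemented these calculations.

```python
from typing import Sequence


def net_benefit(risks: Sequence[float], malignant: Sequence[bool],
                threshold: float) -> float:
    """Net Benefit of calling a tumor malignant when risk >= threshold.

    Assumed definition: TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, malignant) if r >= threshold and y)
    fp = sum(1 for r, y in zip(risks, malignant) if r >= threshold and not y)
    return tp / n - fp / n * threshold / (1 - threshold)


def discordance(risks_a: Sequence[float], risks_b: Sequence[float],
                threshold: float) -> float:
    """Share of patients whose two risk estimates fall on opposite
    sides of the threshold."""
    flips = sum(
        1 for a, b in zip(risks_a, risks_b)
        if (a >= threshold) != (b >= threshold)
    )
    return flips / len(risks_a)


# Hypothetical malignancy risks for five patients from two models.
model_a = [0.05, 0.08, 0.30, 0.70, 0.12]
model_b = [0.04, 0.15, 0.25, 0.90, 0.06]
malignant = [False, False, True, True, False]

nb_a = net_benefit(model_a, malignant, threshold=0.10)
nb_b = net_benefit(model_b, malignant, threshold=0.10)
disc = discordance(model_a, model_b, threshold=0.10)
max_gap = max(abs(a - b) for a, b in zip(model_a, model_b))

print(f"Net Benefit A: {nb_a:.3f}, B: {nb_b:.3f}")
print(f"Discordant at 10% threshold: {disc:.0%}")
print(f"Largest per-patient difference: {max_gap:.2f}")
```

In this toy data both models have the same Net Benefit at the 10% threshold, yet two of the five patients are classified differently by the two models. This mirrors the abstract's conclusion that similar aggregate performance can coexist with substantially different individual estimates.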
first_indexed 2024-03-09T15:05:30Z
format Article
id doaj.art-ffee93220bd743009060ac253efc8432
institution Directory Open Access Journal
issn 1471-2288
language English
last_indexed 2024-03-09T15:05:30Z
publishDate 2023-11-01
publisher BMC
record_format Article
series BMC Medical Research Methodology
spelling doaj.art-ffee93220bd743009060ac253efc8432; 2023-11-26T13:42:58Z; eng; BMC; BMC Medical Research Methodology; ISSN 1471-2288; 2023-11-01; vol. 23, no. 1, pp. 1-14; doi 10.1186/s12874-023-02103-3
Author affiliations:
Ashleigh Ledger: Department of Development and Regeneration, KU Leuven
Jolien Ceusters: Department of Development and Regeneration, KU Leuven
Lil Valentin: Department of Obstetrics and Gynecology, Skåne University Hospital
Antonia Testa: Department of Woman, Child and Public Health, Fondazione Policlinico Universitario A. Gemelli IRCCS
Caroline Van Holsbeke: Department of Obstetrics and Gynecology, Ziekenhuis Oost-Limburg
Dorella Franchi: Preventive Gynecology Unit, Division of Gynecology, European Institute of Oncology IRCCS
Tom Bourne: Department of Development and Regeneration, KU Leuven
Wouter Froyman: Department of Development and Regeneration, KU Leuven
Dirk Timmerman: Department of Development and Regeneration, KU Leuven
Ben Van Calster: Department of Development and Regeneration, KU Leuven
title Multiclass risk models for ovarian malignancy: an illustration of prediction uncertainty due to the choice of algorithm
topic Ovarian Neoplasms
Prediction models
Machine learning
Calibration
Multiclass models
url https://doi.org/10.1186/s12874-023-02103-3