Multiclass risk models for ovarian malignancy: an illustration of prediction uncertainty due to the choice of algorithm

Abstract Background Assessing malignancy risk is important to choose appropriate management of ovarian tumors. We compared six algorithms to estimate the probabilities that an ovarian tumor is benign, borderline malignant, stage I primary invasive, stage II-IV primary invasive, or secondary metastat...


Bibliographic Details
Main Authors:	Ashleigh Ledger, Jolien Ceusters, Lil Valentin, Antonia Testa, Caroline Van Holsbeke, Dorella Franchi, Tom Bourne, Wouter Froyman, Dirk Timmerman, Ben Van Calster
Format:	Article
Language:	English
Published:	BMC 2023-11-01
Series:	BMC Medical Research Methodology
Subjects:	Ovarian Neoplasms; Prediction models; Machine learning; Calibration; Multiclass models
Online Access:	https://doi.org/10.1186/s12874-023-02103-3

author_sort	Ashleigh Ledger
collection	DOAJ
description	Abstract. Background: Assessing malignancy risk is important for choosing appropriate management of ovarian tumors. We compared six algorithms to estimate the probabilities that an ovarian tumor is benign, borderline malignant, stage I primary invasive, stage II-IV primary invasive, or secondary metastatic. Methods: This retrospective cohort study used 5909 patients recruited from 1999 to 2012 for model development, and 3199 patients recruited from 2012 to 2015 for model validation. Patients were recruited at oncology referral or general centers and underwent an ultrasound examination and surgery ≤ 120 days later. We developed models using standard multinomial logistic regression (MLR), Ridge MLR, random forest (RF), XGBoost, neural networks (NN), and support vector machines (SVM). We used nine clinical and ultrasound predictors but developed models with and without CA125. Results: Most tumors were benign (3980 in development and 1688 in validation data); secondary metastatic tumors were least common (246 and 172). The c-statistic (AUROC) to discriminate benign from any type of malignant tumor ranged from 0.89 to 0.92 for models with CA125, and from 0.89 to 0.91 for models without. The multiclass c-statistic ranged from 0.41 (SVM) to 0.55 (XGBoost) for models with CA125, and from 0.42 (SVM) to 0.51 (standard MLR) for models without. Multiclass calibration was best for RF and XGBoost. Estimated probabilities for a benign tumor in the same patient often differed by more than 0.2 (20 percentage points) depending on the model. Net Benefit for diagnosing malignancy was similar across algorithms at the commonly used 10% risk threshold, but was slightly higher for RF at higher thresholds. Comparing models, between 3% (XGBoost vs. NN, with CA125) and 30% (NN vs. SVM, without CA125) of patients fell on opposite sides of the 10% threshold. Conclusion: Although several models had similarly good performance, individual probability estimates varied substantially.
first_indexed	2024-03-09T15:05:30Z
format	Article
id	doaj.art-ffee93220bd743009060ac253efc8432
institution	Directory Open Access Journal
issn	1471-2288
language	English
last_indexed	2024-03-09T15:05:30Z
publishDate	2023-11-01
publisher	BMC
record_format	Article
series	BMC Medical Research Methodology
spelling	doaj.art-ffee93220bd743009060ac253efc8432; 2023-11-26T13:42:58Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2023-11-01; 231114; 10.1186/s12874-023-02103-3; Multiclass risk models for ovarian malignancy: an illustration of prediction uncertainty due to the choice of algorithm; Ashleigh Ledger (Department of Development and Regeneration, KU Leuven); Jolien Ceusters (Department of Development and Regeneration, KU Leuven); Lil Valentin (Department of Obstetrics and Gynecology, Skåne University Hospital); Antonia Testa (Department of Woman, Child and Public Health, Fondazione Policlinico Universitario A. Gemelli IRCCS); Caroline Van Holsbeke (Department of Obstetrics and Gynecology, Ziekenhuis Oost-Limburg); Dorella Franchi (Preventive Gynecology Unit, Division of Gynecology, European Institute of Oncology IRCCS); Tom Bourne (Department of Development and Regeneration, KU Leuven); Wouter Froyman (Department of Development and Regeneration, KU Leuven); Dirk Timmerman (Department of Development and Regeneration, KU Leuven); Ben Van Calster (Department of Development and Regeneration, KU Leuven); https://doi.org/10.1186/s12874-023-02103-3; Ovarian Neoplasms; Prediction models; Machine learning; Calibration; Multiclass models
title	Multiclass risk models for ovarian malignancy: an illustration of prediction uncertainty due to the choice of algorithm
topic	Ovarian Neoplasms; Prediction models; Machine learning; Calibration; Multiclass models
url	https://doi.org/10.1186/s12874-023-02103-3

