Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database

Bibliographic Details
Main Authors: Mi Du, Dandara G. Haag, John W. Lynch, Murthy N. Mittinty
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Cancers
Subjects: mouth neoplasms, forecasting, survivability, oropharyngeal, head and neck
Online Access: https://www.mdpi.com/2072-6694/12/10/2802
_version_ 1797552257909129216
author Mi Du
Dandara G. Haag
John W. Lynch
Murthy N. Mittinty
author_facet Mi Du
Dandara G. Haag
John W. Lynch
Murthy N. Mittinty
author_sort Mi Du
collection DOAJ
description This study aims to demonstrate the use of tree-based machine learning algorithms to predict the 3- and 5-year disease-specific survival of oral and pharyngeal cancers (OPCs) and to compare their performance with that of traditional Cox regression. Data on a total of 21,154 individuals diagnosed with OPCs between 2004 and 2009 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Three tree-based machine learning algorithms (survival tree (ST), random forest (RF) and conditional inference forest (CF)), together with a reference technique (Cox proportional hazards models (Cox)), were used to develop the survival prediction models. To handle missing values in the predictors, we applied the substantive model compatible version of the fully conditional specification imputation approach to the Cox model, whereas we used RF to impute missing data for the ST, RF and CF models. For internal validation, we used 10-fold cross-validation with 50 iterations in the model development datasets. Following this, model performance was evaluated using the C-index, integrated Brier score (IBS) and calibration curves in the test datasets. For predicting the 3-year survival of OPCs with the complete cases, the C-index values in the development sets were 0.77 (0.77, 0.77), 0.70 (0.70, 0.70), 0.83 (0.83, 0.84) and 0.83 (0.83, 0.86) for Cox, ST, RF and CF, respectively. Similar results were observed in the 5-year survival prediction models, with C-index values for Cox, ST, RF and CF of 0.76 (0.76, 0.76), 0.69 (0.69, 0.70), 0.83 (0.83, 0.83) and 0.85 (0.84, 0.86), respectively, in the development datasets. The prediction error curves based on IBS showed a similar pattern for these models. The predictive performance remained unchanged in the analyses with imputed data. Additionally, a free web-based calculator was developed for potential clinical use.
In conclusion, compared to Cox regression, ST had lower predictive accuracy and RF and CF had higher predictive accuracy for 3- and 5-year OPC survival using SEER data. The RF and CF algorithms provide non-parametric alternatives to Cox regression that may be of clinical use for estimating the survival probability of OPC patients.
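The description above reports model performance as a C-index. As a rough illustration of what that number measures, here is a minimal pure-Python sketch of Harrell's concordance index. It is not the authors' code. The function name `harrell_c_index`, the toy data, and the simplified handling of tied survival times (such pairs are skipped) are assumptions made for this example.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Simplified Harrell's C-index (hypothetical helper, not from the paper).

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under follow-up at a strictly later time. The pair is
    concordant when the model gave subject i the higher risk score. Tied risk
    scores count as half; pairs with tied times are skipped (an assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (invented for illustration): follow-up time, event flag
# (1 = disease-specific death observed, 0 = censored), predicted risk.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.7, 0.8, 0.3, 0.1]

# 7 of the 8 comparable pairs are ordered correctly, so this prints 0.875.
print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.69 to 0.85 should be read.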
first_indexed 2024-03-10T15:57:28Z
format Article
id doaj.art-015f7aea4a934c3ebbbc4d46c11191fe
institution Directory of Open Access Journals
issn 2072-6694
language English
last_indexed 2024-03-10T15:57:28Z
publishDate 2020-09-01
publisher MDPI AG
record_format Article
series Cancers
spelling doaj.art-015f7aea4a934c3ebbbc4d46c11191fe
2023-11-20T15:32:02Z
eng
MDPI AG
Cancers
2072-6694
2020-09-01
12
10
2802
10.3390/cancers12102802
Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database
Mi Du (School of Public Health, The University of Adelaide, 5005 Adelaide, Australia)
Dandara G. Haag (School of Public Health, The University of Adelaide, 5005 Adelaide, Australia)
John W. Lynch (School of Public Health, The University of Adelaide, 5005 Adelaide, Australia)
Murthy N. Mittinty (School of Public Health, The University of Adelaide, 5005 Adelaide, Australia)
https://www.mdpi.com/2072-6694/12/10/2802
mouth neoplasms
forecasting
survivability
oropharyngeal
head and neck
spellingShingle Mi Du
Dandara G. Haag
John W. Lynch
Murthy N. Mittinty
Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database
Cancers
mouth neoplasms
forecasting
survivability
oropharyngeal
head and neck
title Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database
title_sort comparison of the tree based machine learning algorithms to cox regression in predicting the survival of oral and pharyngeal cancers analyses based on seer database
topic mouth neoplasms
forecasting
survivability
oropharyngeal
head and neck
url https://www.mdpi.com/2072-6694/12/10/2802