An AI-driven Predictive Model for Pancreatic Cancer Patients Using Extreme Gradient Boosting

Abstract Pancreatic cancer is one of the deadliest carcinogenic diseases affecting people all over the world. The majority of patients are usually detected at Stage III or Stage IV, and the chances of survival are very low once detected at the late stages. This study focuses on building an efficient...

Bibliographic Details
Main Authors: Aditya Chakraborty, Chris P. Tsokos
Format: Article
Language: English
Published: Springer 2023-09-01
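Series: Journal of Statistical Theory and Applications (JSTA)
Subjects: Pancreatic Cancer; Extreme Gradient Boosting; Boosted Regression Trees; Pancreatic Risk Factors; Grid Search Mechanism
Online Access: https://doi.org/10.1007/s44199-023-00063-7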
collection DOAJ
description Abstract Pancreatic cancer is one of the deadliest cancers worldwide. Most patients are diagnosed at Stage III or Stage IV, and the chances of survival are very low once the disease is detected at these late stages. This study builds a data-driven predictive model from the associated risk factors and identifies the factors that contribute most to the survival times of patients diagnosed with pancreatic cancer, using the XGBoost (eXtreme Gradient Boosting) algorithm. A grid search was used to find the optimum values of the model's hyperparameters by minimizing the root mean square error (RMSE); the hyperparameters of the final model were selected by comparison with 243 competing models. To check the validity of the model, we compared its performance with ten deep neural network models, grown sequentially with different activation functions and optimizers. We also constructed an ensemble model using a Gradient Boosting Machine (GBM). The proposed XGBoost model outperformed all competing models we considered with regard to RMSE. After developing the model, the individual risk factors were ranked according to their contribution to the response predictions, which can help pancreatic cancer research organizations direct their resources toward the risk factors that most influence this type of cancer. The three most influential risk factors affecting the survival of pancreatic cancer patients were found to be the age of the patient, current BMI, and cigarette smoking years, with contributions of 35.5%, 24.3%, and 14.93%, respectively. The predictive model is approximately 96.42% accurate in predicting the survival times of patients diagnosed with pancreatic cancer and performs excellently on test data. The analytical methodology used to develop the model can also be applied to predict the time to death for a specific type of cancer, given a set of numeric and non-numeric features.
first_indexed 2024-03-09T05:24:09Z
id doaj.art-41e7c4c7bf57462f92667c2e2e3bc93c
institution Directory Open Access Journal
issn 2214-1766
spelling doaj.art-41e7c4c7bf57462f92667c2e2e3bc93c; 2023-12-03T12:38:25Z; eng; Springer; Journal of Statistical Theory and Applications (JSTA); 2214-1766; 2023-09-01; vol. 22, no. 4, pp. 262-282; 10.1007/s44199-023-00063-7; Aditya Chakraborty (Eastern Virginia Medical School); Chris P. Tsokos (University of South Florida)
topic Pancreatic Cancer
Extreme Gradient Boosting
Boosted Regression Trees
Pancreatic Risk Factors
Grid Search Mechanism