Evaluation of machine learning algorithms for the prognosis of breast cancer from the Surveillance, Epidemiology, and End Results database

<h4>Introduction</h4> Many researchers used machine learning (ML) to predict the prognosis of breast cancer (BC) patients and noticed that the ML model had good individualized prediction performance. <h4>Objective</h4> The cohort study was intended to establish a reliable dat...


Bibliographic Details
Main Authors: Ruiyang Wu, Jing Luo, Hangyu Wan, Haiyan Zhang, Yewei Yuan, Huihua Hu, Jinyan Feng, Jing Wen, Yan Wang, Junyan Li, Qi Liang, Fengjiao Gan, Gang Zhang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2023-01-01
Series: PLoS ONE
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9879508/?tool=EBI
_version_ 1811176111239856128
author Ruiyang Wu
Jing Luo
Hangyu Wan
Haiyan Zhang
Yewei Yuan
Huihua Hu
Jinyan Feng
Jing Wen
Yan Wang
Junyan Li
Qi Liang
Fengjiao Gan
Gang Zhang
author_sort Ruiyang Wu
collection DOAJ
description <h4>Introduction</h4> Many researchers have used machine learning (ML) to predict the prognosis of breast cancer (BC) patients and found that ML models had good individualized prediction performance. <h4>Objective</h4> This cohort study aimed to establish a reliable data analysis model by comparing the performance of 10 common ML algorithms with the traditional American Joint Committee on Cancer (AJCC) stage, and to use this model in a web application that provides individualized predictions for others. <h4>Methods</h4> This study included 63145 BC patients from the Surveillance, Epidemiology, and End Results database. <h4>Results</h4> Comparing the performance of the 10 ML algorithms and the 7th AJCC stage in the optimal test set, we found that, for 5-year overall survival, multivariate adaptive regression splines (MARS) had the highest area under the curve (AUC) value (0.831) and F1-score (0.608), and both sensitivity (0.737) and specificity (0.772) were relatively high. In addition, the AUC of MARS (0.831, 95% confidence interval: 0.820–0.842) was higher than that of the other ML algorithms and the 7th AJCC stage (all P < 0.05). MARS, the best-performing model, was selected for web application development (https://w12251393.shinyapps.io/app2/). <h4>Conclusions</h4> This comparative study of multiple prediction models on a large dataset found that the MARS-based model achieved much better performance than the other ML algorithms and the 7th AJCC stage in individualized survival estimation for BC patients, which is very likely to be the next step towards precision medicine.
first_indexed 2024-04-10T19:46:48Z
format Article
id doaj.art-dca171f0f6e4406bb5e3a6f5d851c45a
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-04-10T19:46:48Z
publishDate 2023-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-dca171f0f6e4406bb5e3a6f5d851c45a 2023-01-29T05:30:57Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2023-01-01 18 1
title Evaluation of machine learning algorithms for the prognosis of breast cancer from the Surveillance, Epidemiology, and End Results database
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9879508/?tool=EBI