Evaluation of machine learning algorithms for the prognosis of breast cancer from the Surveillance, Epidemiology, and End Results database
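Main Authors: | Ruiyang Wu, Jing Luo, Hangyu Wan, Haiyan Zhang, Yewei Yuan, Huihua Hu, Jinyan Feng, Jing Wen, Yan Wang, Junyan Li, Qi Liang, Fengjiao Gan, Gang Zhang |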
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2023-01-01 |
Series: | PLoS ONE |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9879508/?tool=EBI |
_version_ | 1811176111239856128 |
---|---|
author | Ruiyang Wu, Jing Luo, Hangyu Wan, Haiyan Zhang, Yewei Yuan, Huihua Hu, Jinyan Feng, Jing Wen, Yan Wang, Junyan Li, Qi Liang, Fengjiao Gan, Gang Zhang |
author_sort | Ruiyang Wu |
collection | DOAJ |
description | <h4>Introduction</h4> Many researchers have used machine learning (ML) to predict the prognosis of breast cancer (BC) patients and have reported that ML models show good individualized prediction performance. <h4>Objective</h4> This cohort study aimed to establish a reliable data analysis model by comparing the performance of 10 common ML algorithms with the traditional American Joint Committee on Cancer (AJCC) stage, and to use this model in web application development to provide good individualized predictions for others. <h4>Methods</h4> This study included 63,145 BC patients from the Surveillance, Epidemiology, and End Results database. <h4>Results</h4> Comparing the performance of the 10 ML algorithms and the 7th AJCC stage in the optimal test set, we found that, in terms of 5-year overall survival, multivariate adaptive regression splines (MARS) had the highest area under the curve (AUC) value (0.831) and F1-score (0.608), and both sensitivity (0.737) and specificity (0.772) were relatively high. In addition, MARS showed the highest AUC value (0.831, 95% confidence interval: 0.820–0.842) in comparison with the other ML algorithms and the 7th AJCC stage (all P < 0.05). MARS, the best-performing model, was selected for web application development (https://w12251393.shinyapps.io/app2/). <h4>Conclusions</h4> This comparative study of multiple forecasting models using a large dataset found that the MARS-based model performed significantly better than the other ML algorithms and the 7th AJCC stage in individualized estimation of survival of BC patients, which is very likely to be the next step towards precision medicine. |
first_indexed | 2024-04-10T19:46:48Z |
format | Article |
id | doaj.art-dca171f0f6e4406bb5e3a6f5d851c45a |
institution | Directory of Open Access Journals |
issn | 1932-6203 |
language | English |
last_indexed | 2024-04-10T19:46:48Z |
publishDate | 2023-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
title | Evaluation of machine learning algorithms for the prognosis of breast cancer from the Surveillance, Epidemiology, and End Results database |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9879508/?tool=EBI |