Machine learning algorithms for identifying predictive variables of mortality risk following dementia diagnosis: a longitudinal cohort study

Abstract: Machine learning (ML) could have advantages over traditional statistical models in identifying risk factors. Using ML algorithms, our objective was to identify the most important variables associated with mortality after dementia diagnosis in the Swedish Registry for Cognitive/Dementia Diso...

Bibliographic Details
Main Authors: Shayan Mostafaei, Minh Tuan Hoang, Pol Grau Jurado, Hong Xu, Lluis Zacarias-Pons, Maria Eriksdotter, Saikat Chatterjee, Sara Garcia-Ptacek
Format: Article
Language: English
Published: Nature Portfolio 2023-06-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-36362-3
_version_ 1797806732762677248
author Shayan Mostafaei
Minh Tuan Hoang
Pol Grau Jurado
Hong Xu
Lluis Zacarias-Pons
Maria Eriksdotter
Saikat Chatterjee
Sara Garcia-Ptacek
collection DOAJ
description Abstract: Machine learning (ML) could have advantages over traditional statistical models in identifying risk factors. Using ML algorithms, our objective was to identify the most important variables associated with mortality after dementia diagnosis in the Swedish Registry for Cognitive/Dementia Disorders (SveDem). From SveDem, a longitudinal cohort of 28,023 dementia-diagnosed patients was selected for this study. Sixty variables were considered as potential predictors of mortality risk, such as age at dementia diagnosis, dementia type, sex, body mass index (BMI), mini-mental state examination (MMSE) score, time from referral to initiation of work-up, time from initiation of work-up to diagnosis, dementia medications, comorbidities, and some specific medications for chronic comorbidities (e.g., cardiovascular disease). We applied sparsity-inducing penalties to three ML algorithms and identified twenty important variables for the binary classification task in mortality risk prediction and fifteen variables to predict time to death. The area under the ROC curve (AUROC) was used to evaluate the classification algorithms. Then, an unsupervised clustering algorithm was applied to the twenty selected variables to find two main clusters, which accurately matched the surviving and dead patient groups. A support vector machine with an appropriate sparsity penalty classified mortality risk with accuracy = 0.7077, AUROC = 0.7375, sensitivity = 0.6436, and specificity = 0.740. Across the three ML algorithms, the majority of the twenty identified variables were compatible with the literature and with our previous studies on SveDem. We also found new variables that were not previously reported in the literature as associated with mortality in dementia. Performance of basic dementia diagnostic work-up, time from referral to initiation of work-up, and time from initiation of work-up to diagnosis were the elements of the diagnostic process identified by the ML algorithms. The median follow-up time was 1053 (IQR = 516–1771) days in surviving and 1125 (IQR = 605–1770) days in dead patients. For prediction of time to death, the CoxBoost model identified 15 variables and ranked them in order of importance. The most important of these were age at diagnosis, MMSE score, sex, BMI, and Charlson Comorbidity Index, with selection scores of 23%, 15%, 14%, 12% and 10%, respectively. This study demonstrates the potential of sparsity-inducing ML algorithms in improving our understanding of mortality risk factors in dementia patients and their application in clinical settings. Moreover, ML methods can be used as a complement to traditional statistical methods.
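The abstract reports accuracy, AUROC, sensitivity, and specificity for the binary mortality classifier. As a minimal sketch of how these four measures relate to a classifier's output, the following Python example computes them on a small invented data set. The labels, scores, 0.5 threshold, and function names are all assumptions made for illustration; none of them come from the study or from SveDem.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels (1 = died, 0 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, scores, threshold=0.5):
    """Threshold the scores, then compute accuracy, sensitivity, specificity."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of deaths correctly flagged
        "specificity": tn / (tn + fp),  # share of survivors correctly cleared
    }


def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy data: true outcomes and hypothetical classifier risk scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]

metrics = classification_metrics(y_true, scores)
auc = auroc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason a study may report all four.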
first_indexed 2024-03-13T06:11:46Z
format Article
id doaj.art-b6b2b845dca44c13b634a97b249a9d42
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-03-13T06:11:46Z
publishDate 2023-06-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-b6b2b845dca44c13b634a97b249a9d42 2023-06-11T11:13:04Z eng Nature Portfolio Scientific Reports 2045-2322 2023-06-01 13 1 1 17 10.1038/s41598-023-36362-3
Machine learning algorithms for identifying predictive variables of mortality risk following dementia diagnosis: a longitudinal cohort study
Shayan Mostafaei (Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute)
Minh Tuan Hoang (Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute)
Pol Grau Jurado (Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute)
Hong Xu (Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute)
Lluis Zacarias-Pons (Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute)
Maria Eriksdotter (Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute)
Saikat Chatterjee (Division of Information Science and Engineering, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology)
Sara Garcia-Ptacek (Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute)
https://doi.org/10.1038/s41598-023-36362-3
title Machine learning algorithms for identifying predictive variables of mortality risk following dementia diagnosis: a longitudinal cohort study
url https://doi.org/10.1038/s41598-023-36362-3