Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches

Introduction: The cut-point for defining the age of young ischemic stroke (IS) is clinically and epidemiologically important, yet it is arbitrary and differs across studies. In this study, we leveraged electronic health records (EHRs) and data science techniques to estimate an opt...

Bibliographic Details
Main Authors: Vida Abedi, Clare Lambert, Durgesh Chaudhary, Emily Rieder, Venkatesh Avula, Wenke Hwang, Jiang Li, Ramin Zand
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Journal of Clinical Medicine
Subjects: ischemic stroke, data science, machine-learning, electronic health records, EHR
Online Access: https://www.mdpi.com/2077-0383/12/7/2600
_version_ 1797607700034486272
author Vida Abedi
Clare Lambert
Durgesh Chaudhary
Emily Rieder
Venkatesh Avula
Wenke Hwang
Jiang Li
Ramin Zand
author_sort Vida Abedi
collection DOAJ
description Introduction: The cut-point for defining the age of young ischemic stroke (IS) is clinically and epidemiologically important, yet it is arbitrary and differs across studies. In this study, we leveraged electronic health records (EHRs) and data science techniques to estimate an optimal cut-point for defining the age of young IS. Methods: Patient-level EHRs were extracted from 13 hospitals in Pennsylvania and used in two parallel approaches. The first approach used ICD-9/10 codes from IS patients to group comorbidities and computed similarity scores between every pair of patients. We determined the optimal age of young IS by analyzing the trend of patient similarity with respect to clinical profile for different ages of index IS. The second approach used the IS cohort and a control cohort (without IS) and built three sets of machine-learning models, namely generalized linear regression (GLM), random forest (RF), and XGBoost (XGB), to classify patients for seventeen age groups. After extracting feature importance from the models, we determined the optimal age of young IS by analyzing the pattern of comorbidity with respect to the age of index IS. Both approaches were completed separately for male and female patients. Results: The stroke cohort contained 7555 ISs, and the control cohort included 31,067 patients. In the first approach, the optimal age of young stroke was 53.7 and 51.0 years in female and male patients, respectively. In the second approach, we created 102 models, based on three algorithms, 17 age brackets, and two sexes. The optimal age was 53 (GLM), 52 (RF), and 54 (XGB) for female patients, and 52 (GLM and RF) and 53 (XGB) for male patients. Different age and sex groups exhibited different comorbidity patterns. Discussion: Using a data-driven approach, we determined the age of young stroke to be 54 years for women and 52 years for men in our mainly rural population in central Pennsylvania. Future validation studies should include more diverse populations.
first_indexed 2024-03-11T05:33:21Z
format Article
id doaj.art-fecf9cb8fe2845e0a0e2e42fcb5253be
institution Directory Open Access Journal
issn 2077-0383
language English
last_indexed 2024-03-11T05:33:21Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Journal of Clinical Medicine
spelling doaj.art-fecf9cb8fe2845e0a0e2e42fcb5253be
2023-11-17T16:59:21Z
eng
MDPI AG
Journal of Clinical Medicine
2077-0383
2023-03-01
volume 12, issue 7, article 2600
10.3390/jcm12072600
Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches
Vida Abedi [0]
Clare Lambert [1]
Durgesh Chaudhary [2]
Emily Rieder [3]
Venkatesh Avula [4]
Wenke Hwang [5]
Jiang Li [6]
Ramin Zand [7]
Department of Molecular and Functional Genomics, Weis Center for Research, Geisinger Health System, Danville, PA 17822, USA
Department of Neurology, Yale New Haven Hospital, New Haven, CT 06510, USA
Geisinger Neuroscience Institute, Geisinger Health System, Danville, PA 17822, USA
Geisinger Commonwealth, School of Medicine, Scranton, PA 18509, USA
Department of Molecular and Functional Genomics, Weis Center for Research, Geisinger Health System, Danville, PA 17822, USA
Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA
Department of Molecular and Functional Genomics, Weis Center for Research, Geisinger Health System, Danville, PA 17822, USA
Geisinger Neuroscience Institute, Geisinger Health System, Danville, PA 17822, USA
Introduction: The cut-point for defining the age of young ischemic stroke (IS) is clinically and epidemiologically important, yet it is arbitrary and differs across studies. In this study, we leveraged electronic health records (EHRs) and data science techniques to estimate an optimal cut-point for defining the age of young IS. Methods: Patient-level EHRs were extracted from 13 hospitals in Pennsylvania and used in two parallel approaches. The first approach used ICD-9/10 codes from IS patients to group comorbidities and computed similarity scores between every pair of patients. We determined the optimal age of young IS by analyzing the trend of patient similarity with respect to clinical profile for different ages of index IS. The second approach used the IS cohort and a control cohort (without IS) and built three sets of machine-learning models, namely generalized linear regression (GLM), random forest (RF), and XGBoost (XGB), to classify patients for seventeen age groups. After extracting feature importance from the models, we determined the optimal age of young IS by analyzing the pattern of comorbidity with respect to the age of index IS. Both approaches were completed separately for male and female patients. Results: The stroke cohort contained 7555 ISs, and the control cohort included 31,067 patients. In the first approach, the optimal age of young stroke was 53.7 and 51.0 years in female and male patients, respectively. In the second approach, we created 102 models, based on three algorithms, 17 age brackets, and two sexes. The optimal age was 53 (GLM), 52 (RF), and 54 (XGB) for female patients, and 52 (GLM and RF) and 53 (XGB) for male patients. Different age and sex groups exhibited different comorbidity patterns. Discussion: Using a data-driven approach, we determined the age of young stroke to be 54 years for women and 52 years for men in our mainly rural population in central Pennsylvania. Future validation studies should include more diverse populations.
https://www.mdpi.com/2077-0383/12/7/2600
ischemic stroke
data science
machine-learning
electronic health records
EHR
title Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches
title_sort defining the age of young ischemic stroke using data driven approaches
topic ischemic stroke
data science
machine-learning
electronic health records
EHR
url https://www.mdpi.com/2077-0383/12/7/2600