Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches
<b>Introduction</b>: The cut-point for defining the age of young ischemic stroke (IS) is clinically and epidemiologically important, yet it is arbitrary and differs across studies. In this study, we leveraged electronic health records (EHRs) and data science techniques to estimate an opt...
Main Authors: | Vida Abedi, Clare Lambert, Durgesh Chaudhary, Emily Rieder, Venkatesh Avula, Wenke Hwang, Jiang Li, Ramin Zand |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Journal of Clinical Medicine |
Subjects: | ischemic stroke; data science; machine-learning; electronic health records; EHR |
Online Access: | https://www.mdpi.com/2077-0383/12/7/2600 |
_version_ | 1797607700034486272 |
---|---|
author | Vida Abedi; Clare Lambert; Durgesh Chaudhary; Emily Rieder; Venkatesh Avula; Wenke Hwang; Jiang Li; Ramin Zand |
author_facet | Vida Abedi; Clare Lambert; Durgesh Chaudhary; Emily Rieder; Venkatesh Avula; Wenke Hwang; Jiang Li; Ramin Zand |
author_sort | Vida Abedi |
collection | DOAJ |
description | <b>Introduction</b>: The cut-point for defining the age of young ischemic stroke (IS) is clinically and epidemiologically important, yet it is arbitrary and differs across studies. In this study, we leveraged electronic health records (EHRs) and data science techniques to estimate an optimal cut-point for defining the age of young IS. <b>Methods:</b> Patient-level EHRs were extracted from 13 hospitals in Pennsylvania and used in two parallel approaches. The first approach used ICD-9/10 codes from IS patients to group comorbidities and computed similarity scores between every pair of patients. We determined the optimal age of young IS by analyzing the trend in patient similarity, with respect to clinical profile, across different ages at index IS. The second approach used the IS cohort and a control cohort (without IS) and built three sets of machine-learning models, namely generalized linear regression (GLM), random forest (RF), and XGBoost (XGB), to classify patients in seventeen age groups. After extracting feature importance from the models, we determined the optimal age of young IS by analyzing the pattern of comorbidity with respect to the age at index IS. Both approaches were completed separately for male and female patients. <b>Results:</b> The stroke cohort contained 7555 ISs, and the control cohort included 31,067 patients. In the first approach, the optimal age of young stroke was 53.7 and 51.0 years in female and male patients, respectively. In the second approach, we created 102 models, based on three algorithms, 17 age brackets, and two sexes. The optimal age was 53 (GLM), 52 (RF), and 54 (XGB) for female patients, and 52 (GLM and RF) and 53 (XGB) for male patients. Different age and sex groups exhibited different comorbidity patterns. <b>Discussion:</b> Using a data-driven approach, we determined the age of young stroke to be 54 years for women and 52 years for men in our mainly rural population in central Pennsylvania. Future validation studies should include more diverse populations. |
first_indexed | 2024-03-11T05:33:21Z |
format | Article |
id | doaj.art-fecf9cb8fe2845e0a0e2e42fcb5253be |
institution | Directory Open Access Journal |
issn | 2077-0383 |
language | English |
last_indexed | 2024-03-11T05:33:21Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Journal of Clinical Medicine |
spelling | doaj.art-fecf9cb8fe2845e0a0e2e42fcb5253be; 2023-11-17T16:59:21Z; eng; MDPI AG; Journal of Clinical Medicine; 2077-0383; 2023-03-01; 12(7):2600; 10.3390/jcm12072600; Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches; Vida Abedi (0), Clare Lambert (1), Durgesh Chaudhary (2), Emily Rieder (3), Venkatesh Avula (4), Wenke Hwang (5), Jiang Li (6), Ramin Zand (7); (0) Department of Molecular and Functional Genomics, Weis Center for Research, Geisinger Health System, Danville, PA 17822, USA; (1) Department of Neurology, Yale New Haven Hospital, New Haven, CT 06510, USA; (2) Geisinger Neuroscience Institute, Geisinger Health System, Danville, PA 17822, USA; (3) Geisinger Commonwealth, School of Medicine, Scranton, PA 18509, USA; (4) Department of Molecular and Functional Genomics, Weis Center for Research, Geisinger Health System, Danville, PA 17822, USA; (5) Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA; (6) Department of Molecular and Functional Genomics, Weis Center for Research, Geisinger Health System, Danville, PA 17822, USA; (7) Geisinger Neuroscience Institute, Geisinger Health System, Danville, PA 17822, USA; https://www.mdpi.com/2077-0383/12/7/2600; ischemic stroke; data science; machine-learning; electronic health records; EHR |
spellingShingle | Vida Abedi Clare Lambert Durgesh Chaudhary Emily Rieder Venkatesh Avula Wenke Hwang Jiang Li Ramin Zand Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches Journal of Clinical Medicine ischemic stroke data science machine-learning electronic health records EHR |
title | Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches |
title_full | Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches |
title_fullStr | Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches |
title_full_unstemmed | Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches |
title_short | Defining the Age of Young Ischemic Stroke Using Data-Driven Approaches |
title_sort | defining the age of young ischemic stroke using data driven approaches |
topic | ischemic stroke; data science; machine-learning; electronic health records; EHR |
url | https://www.mdpi.com/2077-0383/12/7/2600 |
work_keys_str_mv | AT vidaabedi definingtheageofyoungischemicstrokeusingdatadrivenapproaches AT clarelambert definingtheageofyoungischemicstrokeusingdatadrivenapproaches AT durgeshchaudhary definingtheageofyoungischemicstrokeusingdatadrivenapproaches AT emilyrieder definingtheageofyoungischemicstrokeusingdatadrivenapproaches AT venkateshavula definingtheageofyoungischemicstrokeusingdatadrivenapproaches AT wenkehwang definingtheageofyoungischemicstrokeusingdatadrivenapproaches AT jiangli definingtheageofyoungischemicstrokeusingdatadrivenapproaches AT raminzand definingtheageofyoungischemicstrokeusingdatadrivenapproaches |
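
The two approaches summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so Jaccard similarity over comorbidity groups is an assumption here, and the patient data, the `jaccard` helper, and the age-bracket indices are hypothetical placeholders. The sketch only shows what "a similarity score for every patient pair" looks like and why three algorithms, 17 age brackets, and two sexes give 102 models.

```python
from itertools import combinations, product

# Hypothetical toy data (not from the study): patient id -> set of comorbidity groups.
patients = {
    "p1": {"hypertension", "diabetes"},
    "p2": {"hypertension", "atrial_fibrillation"},
    "p3": {"diabetes", "hypertension"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets; an assumed stand-in for the study's score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Approach 1 (sketch): one similarity score per unordered patient pair.
pair_scores = {
    (i, j): jaccard(patients[i], patients[j])
    for i, j in combinations(sorted(patients), 2)
}

# Approach 2 (sketch): one model per (algorithm, age bracket, sex) combination.
algorithms = ["GLM", "RF", "XGB"]
age_brackets = range(17)  # placeholder indices; bracket bounds are not given in the abstract
sexes = ["female", "male"]
model_grid = list(product(algorithms, age_brackets, sexes))

print(len(model_grid))  # 3 * 17 * 2 = 102
```

In a real analysis, each entry of `model_grid` would correspond to one fitted classifier, and the pair scores would be aggregated by age at index IS before looking for a trend.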