Use of machine learning techniques for identifying ischemic stroke instead of the rule-based methods: a nationwide population-based study

Abstract Background Many studies have evaluated stroke using claims data; most of these studies have defined ischemic stroke using an operational definition following the rule-based method. Rule-based methods tend to overestimate the number of patients with ischemic stroke. Objectives We aimed to id...

Bibliographic Details
Main Authors:	Hyunsun Lim, Youngmin Park, Jung Hwa Hong, Ki-Bong Yoo, Kwon-Duk Seo
Format:	Article
Language:	English
Published:	BMC 2024-01-01
Series:	European Journal of Medical Research
Subjects:	Phenotyping, Ischemic stroke, Machine learning, Deep learning, Insurance claim analysis
Online Access:	https://doi.org/10.1186/s40001-023-01594-6

author	Hyunsun Lim, Youngmin Park, Jung Hwa Hong, Ki-Bong Yoo, Kwon-Duk Seo
collection	DOAJ
description	Background: Many studies have evaluated stroke using claims data; most of these studies have defined ischemic stroke using an operational definition following the rule-based method. Rule-based methods tend to overestimate the number of patients with ischemic stroke. Objectives: We aimed to identify an appropriate algorithm for identifying stroke by applying machine learning (ML) techniques to analyze the claims data. Methods: We obtained the data from the Korean National Health Insurance Service database, which is linked to the Ilsan Hospital database (n = 30,897). The performance of the prediction models (extreme gradient boosting [XGBoost] and gated recurrent unit [GRU]) was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), and the calibration curve. Results: In total, 30,897 patients were enrolled in this study, 3145 of whom (10.18%) had ischemic stroke. XGBoost, a tree-based ML technique, achieved an AUROC of 94.46% and an AUPRC of 92.80%. GRU showed the highest accuracy (99.81%), precision (99.92%), and recall (99.69%). Conclusions: We proposed recurrent neural network-based deep learning techniques to improve stroke phenotyping. These techniques can be expected to produce faster and more accurate results than the rule-based methods.
first_indexed	2024-03-08T16:22:03Z
format	Article
id	doaj.art-149712666e1343a0baf3e827c5a63a20
institution	Directory of Open Access Journals
issn	2047-783X
language	English
last_indexed	2024-03-08T16:22:03Z
publishDate	2024-01-01
publisher	BMC
record_format	Article
series	European Journal of Medical Research
affiliations	Hyunsun Lim (Department of Research and Analysis, National Health Insurance Service Ilsan Hospital); Youngmin Park (Department of Family Medicine, National Health Insurance Service Ilsan Hospital); Jung Hwa Hong (Department of Research and Analysis, National Health Insurance Service Ilsan Hospital); Ki-Bong Yoo (Division of Health Administration, Yonsei University); Kwon-Duk Seo (Department of Neurology, National Health Insurance Service Ilsan Hospital)
title	Use of machine learning techniques for identifying ischemic stroke instead of the rule-based methods: a nationwide population-based study
topic	Phenotyping, Ischemic stroke, Machine learning, Deep learning, Insurance claim analysis
url	https://doi.org/10.1186/s40001-023-01594-6

Similar Items