Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method

Stroke is a disorder of blood circulation in the brain that produces symptoms and signs corresponding to the affected brain region, and it is the leading cause of death and disability in Indonesia. Everyone is at risk of stroke, so recognizing and managing risk factors is important. Data Mining...

Bibliographic Details
Main Authors: Ronald Sebastian, Christina Juliane
Format: Article
Language: Indonesian
Published: Universitas Muhammadiyah Purwokerto 2023-11-01
Series: Jurnal Informatika
Subjects: SMOTE upsampling, k-nearest neighbour, naïve Bayes, C4.5, stroke
Online Access: https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/17348
author Ronald Sebastian
Christina Juliane
collection DOAJ
description Stroke is a disorder of blood circulation in the brain that produces symptoms and signs corresponding to the affected brain region, and it is the leading cause of death and disability in Indonesia. Everyone is at risk of stroke, so recognizing and managing risk factors is important. Data mining techniques can help extract information, make predictions, and uncover hidden patterns in stroke medical data. The dataset used in this research comes from Kaggle and is imbalanced, so the SMOTE upsampling technique is applied to correct the imbalance. The study concludes that applying SMOTE with the C4.5, Naïve Bayes (NB), and k-nearest neighbour (KNN) algorithms can increase precision, recall, and AUC. C4.5 with SMOTE, the best-performing combination, was selected for testing on new data, and the results show that the resulting model predicts stroke risk more accurately than the C4.5 model without SMOTE. However, based on the author's interview with a medical practitioner, the model cannot be used directly in medical practice, because the clinical observations needed to determine stroke-related factors are highly complex. Data mining can therefore serve as an initial-stage predictive tool for the general population, but direct examination by doctors in a hospital is strongly recommended to obtain a more accurate and comprehensive medical evaluation.
first_indexed 2024-03-10T13:37:47Z
format Article
id doaj.art-6d58d5aeebe94e2da753f3066f0925b9
institution Directory Open Access Journal
issn 2086-9398
2579-8901
spelling doaj.art-6d58d5aeebe94e2da753f3066f0925b9; 2023-11-21T06:37:19Z; ind; Universitas Muhammadiyah Purwokerto; Jurnal Informatika; ISSN 2086-9398, 2579-8901; 2023-11-01; vol. 11, no. 2, pp. 311-321; doi:10.30595/juita.v11i2.17348
Ronald Sebastian, Christina Juliane (both: Postgraduate Program, Master of Business Information Systems, STMIK LIKMI)
title Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method
topic SMOTE upsampling, k-nearest neighbour, naïve Bayes, C4.5, stroke
url https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/17348
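The abstract says SMOTE upsampling was used to balance the Kaggle stroke dataset before training C4.5, NB, and KNN. The record does not say which tool or settings the authors used, so the following is only a minimal sketch of the general SMOTE idea: each synthetic minority row is interpolated between a real minority row and one of its nearest minority neighbours. The function name `smote`, the parameters `k` and `seed`, and the toy rows are all hypothetical and are not taken from the article.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic rows, each interpolated between a minority
    row and one of its k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up toy data with two numeric features: 8 majority rows, 3 minority rows.
majority = [
    (45.0, 85.0), (50.0, 90.0), (38.0, 95.0), (41.0, 100.0),
    (55.0, 88.0), (47.0, 92.0), (52.0, 97.0), (36.0, 83.0),
]
minority = [(70.0, 200.0), (75.0, 210.0), (80.0, 190.0)]

# Generate just enough synthetic minority rows to match the majority count.
new_rows = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))  # prints: 8 8
```

Because every synthetic row lies on a segment between two real minority rows, the new rows stay inside the range already covered by the minority class. A library implementation, such as the `SMOTE` class in imbalanced-learn, would normally replace this hand-written version, and oversampling is usually applied to the training split only, so that evaluation metrics such as precision, recall, and AUC are computed on real rows.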