Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method
Stroke is a circulation disorder in the brain that can cause symptoms and signs related to the affected part of the brain and is the leading cause of death and disability in Indonesia. Everyone is at risk of experiencing a stroke, and it is important to recognize and manage risk factors. Data Mining...
Main Authors: | Ronald Sebastian, Christina Juliane |
---|---|
Format: | Article |
Language: | Indonesian |
Published: | Universitas Muhammadiyah Purwokerto, 2023-11-01 |
Series: | Jurnal Informatika |
Subjects: | smote upsampling, k-nearest neighbour, naïve bayes, c4.5, stroke |
Online Access: | https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/17348 |
_version_ | 1797542950097387520 |
---|---|
author | Ronald Sebastian; Christina Juliane |
author_facet | Ronald Sebastian; Christina Juliane |
author_sort | Ronald Sebastian |
collection | DOAJ |
description | Stroke is a disorder of blood circulation in the brain that causes symptoms and signs corresponding to the affected part of the brain, and it is the leading cause of death and disability in Indonesia. Everyone is at risk of stroke, so recognizing and managing risk factors is important. Data mining techniques can help extract and predict information and find hidden patterns in stroke medical data. The dataset used in this research comes from Kaggle and is imbalanced, so the SMOTE Upsampling technique is used to address the imbalance. The study concludes that applying SMOTE with the C4.5, Naïve Bayes (NB), and K-Nearest Neighbour (KNN) algorithms can increase precision, recall, and AUC. C4.5 combined with SMOTE performed best and was selected for testing on new data; the results show that this model predicts stroke risk more accurately than the C4.5 model without SMOTE. However, based on the author's interview with a medical practitioner, the model cannot be used directly in medical practice, because the clinical observations needed to determine stroke-related factors are highly complex. Predicting stroke in a practical setting is therefore highly complex as well. Data mining can serve as an initial-stage predictive tool for the general population, but direct examination by doctors in a hospital is strongly recommended to obtain a more accurate and comprehensive medical evaluation. |
first_indexed | 2024-03-10T13:37:47Z |
format | Article |
id | doaj.art-6d58d5aeebe94e2da753f3066f0925b9 |
institution | Directory Open Access Journal |
issn | 2086-9398 2579-8901 |
language | Indonesian |
last_indexed | 2024-03-10T13:37:47Z |
publishDate | 2023-11-01 |
publisher | Universitas Muhammadiyah Purwokerto |
record_format | Article |
series | Jurnal Informatika |
spelling | doaj.art-6d58d5aeebe94e2da753f3066f0925b9; 2023-11-21T06:37:19Z; ind; Universitas Muhammadiyah Purwokerto; Jurnal Informatika; 2086-9398; 2579-8901; 2023-11-01; vol. 11, no. 2, pp. 311-321; 10.30595/juita.v11i2.17348; 5721; Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method; Ronald Sebastian (0); Christina Juliane (1); Program Studi Pascasarjana, Magister Sistem Informasi Bisnis STMIK LIKMI (both authors); https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/17348; smote upsampling, k-nearest neighbour, naïve bayes, c4.5, stroke |
spellingShingle | Ronald Sebastian; Christina Juliane; Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method; Jurnal Informatika; smote upsampling, k-nearest neighbour, naïve bayes, c4.5, stroke |
title | Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method |
topic | smote upsampling, k-nearest neighbour, naïve bayes, c4.5, stroke |
url | https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/17348 |
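
The description above says that SMOTE upsampling was used to balance the stroke dataset before training C4.5, NB, and KNN. As a rough illustration of the general SMOTE idea only (not the authors' implementation or tooling, which this record does not specify), the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote`, its parameters, and the two toy features are assumptions made for this example; the values are invented and are not taken from the Kaggle dataset.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from minority-class samples.

    Hypothetical helper for illustration: each synthetic point lies on the
    segment between a random minority sample and one of its k nearest
    minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbour_idx = sorted(
            (i for i in range(len(minority)) if i != base_idx),
            key=lambda i: math.dist(base, minority[i]),
        )[:k]
        other = minority[rng.choice(neighbour_idx)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy, made-up data: (age, average glucose level). Not the Kaggle dataset.
majority = [(40.0 + i, 90.0 + i % 7) for i in range(20)]  # class 0: no stroke
minority = [(67.0, 228.0), (80.0, 105.0), (74.0, 170.0), (61.0, 200.0)]  # class 1

# Generate enough synthetic minority samples to match the majority class size.
extra = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + extra
print(len(majority), len(balanced_minority))
```

In a real evaluation, synthetic samples would normally be generated from the training split only, so that the test data used for precision, recall, and AUC contains no interpolated records.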