Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method

Stroke is a circulation disorder in the brain that can cause symptoms and signs related to the affected part of the brain and is the leading cause of death and disability in Indonesia. Everyone is at risk of experiencing a stroke, and it is important to recognize and manage risk factors. Data Mining...

Bibliographic Details
Main Authors:	Ronald Sebastian, Christina Juliane
Format:	Article
Language:	Indonesian
Published:	Universitas Muhammadiyah Purwokerto 2023-11-01
Series:	Jurnal Informatika
Subjects:	smote upsampling, k-nearest neighbour, naïve bayes, c4.5, stroke
Online Access:	https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/17348

author	Ronald Sebastian Christina Juliane
author_facet	Ronald Sebastian Christina Juliane
author_sort	Ronald Sebastian
collection	DOAJ
description	Stroke is a circulatory disorder of the brain that causes signs and symptoms corresponding to the affected brain region, and it is the leading cause of death and disability in Indonesia. Everyone is at risk of stroke, so it is important to recognize and manage the risk factors. Data mining techniques can help extract information, make predictions, and find hidden patterns in stroke medical data. The dataset used in this research comes from Kaggle and is imbalanced, so the SMOTE Upsampling technique is used to address the imbalance. The study concludes that applying SMOTE with the C4.5, Naïve Bayes (NB), and k-nearest neighbour (KNN) algorithms can increase precision, recall, and AUC. C4.5 combined with SMOTE performed best and was selected for testing on new data; the results show that this model predicts stroke risk more accurately than the C4.5 model without SMOTE. However, based on the author's interview with a medical practitioner, the model cannot be used directly in medical practice, because the clinical observations needed to determine stroke-related factors are highly complex. The study thus finds that predicting stroke in a practical setting is highly complex. Data mining can serve as an initial-stage predictive tool for the general population, but direct examination by doctors in a hospital is strongly recommended to obtain a more accurate and comprehensive medical evaluation.
first_indexed	2024-03-10T13:37:47Z
format	Article
id	doaj.art-6d58d5aeebe94e2da753f3066f0925b9
institution	Directory Open Access Journal
issn	2086-9398 2579-8901
language	Indonesian
last_indexed	2024-03-10T13:37:47Z
publishDate	2023-11-01
publisher	Universitas Muhammadiyah Purwokerto
record_format	Article
series	Jurnal Informatika
spelling	doaj.art-6d58d5aeebe94e2da753f3066f0925b9 2023-11-21T06:37:19Z ind Universitas Muhammadiyah Purwokerto Jurnal Informatika 2086-9398 2579-8901 2023-11-01 11 2 311 321 10.30595/juita.v11i2.17348 5721 Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method Ronald Sebastian (Program Studi Pascasarjana, Magister Sistem Informasi Bisnis STMIK LIKMI) Christina Juliane (Program Studi Pascasarjana, Magister Sistem Informasi Bisnis STMIK LIKMI) https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/17348 smote upsampling, k-nearest neighbour, naïve bayes, c4.5, stroke
spellingShingle	Ronald Sebastian Christina Juliane Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method Jurnal Informatika smote upsampling, k-nearest neighbour, naïve bayes, c4.5, stroke
title	Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method
topic	smote upsampling, k-nearest neighbour, naïve bayes, c4.5, stroke
url	https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/17348
work_keys_str_mv	AT ronaldsebastian comparisonofdataminingclassificationalgorithmsforstrokediseasepredictionusingthesmoteupsamplingmethod AT christinajuliane comparisonofdataminingclassificationalgorithmsforstrokediseasepredictionusingthesmoteupsamplingmethod
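
The SMOTE upsampling step named in the abstract can be illustrated with a minimal sketch. This record does not say which tool or parameters the authors used, so everything below is an assumption made for illustration: the function name `smote`, the neighbour count `k`, the fixed seed, and the toy data are all hypothetical, and the code is a plain-Python rendering of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the study's implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Hypothetical helper for illustration; not taken from the article.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data, made up for illustration: 4 minority rows against 10 majority rows.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 10

# Generate just enough synthetic rows to match the majority class size.
new_points = smote(minority, n_synthetic=majority_count - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority))  # prints 10
```

In a comparison like the one the abstract describes, oversampling of this kind is normally applied to the training portion only, so that precision, recall, and AUC are measured on untouched data.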

Similar Items