COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA

In machine learning, classification analysis aims to minimize misclassification and maximize prediction accuracy. The defining characteristic of the imbalanced classification problem is that one class contains significantly more samples than the others. SMOTE minor...

Bibliographic Details
Main Authors:	Jus Prasetya, Abdurakhman Abdurakhman
Format:	Article
Language:	English
Published:	Universitas Diponegoro 2023-04-01
Series:	Media Statistika
Subjects:	machine learning, classification, SMOTE, random forest, k-nearest neighbors
Online Access:	https://ejournal.undip.ac.id/index.php/media_statistika/article/view/42755

_version_	1827587970538930176
author	Jus Prasetya, Abdurakhman Abdurakhman
author_facet	Jus Prasetya, Abdurakhman Abdurakhman
author_sort	Jus Prasetya
collection	DOAJ
description	In machine learning, classification analysis aims to minimize misclassification and maximize prediction accuracy. The defining characteristic of the imbalanced classification problem is that one class contains significantly more samples than the others. The Synthetic Minority Over-sampling Technique (SMOTE) studies the minority-class data and interpolates between neighboring samples to produce new synthetic samples. Random forest is a classification method that combines mutually independent classification trees. K-Nearest Neighbors is a classification method that labels a new sample according to its nearest neighbors. SMOTE generates synthetic data in the minority class, namely class 1 (cervical cancer), bringing it to 585 observations (samples), so that the data set totals 1208 samples. SMOTE random forest achieved an accuracy of 96.28%, sensitivity of 99.17%, specificity of 93.44%, precision of 93.70%, and AUC of 96.30%. SMOTE K-Nearest Neighbors achieved an accuracy of 87.60%, sensitivity of 77.50%, specificity of 97.54%, precision of 96.88%, and AUC of 82.27%. SMOTE random forest yields a classification model rated as perfect and SMOTE K-Nearest Neighbors yields one rated as good, while random forest and K-Nearest Neighbors applied to the imbalanced data without SMOTE yield models rated as failed.
first_indexed	2024-03-09T00:26:05Z
format	Article
id	doaj.art-2def4a093d9a4b7abe4f9e8f63cfefb5
institution	Directory Open Access Journal
issn	1979-3693 2477-0647
language	English
last_indexed	2024-03-09T00:26:05Z
publishDate	2023-04-01
publisher	Universitas Diponegoro
record_format	Article
series	Media Statistika
spelling	doaj.art-2def4a093d9a4b7abe4f9e8f63cfefb5; 2023-12-12T02:27:52Z; eng; Universitas Diponegoro; Media Statistika; 1979-3693; 2477-0647; 2023-04-01; 15; 2; 198; 208; 10.14710/medstat.15.2.198-208; 21856; COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA; Jus Prasetya (0); Abdurakhman Abdurakhman (1); (0) PROGRAM STUDI MAGISTER MATEMATIKA, Departemen Matematika, Fakultas Matematika dan Ilmu Pengetahuan Alam, Universitas Gadjah Mada, Sekip Utara BLS 21 Yogyakarta 55281, Indonesia; (1) Department of Mathematics, Gadjah Mada University, Indonesia, Indonesia; https://ejournal.undip.ac.id/index.php/media_statistika/article/view/42755; machine learning; classification; smote; random forest; k-nearest neighbors
spellingShingle	Jus Prasetya; Abdurakhman Abdurakhman; COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA; Media Statistika; machine learning; classification; smote; random forest; k-nearest neighbors
title	COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA
title_full	COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA
title_fullStr	COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA
title_full_unstemmed	COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA
title_short	COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA
title_sort	comparison of smote random forest and smote k nearest neighbors classification analysis on imbalanced data
topic	machine learning, classification, smote, random forest, k-nearest neighbors
url	https://ejournal.undip.ac.id/index.php/media_statistika/article/view/42755
work_keys_str_mv	AT jusprasetya comparisonofsmoterandomforestandsmoteknearestneighborsclassificationanalysisonimbalanceddata AT abdurakhmanabdurakhman comparisonofsmoterandomforestandsmoteknearestneighborsclassificationanalysisonimbalanceddata
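
Illustrative sketch (not part of the catalog record): the abstract describes SMOTE as producing new synthetic samples from the minority class. The Python below is a minimal sketch of that idea, assuming the usual formulation in which each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbors. The function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration; the record does not say which implementation or settings the authors used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to it.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbors at random.
        neighbor = rng.choice(others[:k])
        # Move a random fraction of the way from base toward the neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays inside the region already occupied by the minority class. In a study like the one described, oversampling of this kind would be followed by fitting the random forest and K-Nearest Neighbors classifiers on the balanced data.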

Similar Items