Optimasi Seleksi Fitur Information Gain pada Algoritma Naïve Bayes dan K-Nearest Neighbor
By the end of 2020, the number of late tuition fee payments had increased by 3,018 out of a total of 5,535 students. This study uses a Python library that requires numeric data, so the data are transformed according to their type: data that has...
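The full abstract in this record says that scaled (ordered) data are transformed with an ordinal encoder and unscaled data with one-hot encoding. The record does not name the Python library used, so the sketch below illustrates the two transformations in plain Python only. The column names, category values, and helper functions are hypothetical and are not taken from the study.

```python
def ordinal_encode(values, order):
    """Map each value to its rank in an explicit, ordered category list."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Map each value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors


# Hypothetical columns: one with a natural order, one without.
income_band = ["low", "high", "medium", "low"]
faculty = ["Law", "Science", "Law", "Health"]

income_codes = ordinal_encode(income_band, order=["low", "medium", "high"])
faculty_categories, faculty_vectors = one_hot_encode(faculty)

print(income_codes)        # [0, 2, 1, 0]
print(faculty_categories)  # ['Health', 'Law', 'Science']
print(faculty_vectors)     # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Ordinal codes preserve the ranking between categories, whereas one-hot vectors avoid implying an order that the data do not have.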
Main Authors: | Muhammad Norhalimi, Taghfirul Azhima Yoga Siswa |
---|---|
Format: | Article |
Language: | English |
Published: | Universitas Islam Negeri Sunan Kalijaga Yogyakarta, 2022-09-01 |
Series: | JISKA (Jurnal Informatika Sunan Kalijaga) |
Subjects: | Prediction; Naïve Bayes; K-Nearest Neighbor; Information Gain; Confusion Matrix |
Online Access: | https://ejournal.uin-suka.ac.id/saintek/JISKA/article/view/3533 |
_version_ | 1797758861073973248 |
---|---|
author | Muhammad Norhalimi; Taghfirul Azhima Yoga Siswa |
author_facet | Muhammad Norhalimi; Taghfirul Azhima Yoga Siswa |
author_sort | Muhammad Norhalimi |
collection | DOAJ |
description |
By the end of 2020, the number of late tuition fee payments had increased by 3,018 out of a total of 5,535 students. This study uses a Python library that requires numeric data, so the data are transformed according to their type: data with a scale are transformed using an ordinal encoder, and data without a scale are transformed using one-hot encoding. The purpose of this study was to evaluate, using a confusion matrix, the performance of the Naïve Bayes and K-Nearest Neighbor algorithms in predicting late payment of tuition fees at UMKT. The dataset was obtained from the financial administration bureau and comprises 12,408 records, split 90:10. Information gain feature selection identified the four most influential attributes: faculty, study program, class, and gender. In the confusion matrix evaluation, the best-performing model, Naïve Bayes with information gain, achieved an accuracy of 55.19%, while K-Nearest Neighbor with information gain achieved only 50.76%. Using the attributes selected by information gain therefore increased the accuracy of Naïve Bayes but decreased the accuracy of K-Nearest Neighbor.
|
first_indexed | 2024-03-12T18:36:06Z |
format | Article |
id | doaj.art-a2cadad6da5641df94d14b3f7185536b |
institution | Directory Open Access Journal |
issn | 2527-5836 2528-0074 |
language | English |
last_indexed | 2024-03-12T18:36:06Z |
publishDate | 2022-09-01 |
publisher | Universitas Islam Negeri Sunan Kalijaga Yogyakarta |
record_format | Article |
series | JISKA (Jurnal Informatika Sunan Kalijaga) |
title | Optimasi Seleksi Fitur Information Gain pada Algoritma Naïve Bayes dan K-Nearest Neighbor |
topic | Prediction; Naïve Bayes; K-Nearest Neighbor; Information Gain; Confusion Matrix |
url | https://ejournal.uin-suka.ac.id/saintek/JISKA/article/view/3533 |
work_keys_str_mv | AT muhammadnorhalimi optimasiseleksifiturinformationgainpadaalgoritmanaivebayesdanknearestneighbor AT taghfirulazhimayogasiswa optimasiseleksifiturinformationgainpadaalgoritmanaivebayesdanknearestneighbor |
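
The abstract reports that information gain was used to rank attributes but does not give the formula. As a rough sketch, assuming the standard entropy-based definition (the entropy of the class labels minus the weighted entropy of the labels within each attribute value), the ranking step could look like the following. The toy attribute values, the labels, and the function names are hypothetical and do not reproduce the study's data or results.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four students, two candidate attributes.
labels = ["late", "late", "on_time", "on_time"]
features = {
    "faculty": ["A", "A", "B", "B"],  # separates the classes perfectly
    "gender": ["x", "y", "x", "y"],   # carries no information about the class
}

gains = {name: information_gain(column, labels) for name, column in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked)
```

Attributes with the highest gain would then be kept as inputs to the classifiers; how many to keep (four in the abstract) is a choice made by the study, not by the formula.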