Penerapan Ensemble Feature Selection dan Klasterisasi Fitur pada Klasifikasi Dokumen Teks [Application of Ensemble Feature Selection and Feature Clustering to Text Document Classification]

An ensemble method builds several classifiers from the training data; the ensemble is often more accurate than any single classifier, especially when the base classifiers are accurate and differ from one another. Meanwhile, feature clustering can reduce feature space by...

Bibliographic Details
Main Author: Mediana Aryuni
Format: Article
Language: English
Published: Bina Nusantara University 2013-06-01
Series: ComTech
Subjects: feature clustering, classification, ensemble feature selection
Online Access: https://journal.binus.ac.id/index.php/comtech/article/view/2745
author Mediana Aryuni
collection DOAJ
description An ensemble method builds several classifiers from the training data; the ensemble is often more accurate than any single classifier, especially when the base classifiers are accurate and differ from one another. Meanwhile, feature clustering reduces the feature space by joining similar words into one cluster. The objective of this research is to develop a text categorization system that employs feature clustering based on ensemble feature selection. The research methodology consists of preprocessing the text documents, generating feature subspaces using genetic algorithm-based iterative refinement, implementing base classifiers that apply feature clustering, and integrating the classification results of the base classifiers using both static selection and majority voting. Experimental results show that, when classifying the dataset into 2 and 3 categories, the feature clustering method is 1.18 and 27.04 seconds faster, respectively, than classification that does not employ the feature selection method. With static selection, the ensemble feature selection method with genetic algorithm-based iterative refinement achieves 10% and 10.66% better accuracy than the single classifier when classifying the dataset into 2 and 3 categories, respectively. With majority voting in the same experiment, the same ensemble method achieves 10% and 12% better accuracy than the single classifier, respectively.
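Majority voting, one of the two integration methods named in the description, can be illustrated with a small sketch. This is not the author's implementation: the keyword-based base classifiers, the category labels, and the tie-breaking rule are all hypothetical stand-ins chosen for illustration, and the genetic algorithm-based refinement and feature clustering steps are not modeled here.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the base classifiers' predictions.

    Ties go to the tied label that appears first in the list. This rule is
    an assumption; the description does not say how ties are handled.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(base_classifiers, document):
    """Ask each base classifier for a label, then combine by majority vote."""
    return majority_vote([clf(document) for clf in base_classifiers])


def make_keyword_classifier(keywords, label, default):
    """Hypothetical base classifier restricted to one feature subspace.

    It returns `label` if the document contains any of `keywords`,
    otherwise `default`.
    """
    def classify(document):
        words = set(document.lower().split())
        return label if words & keywords else default
    return classify


# Three toy base classifiers, each looking at a different set of words.
classifiers = [
    make_keyword_classifier({"goal", "match"}, "sports", "other"),
    make_keyword_classifier({"team", "league"}, "sports", "other"),
    make_keyword_classifier({"election", "vote"}, "politics", "other"),
]

print(ensemble_predict(classifiers, "The team scored a late goal"))
```

Here two of the three toy classifiers vote for the same category, so the ensemble returns that category even though the third disagrees. Static selection, the other integration method mentioned, would instead pick a single base classifier in advance and use only its output.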
first_indexed 2024-03-12T07:13:50Z
format Article
id doaj.art-1d75061a39264d39b83fcd57205007bb
institution Directory Open Access Journal
issn 2087-1244, 2476-907X
language English
last_indexed 2024-03-12T07:13:50Z
publishDate 2013-06-01
publisher Bina Nusantara University
record_format Article
series ComTech
citation ComTech, vol. 4, no. 1, pp. 333-342, 2013-06-01; ISSN 2087-1244, 2476-907X; DOI 10.21512/comtech.v4i1.2745
title Penerapan Ensemble Feature Selection dan Klasterisasi Fitur pada Klasifikasi Dokumen Teks
topic feature clustering, classification, ensemble feature selection