Study on rough set and chi square statistic feature selection for spam classification

64 p.

Bibliographic Details
Main Author: Juniarto Samsudin.
Other Authors: Zhong Zhaowei
Format: Thesis
Published: 2010
Subjects: DRNTU::Engineering::Systems engineering
Online Access: http://hdl.handle.net/10356/35982
Institution: Nanyang Technological University
School: School of Mechanical and Aerospace Engineering
Degree: Master of Science (Smart Product Design)
Date issued: 2007 (made available in the repository 2010-04-23)
File format: PDF
Abstract:
Spam messages waste the recipients' time and resources. This dissertation examines the effectiveness of feature selection, particularly the rough set and chi-square statistic feature selection methods, in combination with the J48 decision tree classifier for e-mail classification. Experiments were performed on the SpamAssassin corpus, with features selected using a word's age, the chi-square statistic and rough set attribute reduction. Performance is measured with 10-fold cross-validation in terms of the Area Under the Receiver Operating Characteristic Curve (AUC), precision and recall. The results show that feature selection not only can improve the performance of the classifier but is also an essential step in e-mail classification. The experiments also reveal that e-mail messages contain a great deal of noise and many poor features, which should be removed to improve the performance of the classifier.
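The abstract names the chi-square statistic as one feature-selection criterion but does not give its form. A common formulation in text categorization scores each term against the spam class using a 2x2 table of document counts: χ² = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)), where A and B are the spam and ham documents containing the term, C and D are the spam and ham documents without it, and N is the total number of documents. The following Python sketch illustrates that formulation under stated assumptions. The names `chi_square` and `select_features`, the pre-tokenized input and the top-k cutoff are hypothetical and do not come from the thesis, which may have used a different tool or variant of the statistic.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: spam documents containing the term
    b: ham documents containing the term
    c: spam documents not containing the term
    d: ham documents not containing the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table: the term is in every document or in none,
        # or one class is empty, so it carries no class information.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Rank terms by chi-square against the spam label and keep the top k.

    Hypothetical helper: docs is a list of token lists and labels is a
    list of bools (True = spam). Ties are broken alphabetically.
    """
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    spam_df = Counter()  # number of spam documents containing each term
    ham_df = Counter()   # number of ham documents containing each term
    for tokens, is_spam in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            (spam_df if is_spam else ham_df)[term] += 1

    scores = {}
    for term in set(spam_df) | set(ham_df):
        a = spam_df[term]
        b = ham_df[term]
        scores[term] = chi_square(a, b, n_spam - a, n_ham - b)

    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy usage on an invented four-message corpus (not SpamAssassin data).
docs = [
    ["free", "money", "now"],
    ["meeting", "agenda"],
    ["free", "offer"],
    ["project", "meeting", "notes"],
]
labels = [True, False, True, False]
top, scores = select_features(docs, labels, 2)
# top == ["free", "meeting"]
```

In this toy corpus "free" appears only in spam and "meeting" only in ham, so both reach the maximum score. This shows that the statistic rewards strong association with either class, not only with spam. The reduced term set would then be passed to a classifier such as the J48 decision tree.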