Study on rough set and chi square statistic feature selection for spam classification
64 p.
Main Author: | Juniarto Samsudin |
---|---|
Other Authors: | Zhong Zhaowei |
Format: | Thesis |
Published: | 2010 |
Subjects: | DRNTU::Engineering::Systems engineering |
Online Access: | http://hdl.handle.net/10356/35982 |
_version_ | 1811689723175895040 |
---|---|
author | Juniarto Samsudin. |
author2 | Zhong Zhaowei |
author_facet | Zhong Zhaowei Juniarto Samsudin. |
author_sort | Juniarto Samsudin. |
collection | NTU |
description | 64 p. |
first_indexed | 2024-10-01T05:52:38Z |
format | Thesis |
id | ntu-10356/35982 |
institution | Nanyang Technological University |
last_indexed | 2024-10-01T05:52:38Z |
publishDate | 2010 |
record_format | dspace |
spelling | ntu-10356/35982 2023-03-11T17:06:55Z Study on rough set and chi square statistic feature selection for spam classification Juniarto Samsudin. Zhong Zhaowei School of Mechanical and Aerospace Engineering DRNTU::Engineering::Systems engineering 64 p. Spam messages waste recipients' time and resources. This dissertation examines the effectiveness of feature selection, particularly the rough set and chi square statistic methods, in combination with a J48 decision tree classifier for e-mail classification. Experiments were performed on the SpamAssassin corpus, with features selected using word's age, chi square statistic and rough set attribute reduction. Performance is measured using 10-fold cross-validation in terms of Area Under the Receiver Operating Characteristic Curve (AUC), precision and recall. The results show that feature selection not only improves the performance of the classifier but is also an essential step in e-mail classification. The experiments also reveal that e-mail messages contain a great deal of noise and bad features, which should be removed to increase the performance of the classifier. Master of Science (Smart Product Design) 2010-04-23T02:21:46Z 2010-04-23T02:21:46Z 2007 2007 Thesis http://hdl.handle.net/10356/35982 application/pdf |
spellingShingle | DRNTU::Engineering::Systems engineering Juniarto Samsudin. Study on rough set and chi square statistic feature selection for spam classification |
title | Study on rough set and chi square statistic feature selection for spam classification |
title_full | Study on rough set and chi square statistic feature selection for spam classification |
title_fullStr | Study on rough set and chi square statistic feature selection for spam classification |
title_full_unstemmed | Study on rough set and chi square statistic feature selection for spam classification |
title_short | Study on rough set and chi square statistic feature selection for spam classification |
title_sort | study on rough set and chi square statistic feature selection for spam classification |
topic | DRNTU::Engineering::Systems engineering |
url | http://hdl.handle.net/10356/35982 |
work_keys_str_mv | AT juniartosamsudin studyonroughsetandchisquarestatisticfeatureselectionforspamclassification |
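
The abstract names chi square statistic feature selection as one of the methods studied. As a rough illustration only, the sketch below scores each word by the chi square statistic of its 2x2 word/class contingency table and keeps the top-scoring words. It is not taken from the thesis: the function names `chi_square_scores` and `select_top_k`, the tokenised input format, and the 0/1 label encoding are all assumptions made for this example, and the J48 classifier and rough set steps are not shown.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each word by the chi-square statistic of its 2x2 word/class table.

    docs:   list of token lists, one per message (hypothetical input format)
    labels: list of 0/1 class labels, 1 = spam (assumed encoding)
    """
    n = len(docs)
    n_spam = sum(labels)
    n_ham = n - n_spam

    spam_df = Counter()  # number of spam messages containing each word
    ham_df = Counter()   # number of non-spam messages containing each word
    for tokens, label in zip(docs, labels):
        for word in set(tokens):  # presence/absence, not raw counts
            if label == 1:
                spam_df[word] += 1
            else:
                ham_df[word] += 1

    scores = {}
    for word in set(spam_df) | set(ham_df):
        a = spam_df[word]   # spam, word present
        b = ham_df[word]    # non-spam, word present
        c = n_spam - a      # spam, word absent
        d = n_ham - b       # non-spam, word absent
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the word (or a class) carries no contrast.
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Words that appear equally often in both classes score zero and are dropped first, which matches the abstract's point that noisy features should be removed before classification. The retained words would then serve as the attributes passed to a classifier.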