An evaluation on the efficiency of hybrid feature selection in spam email classification
This paper discusses a spam filtering technique that implements a combination of two types of feature selection methods in its classification task. Spam, also known as unwanted messages, continues to flood our electronic mailboxes despite the spam filtering system provided by the emai...
Main Authors: | Mohamad, M., Selamat, A. |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2015 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/59143/1/MasurahMohamad2015_AnEvaluationontheEfficiencyofHybrid.pdf |
_version_ | 1796860652514443264 |
---|---|
author | Mohamad, M. Selamat, A. |
author_facet | Mohamad, M. Selamat, A. |
author_sort | Mohamad, M. |
collection | ePrints |
description | This paper discusses a spam filtering technique that implements a combination of two types of feature selection methods in its classification task. Spam, also known as unwanted messages, continues to flood our electronic mailboxes despite the spam filtering systems provided by email service providers. The issue is frequently highlighted by Internet users and has attracted many researchers to work on fighting spam. A number of frameworks, algorithms, toolkits, systems and applications have been proposed, developed and applied by researchers and developers to protect us from spam. Several steps need to be considered in the classification task, such as data pre-processing, feature selection, feature extraction, training and testing. One of the main processes in the classification task is feature selection, which is used to reduce the dimensionality of the word-frequency features without affecting the performance of the classification task. Accordingly, we conducted an experiment to test the efficiency of the proposed Hybrid Feature Selection, which combines Term Frequency Inverse Document Frequency (TFIDF) with rough set theory, on the spam email classification problem. The results show that the proposed Hybrid Feature Selection returns a good result. |
first_indexed | 2024-03-05T19:44:30Z |
format | Conference or Workshop Item |
id | utm.eprints-59143 |
institution | Universiti Teknologi Malaysia - ePrints |
language | English |
last_indexed | 2024-03-05T19:44:30Z |
publishDate | 2015 |
record_format | dspace |
spelling | utm.eprints-59143 2021-09-30T05:55:27Z http://eprints.utm.my/59143/ PeerReviewed application/pdf en Mohamad, M. and Selamat, A. (2015) An evaluation on the efficiency of hybrid feature selection in spam email classification. In: 2nd International Conference on Computer, Communications, and Control Technology, I4CT 2015, 21-23 Apr 2015, Kuching, Sarawak. http://www.dx.doi.org/10.1109/I4CT.2015.7219571 |
title | An evaluation on the efficiency of hybrid feature selection in spam email classification |
topic | QA75 Electronic computers. Computer science |
url | http://eprints.utm.my/59143/1/MasurahMohamad2015_AnEvaluationontheEfficiencyofHybrid.pdf |
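
The description names TF-IDF and rough set theory as the two components of the hybrid feature selection but does not give a procedure. The following is a minimal Python sketch of one way such a two-stage selection could be wired together; it is not the authors' implementation. Everything in it is an assumption made for illustration: the toy corpus, the function names, ranking terms by summed TF-IDF, encoding the surviving terms as binary presence attributes, and using a greedy QuickReduct-style search over the rough-set dependency degree.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return {term: TF-IDF summed over all documents} for tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n / df[term])  # 0 for terms found in every document
            scores[term] += tf * idf
    return scores


def top_k_terms(docs, k):
    """Stage 1 (assumed): keep the k terms with the highest summed TF-IDF."""
    scores = tfidf_scores(docs)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of objects whose equivalence
    class under attrs contains a single decision label."""
    classes = {}
    for row, label in zip(rows, labels):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(label)
    consistent = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return consistent / len(rows)


def quick_reduct(rows, labels, attrs):
    """Stage 2 (assumed): greedily add the attribute with the largest
    dependency until the dependency of the full attribute set is reached."""
    target = dependency(rows, labels, attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((a for a in attrs if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


# Hypothetical toy corpus: two spam and two legitimate messages.
docs = [
    ["free", "prize", "free", "click", "the"],
    ["free", "offer", "click", "the"],
    ["meeting", "agenda", "the", "monday"],
    ["project", "report", "the", "deadline"],
]
labels = ["spam", "spam", "ham", "ham"]

scores = tfidf_scores(docs)
candidates = top_k_terms(docs, 4)
# Binary presence table over the TF-IDF candidates, one row per message.
rows = [{t: int(t in doc) for t in candidates} for doc in docs]
reduct = quick_reduct(rows, labels, candidates)

print("TF-IDF candidates:", candidates)
print("Rough-set reduct: ", reduct)
```

In this sketch the TF-IDF stage discards terms that carry no discriminating weight (a term appearing in every message scores zero), and the rough-set stage then drops candidates that are redundant for separating the two classes. How the paper actually combines the two stages, and which thresholds it uses, is not stated in this record.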