An evaluation on the efficiency of hybrid feature selection in spam email classification

This paper discusses a spam filtering technique that implements a combination of two types of feature selection methods in its classification task. Spam, also known as unwanted messages, continues to flood our electronic mailboxes despite the spam filtering system provided by the emai...

Bibliographic Details
Main Authors: Mohamad, M., Selamat, A.
Format: Conference or Workshop Item
Language: English
Published: 2015
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/59143/1/MasurahMohamad2015_AnEvaluationontheEfficiencyofHybrid.pdf
author Mohamad, M.
Selamat, A.
collection ePrints
description This paper discusses a spam filtering technique that implements a combination of two types of feature selection methods in its classification task. Spam, also known as unwanted messages, continues to flood our electronic mailboxes despite the spam filtering systems provided by email service providers. The issue of spam is frequently highlighted by Internet users and has attracted many researchers to work on fighting it. A number of frameworks, algorithms, toolkits, systems and applications have been proposed, developed and applied by researchers and developers to protect us from spam. Several steps need to be considered in the classification task, such as data pre-processing, feature selection, feature extraction, training and testing. One of the main processes in the classification task is feature selection, which is used to reduce the dimensionality of the word-frequency features without affecting the performance of the classification task. Accordingly, we conducted an experiment to test the efficiency of the proposed Hybrid Feature Selection, which combines Term Frequency Inverse Document Frequency (TFIDF) with rough set theory, on the spam email classification problem. The results show that the proposed Hybrid Feature Selection returns good results.
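The TFIDF half of the feature selection idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tfidf_scores` and `select_top_terms`, the toy emails, the use of relative term frequency with a natural-log IDF, and the choice to rank each term by its maximum score across documents are all assumptions made for illustration. The rough set step of the hybrid method is not shown.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return {term: highest TF-IDF score of that term over all documents}.

    `docs` is a list of tokenized documents (lists of strings).
    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            score = tf * idf
            best[term] = max(best.get(term, 0.0), score)
    return best


def select_top_terms(docs, k):
    """Keep the k terms with the highest score (hypothetical selection rule)."""
    scores = tfidf_scores(docs)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy corpus, invented for this example only.
emails = [
    "win free money now".split(),
    "free prize claim now".split(),
    "meeting agenda for monday".split(),
    "project meeting notes".split(),
]
top = select_top_terms(emails, 5)
print(top)
```

Under these assumptions, a term that occurs in every document receives an IDF of zero and is ranked last, which is how the scoring shrinks the word-frequency feature space before any classifier is trained.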
first_indexed 2024-03-05T19:44:30Z
format Conference or Workshop Item
id utm.eprints-59143
institution Universiti Teknologi Malaysia - ePrints
language English
last_indexed 2024-03-05T19:44:30Z
publishDate 2015
record_format dspace
spelling utm.eprints-59143 2021-09-30T05:55:27Z http://eprints.utm.my/59143/ An evaluation on the efficiency of hybrid feature selection in spam email classification Mohamad, M. Selamat, A. QA75 Electronic computers. Computer science 2015 Conference or Workshop Item PeerReviewed application/pdf en http://eprints.utm.my/59143/1/MasurahMohamad2015_AnEvaluationontheEfficiencyofHybrid.pdf Mohamad, M. and Selamat, A. (2015) An evaluation on the efficiency of hybrid feature selection in spam email classification.
In: 2nd International Conference on Computer, Communications, and Control Technology, I4CT 2015, 21-23 Apr 2015, Kuching, Sarawak. http://www.dx.doi.org/10.1109/I4CT.2015.7219571
title An evaluation on the efficiency of hybrid feature selection in spam email classification
topic QA75 Electronic computers. Computer science
url http://eprints.utm.my/59143/1/MasurahMohamad2015_AnEvaluationontheEfficiencyofHybrid.pdf