Hybrid Water Cycle Optimization Algorithm With Simulated Annealing for Spam E-mail Detection

Spam is junk, unwanted e-mail. A reliable spam filter is increasingly important for e-mail users, who face a growing volume of unsolicited messages. Existing spam classifiers are increasingly inadequate...


Bibliographic Details
Main Authors: Ghada Al-Rawashdeh, Rabiei Mamat, Noor Hafhizah Binti Abd Rahim
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
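Subjects: Water cycle algorithm; classification algorithm; spam email; simulated annealing; hybridization; global search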
Online Access: https://ieeexplore.ieee.org/document/8850011/
_version_ 1818935895423713280
author Ghada Al-Rawashdeh
Rabiei Mamat
Noor Hafhizah Binti Abd Rahim
collection DOAJ
description Spam is junk, unwanted e-mail. A reliable spam filter is increasingly important for e-mail users, who face a growing volume of unsolicited messages. Existing spam classifiers are increasingly unable to handle huge volumes of e-mail and to detect new spam with high performance. One problem in spam classifiers is the very large number of features. Feature selection, among the most popular and effective methods of feature reduction, is an important task in keyword-based content classification: it eliminates irrelevant and redundant features that can impede performance. Meta-heuristic optimization chooses the best solution among many candidate solutions, which serves the performance aim of this research. A second problem is that the effect of optimization-based feature selection on the classifiers commonly used in previous work, namely K-Nearest Neighbor, Naïve Bayesian, and Support Vector Machine (SVM), is unclear. The aim of this research is therefore to improve the accuracy of feature selection by applying a hybrid of the Water Cycle algorithm and Simulated Annealing, and to evaluate the proposed spam detection method. The methodology consists of groundwork, induction, improvement, evaluation, and comparison stages. Cross-validation was used for training and validation, and seven datasets were used to test the proposed spam classification. Water cycle feature selection (WCFS) was employed as the meta-heuristic, together with three ways of hybridizing it with Simulated Annealing for feature selection. Compared with other feature selection algorithms such as Harmony Search, Genetic Algorithm, and Particle Swarm, the interleaved hybridization performed best, with an accuracy of 96.3%. Among the three classifiers, SVM performed best, with an F-measure of 96.3%. With the interleaved Water Cycle and Simulated Annealing hybrid, the number of features decreased by more than 50%.
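As a rough illustration of the wrapper-style feature selection the description refers to, the following Python sketch runs plain simulated annealing over binary feature masks and scores each mask with a 1-nearest-neighbour classifier. It is not the authors' method: the Water Cycle component and the three hybridization schemes are not reproduced, and the toy data generator, the per-feature penalty weight, the cooling schedule, and all function names are assumptions made only for this example.

```python
import math
import random


def make_toy_data(n_samples=60, n_informative=3, n_noise=9, seed=0):
    """Hypothetical toy data: a few informative features plus pure-noise features."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_samples):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        rows.append(informative + noise)
        labels.append(label)
    return rows, labels


def loo_1nn_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the kept features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = math.inf, None
        for k, other in enumerate(rows):
            if k == i:
                continue
            dist = sum((row[j] - other[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, labels[k]
        correct += best_label == labels[i]
    return correct / len(rows)


def fitness(rows, labels, mask, penalty=0.01):
    """Accuracy minus a small per-feature penalty (the penalty weight is an assumption)."""
    return loo_1nn_accuracy(rows, labels, mask) - penalty * sum(mask)


def anneal_feature_mask(rows, labels, steps=100, t_start=0.1, cooling=0.97, seed=1):
    """Simulated annealing over binary feature masks using single bit-flip moves."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    current = [1] * n_features  # start with every feature selected
    current_fit = fitness(rows, labels, current)
    best, best_fit = current[:], current_fit
    temperature = t_start
    for _ in range(steps):
        candidate = current[:]
        flip = rng.randrange(n_features)
        candidate[flip] = 1 - candidate[flip]
        cand_fit = fitness(rows, labels, candidate)
        delta = cand_fit - current_fit
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = current[:], current_fit
        temperature *= cooling  # geometric cooling schedule
    return best, best_fit


if __name__ == "__main__":
    rows, labels = make_toy_data()
    mask, score = anneal_feature_mask(rows, labels)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("penalised fitness:", round(score, 3))
```

In a hybrid of the kind the description names, a population-based search such as the Water Cycle algorithm would typically propose candidate masks and an annealing step like this one would refine them; how the two are interleaved in the paper is not stated in this record.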
first_indexed 2024-12-20T05:27:26Z
format Article
id doaj.art-019018b0682b424fb214f35c15fbf1ed
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-20T05:27:26Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-019018b0682b424fb214f35c15fbf1ed | 2022-12-21T19:51:50Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 143721 | 143734 | 10.1109/ACCESS.2019.2944089 | 8850011 | Hybrid Water Cycle Optimization Algorithm With Simulated Annealing for Spam E-mail Detection | Ghada Al-Rawashdeh (https://orcid.org/0000-0003-1945-1779), Rabiei Mamat, Noor Hafhizah Binti Abd Rahim | Department of Computer Science, University Malaysia Terengganu, Kuala Terengganu, Malaysia (all three authors) | https://ieeexplore.ieee.org/document/8850011/ | Water cycle algorithm; classification algorithm; spam email; simulated annealing; hybridization; global search
title Hybrid Water Cycle Optimization Algorithm With Simulated Annealing for Spam E-mail Detection
topic Water cycle algorithm
classification algorithm
spam email
simulated annealing
hybridization
global search
url https://ieeexplore.ieee.org/document/8850011/