A Novel Fuzzy-Logic-Based Multi-Criteria Metric for Performance Evaluation of Spam Email Detection Algorithms

Bibliographic Details
Main Authors: Salman A. Khan, Kashif Iqbal, Nazeeruddin Mohammad, Rehan Akbar, Syed Saad Azhar Ali, Ammar Ahmed Siddiqui
Format: Article
Language: English
Published: MDPI AG 2022-07-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/12/14/7043
Description
Summary: The increasing volume of unsolicited bulk emails has become a major threat to global security. While a significant amount of research has gone into proposing new and better algorithms for email spam detection, relatively little attention has been given to evaluation metrics. Widely used metrics include accuracy, recall, precision, and F-score. This paper proposes a new evaluation metric based on the concepts of fuzzy logic. The proposed metric, termed $\mu_O$, combines accuracy, recall, and precision into a multi-criteria fuzzy function. Several possible evaluation rules are proposed. As a proof of concept, a preliminary empirical analysis of the proposed scheme is carried out using two deep learning models, namely BERT (Bidirectional Encoder Representations from Transformers) and LSTM (Long Short-Term Memory), on three benchmark datasets. Results indicate that for the Enron and PU datasets, LSTM yields better $\mu_O$ values, in the range of 0.88 to 0.96, whereas for the Lingspam dataset BERT yields better $\mu_O$ values, in the range of 0.94 to 0.96. Furthermore, extrinsic evaluation confirms the effectiveness of the proposed fuzzy logic metric.
ISSN: 2076-3417
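
The summary describes the metric as a multi-criteria fuzzy function over accuracy, recall, and precision but does not state its formula. The sketch below only illustrates the general idea and is not the paper's definition. It makes two assumptions: each metric value in [0, 1] is used directly as a fuzzy membership degree, and the three degrees are combined with a generic and-like operator (a weighted blend of the minimum and the mean). The names `confusion_metrics`, `soft_and`, and the parameter `beta` are hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (accuracy, recall, precision) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


def soft_and(memberships, beta=0.5):
    """Combine membership degrees with an and-like operator.

    ASSUMPTION: this generic blend stands in for the paper's fuzzy rule,
    which the summary does not specify.
    beta = 1.0 gives the strict fuzzy AND (minimum);
    beta = 0.0 gives the plain arithmetic mean.
    """
    values = list(memberships)
    if not values:
        raise ValueError("at least one membership degree is required")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("membership degrees must lie in [0, 1]")
    return beta * min(values) + (1.0 - beta) * sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical counts for a spam classifier, chosen only for illustration.
    acc, rec, prec = confusion_metrics(tp=90, fp=10, tn=85, fn=15)
    print(soft_and([acc, rec, prec], beta=0.5))
```

With a larger `beta`, a classifier is penalized more heavily for its weakest criterion, which is the usual motivation for an and-like aggregation over a plain average.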