A Method for Fast Selection of Machine-Learning Classifiers for Spam Filtering

The paper elaborates on how text analysis influences classification—a key part of the spam-filtering process. The authors propose a multistage meta-algorithm for checking classifier performance. As a result, the algorithm allows for the fast selection of the best-performing classifiers as well as fo...

Full description

Bibliographic Details
Main Authors: Sylwia Rapacz, Piotr Chołda, Marek Natkaniec
Format: Article
Language:English
Published: MDPI AG 2021-08-01
Series:Electronics
Subjects:
Online Access:https://www.mdpi.com/2079-9292/10/17/2083
Description
Summary:The paper elaborates on how text analysis influences classification—a key part of the spam-filtering process. The authors propose a multistage meta-algorithm for checking classifier performance. As a result, the algorithm allows for the fast selection of the best-performing classifiers as well as for the analysis of higher-dimensionality data. The last aspect is especially important when analyzing large datasets. The approach of cross-validation between different datasets for supervised learning is applied in the meta-algorithm. Three machine-learning methods allowing a user to classify e-mails as desirable (ham) or potentially harmful (spam) messages were compared in the paper to illustrate the operation of the meta-algorithm. The used methods are simple, but as the results showed, they are powerful enough. We use the following classifiers: <i>k</i>-nearest neighbours (<i>k</i>-NNs), support vector machines (SVM), and the naïve Bayes classifier (NB). The conducted research gave us the conclusion that multinomial naïve Bayes classifier can be an excellent weapon in the fight against the constantly increasing amount of spam messages. It was also confirmed that the proposed solution gives very accurate results.
ISSN:2079-9292