The Development of the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS)

Bibliographic Details
Main Authors: Isaac C. Ferreira, Marcelo V. C. Aragao, Edvard M. Oliveira, Bruno T. Kuehne, Edmilson M. Moreira, Otavio A. S. Carpinteiro
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9565223/

Abstract

Spam e-mails are unsolicited e-mails received by users of an e-mail service. They cause serious harm to organizations because, among other things, they waste computational and networking resources. To reduce this damage, organizations deploy anti-spam systems: software that classifies e-mails in order to separate legitimate messages from spam. The best current commercial and open-source anti-spam systems, in particular the well-known commercial anti-spam system CanIt-PRO, rely on various techniques, such as blacklists and/or SMTP extensions, to classify e-mails. Unfortunately, both blacklists and SMTP extensions have serious drawbacks, such as low scalability and high computational and network costs. This paper introduces the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS). Unlike the best current anti-spam systems, Open-MaLBAS uses neither blacklists nor SMTP extensions; it classifies e-mails with machine learning models alone. Open-MaLBAS was compared to CanIt-PRO in a series of experiments on a database of 862,227 real e-mails collected over three months at the Federal University of Itajubá, Brazil. The e-mails had previously been classified by CanIt-PRO. In the experiments, Open-MaLBAS correctly classified 81.48% and 98.13% of the e-mails in the database with the two models evaluated, Multi-Layer Perceptron and Random Forest, respectively. In addition, it classified all e-mails in the database in up to 88% less time than CanIt-PRO. Open-MaLBAS is implemented in Java, released under a free software license for free use, and available on GitHub.
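
Because the e-mails in the database had already been classified by CanIt-PRO, the reported percentages are most naturally read as the rate of agreement with CanIt-PRO's verdicts, and the timing result as a relative reduction in total classification time. The following minimal Java sketch computes both quantities under that assumption. The class `SpamEvalSketch`, its `Label` enum, and the method names `agreementRate` and `relativeTimeReduction` are hypothetical and are not part of Open-MaLBAS; the labels and timings in `main` are illustrative values, not data from the paper.

```java
import java.util.List;

/**
 * Hypothetical helper (NOT part of Open-MaLBAS) for the two headline metrics
 * in the abstract, assuming CanIt-PRO's labels serve as the reference.
 */
final class SpamEvalSketch {

    /** Verdict assigned to one e-mail (hypothetical enum for illustration). */
    enum Label { SPAM, LEGITIMATE }

    private SpamEvalSketch() {}

    /**
     * Fraction of e-mails on which the candidate classifier agrees with the
     * reference labels (assumed here to be CanIt-PRO's verdicts).
     */
    static double agreementRate(List<Label> reference, List<Label> predicted) {
        if (reference.isEmpty() || reference.size() != predicted.size()) {
            throw new IllegalArgumentException("label lists must be non-empty and of equal length");
        }
        int matches = 0;
        for (int i = 0; i < reference.size(); i++) {
            if (reference.get(i) == predicted.get(i)) {
                matches++;
            }
        }
        return (double) matches / reference.size();
    }

    /**
     * Relative time saving of the candidate over the baseline, computed as
     * 1 - candidateSeconds / baselineSeconds; 0.88 reads as "88% shorter".
     */
    static double relativeTimeReduction(double baselineSeconds, double candidateSeconds) {
        if (baselineSeconds <= 0 || candidateSeconds < 0) {
            throw new IllegalArgumentException("baseline must be positive and candidate non-negative");
        }
        return 1.0 - candidateSeconds / baselineSeconds;
    }

    public static void main(String[] args) {
        // Illustrative labels only: the model disagrees with the reference once.
        List<Label> canItPro = List.of(Label.SPAM, Label.LEGITIMATE, Label.SPAM, Label.LEGITIMATE);
        List<Label> model = List.of(Label.SPAM, Label.LEGITIMATE, Label.LEGITIMATE, Label.LEGITIMATE);
        System.out.println("agreement = " + agreementRate(canItPro, model));
        // Illustrative timings: a 100 s baseline versus a 12 s candidate run.
        System.out.println("time reduction = " + relativeTimeReduction(100.0, 12.0));
    }
}
```

Under this reading, the 81.48% and 98.13% figures measure how closely each model reproduces CanIt-PRO's decisions rather than accuracy against independently verified labels.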

Publication Details
Journal: IEEE Access, vol. 9, pp. 138618-138632, 2021
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3118901
IEEE Xplore Document: 9565223
Subjects: Electronic mail (e-mail); internet; machine learning; network security; open source software; simple mail transfer protocol (SMTP)

Author Affiliations
Isaac C. Ferreira: TRICOD Equipamentos Eletrônicos Indústria e Comércio LTDA, Itajubá, Brazil
Marcelo V. C. Aragao (https://orcid.org/0000-0001-8999-8169): National Institute of Telecommunications, Santa Rita do Sapucaí, Brazil
Edvard M. Oliveira: Research Group on Systems and Computer Engineering, Federal University of Itajubá, Itajubá, Brazil
Bruno T. Kuehne (https://orcid.org/0000-0003-2529-225X): Research Group on Systems and Computer Engineering, Federal University of Itajubá, Itajubá, Brazil
Edmilson M. Moreira (https://orcid.org/0000-0001-5059-9080): Research Group on Systems and Computer Engineering, Federal University of Itajubá, Itajubá, Brazil
Otavio A. S. Carpinteiro (https://orcid.org/0000-0002-7490-9255): Research Group on Systems and Computer Engineering, Federal University of Itajubá, Itajubá, Brazil