A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things

In recent years, different botnet variants have targeted government and private organizations, creating a crucial need for a robust framework to secure the Internet of Things (IoT) network. In this paper, a Hadoop based framework is proposed to identify the malicious IoT traffic usi...

Bibliographic Details
Main Authors: Ikram Sumaiya Thaseen, Vanitha Mohanraj, Sakthivel Ramachandran, Kishore Sanapala, Sang-Soo Yeo
Format: Article
Language: English
Published: MDPI AG 2021-08-01
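Series: Electronics
Subjects: Hadoop; Internet of Things; anomaly detection; artificial immune network; hyperparameters; K-nearest neighbor
Online Access: https://www.mdpi.com/2079-9292/10/16/1955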
author Ikram Sumaiya Thaseen
Vanitha Mohanraj
Sakthivel Ramachandran
Kishore Sanapala
Sang-Soo Yeo
author_sort Ikram Sumaiya Thaseen
collection DOAJ
description In recent years, different botnet variants have targeted government and private organizations, creating a crucial need for a robust framework to secure the Internet of Things (IoT) network. In this paper, a Hadoop-based framework is proposed to identify malicious IoT traffic using modified Tomek-link under-sampling integrated with automated hyperparameter tuning of machine learning classifiers. The novelty of this paper lies in using a big data platform to process benchmark IoT datasets and thereby minimize computational time. The IoT benchmark datasets are loaded into the Hadoop Distributed File System (HDFS) environment. Three machine learning approaches, namely naive Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM), are used to categorize IoT traffic. Artificial immune network optimization is deployed during cross-validation to obtain the best classifier parameters. Experimental analysis is performed on the Hadoop platform. Average accuracies of 99% and 90% are obtained for the BoT-IoT and ToN-IoT datasets, respectively. The lower accuracy on the ToN-IoT dataset is attributed to the huge number of data samples captured at the edge and fog layers. For the BoT-IoT dataset, by contrast, only the 5% subset of training and test samples released by the dataset developers is considered for experimental analysis. The overall accuracy is improved by 19% in comparison with state-of-the-art techniques. The computational times for the huge datasets are reduced by 3–4 hours through MapReduce in HDFS.
first_indexed 2024-03-10T08:52:48Z
format Article
id doaj.art-9c5882ff854a4b6bafa4b6eaf862c802
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T08:52:48Z
publishDate 2021-08-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-9c5882ff854a4b6bafa4b6eaf862c802 2023-11-22T07:25:03Z eng MDPI AG Electronics 2079-9292 2021-08-01 10 16 1955 10.3390/electronics10161955
A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things
Ikram Sumaiya Thaseen (0), Vanitha Mohanraj (1), Sakthivel Ramachandran (2), Kishore Sanapala (3), Sang-Soo Yeo (4)
(0) School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
(1) School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
(2) School of Electronics Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
(3) Department of ECE, Marri Laxman Reddy Institute of Technology and Management, Hyderabad 500043, Telangana, India
(4) Department of Convergence Computer & Media, Mokwon University, Daejeon 35349, Korea
https://www.mdpi.com/2079-9292/10/16/1955
Hadoop; Internet of Things; anomaly detection; artificial immune network; hyperparameters; K-nearest neighbor
title A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things
title_sort hadoop based framework integrating machine learning classifiers for anomaly detection in the internet of things
topic Hadoop
Internet of Things
anomaly detection
artificial immune network
hyperparameters
K-nearest neighbor
url https://www.mdpi.com/2079-9292/10/16/1955