A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things
In recent years, different botnet variants have targeted government and private organizations, creating a crucial need for a robust framework to secure the Internet of Things (IoT) network. In this paper, a Hadoop-based framework is proposed to identify malicious IoT traffic usi...
Main Authors: | Ikram Sumaiya Thaseen, Vanitha Mohanraj, Sakthivel Ramachandran, Kishore Sanapala, Sang-Soo Yeo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-08-01 |
Series: | Electronics |
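Subjects: | Hadoop, Internet of Things, anomaly detection, artificial immune network, hyperparameters, K-nearest neighbor |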
Online Access: | https://www.mdpi.com/2079-9292/10/16/1955 |
_version_ | 1797524119803133952 |
---|---|
author | Ikram Sumaiya Thaseen, Vanitha Mohanraj, Sakthivel Ramachandran, Kishore Sanapala, Sang-Soo Yeo |
author_sort | Ikram Sumaiya Thaseen |
collection | DOAJ |
description | In recent years, different botnet variants have targeted government and private organizations, creating a crucial need for a robust framework to secure the Internet of Things (IoT) network. In this paper, a Hadoop-based framework is proposed to identify malicious IoT traffic using a modified Tomek-link under-sampling integrated with automated hyper-parameter tuning of machine learning classifiers. The novelty of this paper is the use of a big data platform for benchmark IoT datasets to minimize computational time. The IoT benchmark datasets are loaded into the Hadoop Distributed File System (HDFS) environment. Three machine learning approaches, namely naive Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM), are used for categorizing IoT traffic. Artificial immune network optimization is deployed during cross-validation to obtain the best classifier parameters. Experimental analysis is performed on the Hadoop platform. Average accuracies of 99% and 90% are obtained for the BoT_IoT and ToN_IoT datasets, respectively. The accuracy difference for the ToN_IoT dataset is attributed to the huge number of data samples captured at the edge and fog layers. In the BoT_IoT dataset, however, only 5% of the training and test samples from the complete dataset, as released by the dataset developers, are considered for experimental analysis. The overall accuracy is improved by 19% in comparison with state-of-the-art techniques. The computational time for the huge datasets is reduced by 3–4 hours through MapReduce in HDFS. |
first_indexed | 2024-03-10T08:52:48Z |
format | Article |
id | doaj.art-9c5882ff854a4b6bafa4b6eaf862c802 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T08:52:48Z |
publishDate | 2021-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-9c5882ff854a4b6bafa4b6eaf862c802; 2023-11-22T07:25:03Z; eng; MDPI AG; Electronics; 2079-9292; 2021-08-01; 10(16), 1955; 10.3390/electronics10161955; A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things; Ikram Sumaiya Thaseen (School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India), Vanitha Mohanraj (School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India), Sakthivel Ramachandran (School of Electronics Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India), Kishore Sanapala (Department of ECE, Marri Laxman Reddy Institute of Technology and Management, Hyderabad 500043, Telangana, India), Sang-Soo Yeo (Department of Convergence Computer & Media, Mokwon University, Daejeon 35349, Korea); https://www.mdpi.com/2079-9292/10/16/1955; Hadoop, Internet of Things, anomaly detection, artificial immune network, hyperparameters, K-nearest neighbor |
title | A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things |
topic | Hadoop, Internet of Things, anomaly detection, artificial immune network, hyperparameters, K-nearest neighbor |
url | https://www.mdpi.com/2079-9292/10/16/1955 |
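
The description above mentions Tomek-link under-sampling as the class-balancing step. The sketch below illustrates only the standard Tomek-link rule, not the modified variant the authors propose, whose details are not given in this record. A Tomek link is a pair of samples with different labels that are each other's nearest neighbor; the majority-class member of each link is dropped. The function names, the toy feature vectors, and the `normal`/`attack` labels are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Return the index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_link_undersample(points, labels):
    """Drop majority-class samples that take part in a Tomek link.

    Standard rule only (an assumption for illustration); the paper's
    modified variant is not reproduced here.
    """
    majority = Counter(labels).most_common(1)[0][0]
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # Mutual nearest neighbors with different labels form a Tomek link.
        if nn[j] == i and labels[i] != labels[j]:
            for k in (i, j):
                if labels[k] == majority:
                    drop.add(k)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: four "normal" flows and two "attack" flows.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.0, 3.0), (3.2, 3.0), (5.0, 5.0)]
labels = ["normal", "normal", "normal", "normal", "attack", "attack"]

kept_points, kept_labels = tomek_link_undersample(points, labels)
print(kept_points)
print(kept_labels)
```

In this toy set, the `normal` sample at `(3.0, 3.0)` and the `attack` sample at `(3.2, 3.0)` are mutual nearest neighbors, so the `normal` one is removed. The brute-force neighbor search is quadratic in the number of samples, which is fine for a sketch but not for datasets of the size discussed in the description.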