Smartic: A smart tool for Big Data analytics and IoT [version 1; peer review: 2 approved]

The Internet of Things (IoT) is leading the physical and digital world of technology to converge. Real-time and massive scale connections produce a large amount of versatile data, where Big Data comes into the picture. Big Data refers to large, diverse sets of information with dimensions that go bey...

Bibliographic Details
Main Authors:	Shohel Sayeed, Abu Fuad Ahmad, Tan Choo Peng
Format:	Article
Language:	English
Published:	F1000 Research Ltd 2022-01-01
Series:	F1000Research
Subjects:	IoT; Big Data Analytics; Data Cleaning; Data Imputation; Feature Engineering
Online Access:	https://f1000research.com/articles/11-17/v1

_version_	1797348024880463872
author	Shohel Sayeed; Abu Fuad Ahmad; Tan Choo Peng
author_sort	Shohel Sayeed
collection	DOAJ
description	The Internet of Things (IoT) is driving the convergence of the physical and digital worlds. Real-time, massive-scale connections produce large volumes of varied data, which is where Big Data comes into the picture. Big Data refers to large, diverse sets of information whose dimensions go beyond what widely used database management systems or standard data processing tools can manage within a given limit. Almost every big dataset is dirty and may contain missing data, mistyped values, inaccuracies, and many other issues that degrade Big Data analytics performance. One of the biggest challenges in Big Data analytics is discovering and repairing dirty data; failure to do so can lead to inaccurate analytics results and unpredictable conclusions. We experimented with different missing value imputation techniques and compared machine learning (ML) model performance across imputation methods. We propose a hybrid model for missing value imputation that combines ML and sample-based statistical techniques. We then carried forward the best imputed dataset, chosen on the basis of ML model performance, for feature engineering and hyperparameter tuning, using k-means clustering and principal component analysis. Performance improved markedly: the XGBoost model reached a root mean squared logarithmic error (RMSLE) of around 0.125. To mitigate overfitting, we used K-fold cross-validation.
first_indexed	2024-03-08T11:57:27Z
format	Article
id	doaj.art-4eb9ddec60654c198f16b5369314bd71
institution	Directory of Open Access Journals
issn	2046-1402
language	English
last_indexed	2024-03-08T11:57:27Z
publishDate	2022-01-01
publisher	F1000 Research Ltd
record_format	Article
series	F1000Research
spelling	doaj.art-4eb9ddec60654c198f16b5369314bd71 | 2024-01-24T01:00:00Z | eng | F1000 Research Ltd | F1000Research | 2046-1402 | 2022-01-01 | 1177276 | Smartic: A smart tool for Big Data analytics and IoT [version 1; peer review: 2 approved] | Shohel Sayeed (https://orcid.org/0000-0002-0052-4870); Abu Fuad Ahmad; Tan Choo Peng (https://orcid.org/0000-0003-2350-7755) | All authors: Faculty of Information Science and Technology, Multimedia University, Melaka, Melaka, 75450, Malaysia | https://f1000research.com/articles/11-17/v1 | IoT; Big Data Analytics; Data Cleaning; Data Imputation; Feature Engineering
title	Smartic: A smart tool for Big Data analytics and IoT [version 1; peer review: 2 approved]
topic	IoT; Big Data Analytics; Data Cleaning; Data Imputation; Feature Engineering
url	https://f1000research.com/articles/11-17/v1
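
The description in this record reports results as root mean squared logarithmic error (RMSLE) and mentions K-fold cross-validation. As a rough illustration of what those two terms mean, the following is a minimal standard-library sketch. The function names `rmsle` and `k_fold_indices` are hypothetical and chosen only for this example; nothing here is taken from the article's own code, and the fold assignment (interleaved, unshuffled) is an assumption rather than the authors' procedure.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values.

    Computes sqrt(mean((log(1 + pred) - log(1 + true)) ** 2)).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t < 0 or p < 0:
            raise ValueError("RMSLE is defined for non-negative values only")
        total += (math.log1p(p) - math.log1p(t)) ** 2
    return math.sqrt(total / len(y_true))


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k interleaved folds.

    Illustrative assumption: sample i goes to fold i % k, with no shuffling.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test
```

In use, a model would be fitted on each `train` index set and scored with `rmsle` on the matching `test` set, and the per-fold scores averaged; a large gap between training and held-out scores is one common sign of overfitting.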


Similar Items