Scalable Clustering Algorithms for Big Data: A Review

Clustering algorithms have become one of the most critical research areas in multiple domains, especially data mining. However, with the massive growth of big data applications in the cloud world, these applications face many challenges and difficulties. Since Big Data refers to an enormous amount o...

Bibliographic Details
Main Authors: Mahmoud A. Mahdi, Khalid M. Hosny, Ibrahim Elhenawy
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
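Subjects: Clustering; unsupervised learning; traditional clustering; parallel clustering; stream clustering; high dimensional data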
Online Access: https://ieeexplore.ieee.org/document/9440980/
author Mahmoud A. Mahdi
Khalid M. Hosny
Ibrahim Elhenawy
author_sort Mahmoud A. Mahdi
collection DOAJ
description Clustering algorithms have become one of the most important research areas in many domains, especially data mining. However, with the massive growth of big data applications in the cloud, these applications face many challenges. Because Big Data refers to an enormous amount of data, most traditional clustering algorithms incur high computational costs. Hence, the research question is how to handle this volume of data and obtain accurate results in a timely manner. Despite ongoing research to develop algorithms that facilitate complex clustering processes, many difficulties still arise when dealing with a large volume of data. In this paper, we review the most relevant clustering algorithms in a categorized manner, compare clustering methods for large-scale data, and explain the overall challenges for each clustering type. The key idea of the paper is to highlight the main advantages and disadvantages of clustering algorithms for big data, with a focus on scalability alongside their other features.
first_indexed 2024-12-10T11:17:17Z
format Article
id doaj.art-4e17569c7c154a18981cd1bb853c0ade
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-10T11:17:17Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-4e17569c7c154a18981cd1bb853c0ade
2022-12-22T01:51:06Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
Volume 9, pages 80015-80027
DOI 10.1109/ACCESS.2021.3084057
IEEE Xplore document 9440980
Scalable Clustering Algorithms for Big Data: A Review
Mahmoud A. Mahdi (https://orcid.org/0000-0002-7810-7006), Faculty of Computers and Information, Zagazig University, Zagazig, Egypt
Khalid M. Hosny (https://orcid.org/0000-0001-8065-8977), Faculty of Computers and Information, Zagazig University, Zagazig, Egypt
Ibrahim Elhenawy (https://orcid.org/0000-0001-7630-1983), Faculty of Computers and Information, Zagazig University, Zagazig, Egypt
https://ieeexplore.ieee.org/document/9440980/
Clustering; unsupervised learning; traditional clustering; parallel clustering; stream clustering; high dimensional data
title Scalable Clustering Algorithms for Big Data: A Review
topic Clustering
unsupervised learning
traditional clustering
parallel clustering
stream clustering
high dimensional data
url https://ieeexplore.ieee.org/document/9440980/