Scalable Clustering Algorithms for Big Data: A Review

Clustering algorithms have become one of the most critical research areas in multiple domains, especially data mining. However, with the massive growth of big data applications in the cloud world, these applications face many challenges and difficulties. Since Big Data refers to an enormous amount o...

Bibliographic Details
Main Authors:	Mahmoud A. Mahdi, Khalid M. Hosny, Ibrahim Elhenawy
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Clustering; unsupervised learning; traditional clustering; parallel clustering; stream clustering; high dimensional data
Online Access:	https://ieeexplore.ieee.org/document/9440980/

author	Mahmoud A. Mahdi, Khalid M. Hosny, Ibrahim Elhenawy
author_facet	Mahmoud A. Mahdi, Khalid M. Hosny, Ibrahim Elhenawy
author_sort	Mahmoud A. Mahdi
collection	DOAJ
description	Clustering algorithms have become one of the most critical research areas in multiple domains, especially data mining. However, with the massive growth of big data applications in the cloud world, these applications face many challenges and difficulties. Since Big Data refers to an enormous amount of data, most traditional clustering algorithms come with high computational costs. Hence, the research question is how to handle this volume of data and get accurate results at a critical time. Despite ongoing research work to develop different algorithms to facilitate complex clustering processes, there are still many difficulties that arise while dealing with a large volume of data. In this paper, we review the most relevant clustering algorithms in a categorized manner, provide a comparison of clustering methods for large-scale data and explain the overall challenges based on clustering type. The key idea of the paper is to highlight the main advantages and disadvantages of clustering algorithms for dealing with big data in a scalable approach behind the different other features.
first_indexed	2024-12-10T11:17:17Z
format	Article
id	doaj.art-4e17569c7c154a18981cd1bb853c0ade
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-10T11:17:17Z
publishDate	2021-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-4e17569c7c154a18981cd1bb853c0ade; 2022-12-22T01:51:06Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 80015; 80027; 10.1109/ACCESS.2021.3084057; 9440980; Scalable Clustering Algorithms for Big Data: A Review; Mahmoud A. Mahdi (0), https://orcid.org/0000-0002-7810-7006; Khalid M. Hosny (1), https://orcid.org/0000-0001-8065-8977; Ibrahim Elhenawy (2), https://orcid.org/0000-0001-7630-1983; Faculty of Computers and Information, Zagazig University, Zagazig, Egypt; Faculty of Computers and Information, Zagazig University, Zagazig, Egypt; Faculty of Computers and Information, Zagazig University, Zagazig, Egypt; Clustering algorithms have become one of the most critical research areas in multiple domains, especially data mining. However, with the massive growth of big data applications in the cloud world, these applications face many challenges and difficulties. Since Big Data refers to an enormous amount of data, most traditional clustering algorithms come with high computational costs. Hence, the research question is how to handle this volume of data and get accurate results at a critical time. Despite ongoing research work to develop different algorithms to facilitate complex clustering processes, there are still many difficulties that arise while dealing with a large volume of data. In this paper, we review the most relevant clustering algorithms in a categorized manner, provide a comparison of clustering methods for large-scale data and explain the overall challenges based on clustering type. The key idea of the paper is to highlight the main advantages and disadvantages of clustering algorithms for dealing with big data in a scalable approach behind the different other features.; https://ieeexplore.ieee.org/document/9440980/; Clustering; unsupervised learning; traditional clustering; parallel clustering; stream clustering; high dimensional data
title	Scalable Clustering Algorithms for Big Data: A Review
topic	Clustering; unsupervised learning; traditional clustering; parallel clustering; stream clustering; high dimensional data
url	https://ieeexplore.ieee.org/document/9440980/

Similar Items