A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem

Big data analytics and data mining are techniques used to analyze data and to extract hidden information. Traditional approaches to analysis and extraction do not work well for big data because this data is complex and of very high volume. A major data mining technique known as data clustering group...

Bibliographic Details
Main Authors: Sunil Kumar, Maninder Singh
Format: Article
Language: English
Published: Tsinghua University Press 2019-12-01
Series: Big Data Mining and Analytics
Subjects: clustering, hadoop, big data, k-means, hierarchical
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2018.9020037
_version_ 1798029367626956800
author Sunil Kumar
Maninder Singh
collection DOAJ
description Big data analytics and data mining are techniques used to analyze data and to extract hidden information. Traditional approaches to analysis and extraction do not work well for big data because such data is complex and very high in volume. A major data mining technique known as data clustering groups the data into clusters and makes it easy to extract information from these clusters. However, existing clustering algorithms, such as k-means and hierarchical clustering, are not efficient, as the quality of the clusters they produce is compromised. Therefore, there is a need to design an efficient and highly scalable clustering algorithm. In this paper, we put forward a new clustering algorithm called hybrid clustering in order to overcome the disadvantages of existing clustering algorithms. We compare the new hybrid algorithm with existing algorithms on the basis of precision, recall, F-measure, execution time, and accuracy of results. From the experimental results, it is clear that the proposed hybrid clustering algorithm is more accurate and has better precision, recall, and F-measure values.
first_indexed 2024-04-11T19:24:18Z
format Article
id doaj.art-6a34f9e29dcc4346940cc11b16c94363
institution Directory Open Access Journal
issn 2096-0654
language English
last_indexed 2024-04-11T19:24:18Z
publishDate 2019-12-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj.art-6a34f9e29dcc4346940cc11b16c94363; 2022-12-22T04:07:13Z; eng; Tsinghua University Press; Big Data Mining and Analytics; 2096-0654; 2019-12-01; vol. 2, no. 4, pp. 240-247; 10.26599/BDMA.2018.9020037
Sunil Kumar (0): Directorate of Livestock Farms, Guru Angad Dev Veterinary and Animal Sciences University, Ludhiana 141001, India
Maninder Singh (1): Department of Computer Science, Punjabi University, Punjab 147002, India
title A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem
topic clustering
hadoop
big data
k-means
hierarchical
url https://www.sciopen.com/article/10.26599/BDMA.2018.9020037