A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem

Big data analytics and data mining are techniques used to analyze data and to extract hidden information. Traditional approaches to analysis and extraction do not work well for big data because this data is complex and of very high volume. A major data mining technique known as data clustering group...


Bibliographic Details
Main Authors:	Sunil Kumar, Maninder Singh
Format:	Article
Language:	English
Published:	Tsinghua University Press 2019-12-01
Series:	Big Data Mining and Analytics
Subjects:	clustering; hadoop; big data; k-means; hierarchical
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2018.9020037

Description:	Big data analytics and data mining are techniques used to analyze data and to extract hidden information. Traditional approaches to analysis and extraction do not work well for big data because this data is complex and of very high volume. A major data mining technique known as data clustering groups the data into clusters and makes it easy to extract information from these clusters. However, existing clustering algorithms, such as k-means and hierarchical, are not efficient as the quality of the clusters they produce is compromised. Therefore, there is a need to design an efficient and highly scalable clustering algorithm. In this paper, we put forward a new clustering algorithm called hybrid clustering in order to overcome the disadvantages of existing clustering algorithms. We compare the new hybrid algorithm with existing algorithms on the basis of precision, recall, F-measure, execution time, and accuracy of results. From the experimental results, it is clear that the proposed hybrid clustering algorithm is more accurate, and has better precision, recall, and F-measure values.
Author Affiliations:	Sunil Kumar: Directorate of Livestock Farms, Guru Angad Dev Veterinary and Animal Sciences University, Ludhiana 141001, India; Maninder Singh: Department of Computer Science, Punjabi University, Punjab 147002, India
Citation:	Big Data Mining and Analytics 2(4), 240-247
DOI:	10.26599/BDMA.2018.9020037
ISSN:	2096-0654
Collection:	DOAJ
Institution:	Directory Open Access Journal
Record ID:	doaj.art-6a34f9e29dcc4346940cc11b16c94363
Indexed:	2024-04-11T19:24:18Z


Similar Items