A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem
Main Authors: | Sunil Kumar, Maninder Singh |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2019-12-01 |
Series: | Big Data Mining and Analytics |
Subjects: | clustering; hadoop; big data; k-means; hierarchical |
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2018.9020037 |
_version_ | 1798029367626956800 |
---|---|
author | Sunil Kumar; Maninder Singh |
collection | DOAJ |
description | Big data analytics and data mining are techniques used to analyze data and extract hidden information. Traditional approaches to analysis and extraction do not work well for big data because the data is complex and of very high volume. Data clustering, a major data mining technique, groups the data into clusters, making it easier to extract information from them. However, existing clustering algorithms, such as k-means and hierarchical clustering, are not efficient, as the quality of the clusters they produce is compromised. There is therefore a need for an efficient and highly scalable clustering algorithm. In this paper, we put forward a new algorithm, called hybrid clustering, to overcome the disadvantages of existing clustering algorithms. We compare the hybrid algorithm with existing algorithms on the basis of precision, recall, F-measure, execution time, and accuracy of results. The experimental results show that the proposed hybrid clustering algorithm is more accurate and has better precision, recall, and F-measure values. |
first_indexed | 2024-04-11T19:24:18Z |
format | Article |
id | doaj.art-6a34f9e29dcc4346940cc11b16c94363 |
institution | Directory of Open Access Journals |
issn | 2096-0654 |
language | English |
last_indexed | 2024-04-11T19:24:18Z |
publishDate | 2019-12-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | doaj.art-6a34f9e29dcc4346940cc11b16c94363; 2022-12-22T04:07:13Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2019-12-01; vol. 2, no. 4, pp. 240-247; DOI 10.26599/BDMA.2018.9020037; A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem; Sunil Kumar (Directorate of Livestock Farms, Guru Angad Dev Veterinary and Animal Sciences University, Ludhiana 141001, India); Maninder Singh (Department of Computer Science, Punjabi University, Punjab 147002, India); https://www.sciopen.com/article/10.26599/BDMA.2018.9020037; clustering; hadoop; big data; k-means; hierarchical |
title | A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem |
topic | clustering; hadoop; big data; k-means; hierarchical |
url | https://www.sciopen.com/article/10.26599/BDMA.2018.9020037 |