A comparative study and performance evaluation of similarity measures for data clustering

Clustering is a useful technique that organizes a large quantity of unordered datasets into a small number of meaningful and coherent clusters. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, Manhattan distance and relat...

Bibliographic Details
Main Authors:	Usman, Dauda; Mohamad, Ismail
Format:	Conference or Workshop Item
Language:	English
Published:	2014
Subjects:	QA Mathematics
Online Access:	http://eprints.utm.my/60995/1/IsmailMohamad2014_AComparativeStudyandPerformanceEvaluation.pdf

author	Usman, Dauda Mohamad, Ismail
collection	ePrints
description	Clustering is a useful technique that organizes a large quantity of unordered datasets into a small number of meaningful and coherent clusters. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, Manhattan distance and relative entropy. In this paper, we compare and analyze the effectiveness of these measures in clustering for high dimensional datasets. Our experiments utilize the basic K-means algorithm with application of PCA and we report results on simulated high dimensional datasets and two distance/similarity measures that have been most commonly used in clustering. The analyzed results indicate that Squared Euclidean distance is much better than the Manhattan distance method.
first_indexed	2024-03-05T19:49:59Z
format	Conference or Workshop Item
id	utm.eprints-60995
institution	Universiti Teknologi Malaysia - ePrints
language	English
last_indexed	2024-03-05T19:49:59Z
publishDate	2014
record_format	dspace
spelling	utm.eprints-60995 2017-03-12T07:52:24Z http://eprints.utm.my/60995/ A comparative study and performance evaluation of similarity measures for data clustering Usman, Dauda Mohamad, Ismail QA Mathematics Clustering is a useful technique that organizes a large quantity of unordered datasets into a small number of meaningful and coherent clusters. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, Manhattan distance and relative entropy. In this paper, we compare and analyze the effectiveness of these measures in clustering for high dimensional datasets. Our experiments utilize the basic K-means algorithm with application of PCA and we report results on simulated high dimensional datasets and two distance/similarity measures that have been most commonly used in clustering. The analyzed results indicate that Squared Euclidean distance is much better than the Manhattan distance method. 2014 Conference or Workshop Item PeerReviewed application/pdf en http://eprints.utm.my/60995/1/IsmailMohamad2014_AComparativeStudyandPerformanceEvaluation.pdf Usman, Dauda and Mohamad, Ismail (2014) A comparative study and performance evaluation of similarity measures for data clustering. In: 2nd International Science Postgraduate Conference 2014 (ISPC2014), 10-12 March 2014, Johor Bahru, Malaysia.
title	A comparative study and performance evaluation of similarity measures for data clustering
topic	QA Mathematics
url	http://eprints.utm.my/60995/1/IsmailMohamad2014_AComparativeStudyandPerformanceEvaluation.pdf

Similar Items