An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms

The <italic>k</italic>-means clustering algorithm, whilst widely popular, is not without its drawbacks. In this paper, we focus on the sensitivity of <italic>k</italic>-means to its initial set of centroids. Since the cluster recovery performance of <italic>k</italic...

Full description

Bibliographic Details
Main Authors: Simon Harris, Renato Cordeiro De Amorim
Format: Article
Language:English
Published: IEEE 2022-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9786801/
_version_ 1811342022208913408
author Simon Harris
Renato Cordeiro De Amorim
author_facet Simon Harris
Renato Cordeiro De Amorim
author_sort Simon Harris
collection DOAJ
description The <italic>k</italic>-means clustering algorithm, whilst widely popular, is not without its drawbacks. In this paper, we focus on the sensitivity of <italic>k</italic>-means to its initial set of centroids. Since the cluster recovery performance of <italic>k</italic>-means can be improved by better initialisation, numerous algorithms have been proposed aiming at producing good initial centroids. However, it is still unclear which algorithm should be used in any particular clustering scenario. With this in mind, we compare 17 such algorithms on 6,000 synthetic and 28 real-world data sets. The synthetic data sets were produced under different configurations, allowing us to show which algorithm excels in each scenario. Hence, the results of our experiments can be particularly useful for those considering <italic>k</italic>-means for a non-trivial clustering scenario.
first_indexed 2024-04-13T19:04:03Z
format Article
id doaj.art-d208ed2541774746af28d49c962933d7
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-13T19:04:03Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-d208ed2541774746af28d49c962933d72022-12-22T02:34:01ZengIEEEIEEE Access2169-35362022-01-0110587525876810.1109/ACCESS.2022.31798039786801An Extensive Empirical Comparison of <italic>k</italic>-means Initialization AlgorithmsSimon Harris0Renato Cordeiro De Amorim1https://orcid.org/0000-0002-6805-4609School of Computer Science and Electronic Engineering, University of Essex, Colchester, U.K.School of Computer Science and Electronic Engineering, University of Essex, Colchester, U.K.The <italic>k</italic>-means clustering algorithm, whilst widely popular, is not without its drawbacks. In this paper, we focus on the sensitivity of <italic>k</italic>-means to its initial set of centroids. Since the cluster recovery performance of <italic>k</italic>-means can be improved by better initialisation, numerous algorithms have been proposed aiming at producing good initial centroids. However, it is still unclear which algorithm should be used in any particular clustering scenario. With this in mind, we compare 17 such algorithms on 6,000 synthetic and 28 real-world data sets. The synthetic data sets were produced under different configurations, allowing us to show which algorithm excels in each scenario. Hence, the results of our experiments can be particularly useful for those considering <italic>k</italic>-means for a non-trivial clustering scenario.https://ieeexplore.ieee.org/document/9786801/<italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means<italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means initialisationclustering
spellingShingle Simon Harris
Renato Cordeiro De Amorim
An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms
IEEE Access
<italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means
<italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means initialisation
clustering
title An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms
title_full An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms
title_fullStr An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms
title_full_unstemmed An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms
title_short An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms
title_sort extensive empirical comparison of italic k italic means initialization algorithms
topic <italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means
<italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means initialisation
clustering
url https://ieeexplore.ieee.org/document/9786801/
work_keys_str_mv AT simonharris anextensiveempiricalcomparisonofitalickitalicmeansinitializationalgorithms
AT renatocordeirodeamorim anextensiveempiricalcomparisonofitalickitalicmeansinitializationalgorithms
AT simonharris extensiveempiricalcomparisonofitalickitalicmeansinitializationalgorithms
AT renatocordeirodeamorim extensiveempiricalcomparisonofitalickitalicmeansinitializationalgorithms