An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms
The <italic>k</italic>-means clustering algorithm, whilst widely popular, is not without its drawbacks. In this paper, we focus on the sensitivity of <italic>k</italic>-means to its initial set of centroids. Since the cluster recovery performance of <italic>k</italic...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
IEEE
2022-01-01
|
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9786801/ |
_version_ | 1811342022208913408 |
---|---|
author | Simon Harris Renato Cordeiro De Amorim |
author_facet | Simon Harris Renato Cordeiro De Amorim |
author_sort | Simon Harris |
collection | DOAJ |
description | The <italic>k</italic>-means clustering algorithm, whilst widely popular, is not without its drawbacks. In this paper, we focus on the sensitivity of <italic>k</italic>-means to its initial set of centroids. Since the cluster recovery performance of <italic>k</italic>-means can be improved by better initialisation, numerous algorithms have been proposed aiming at producing good initial centroids. However, it is still unclear which algorithm should be used in any particular clustering scenario. With this in mind, we compare 17 such algorithms on 6,000 synthetic and 28 real-world data sets. The synthetic data sets were produced under different configurations, allowing us to show which algorithm excels in each scenario. Hence, the results of our experiments can be particularly useful for those considering <italic>k</italic>-means for a non-trivial clustering scenario. |
first_indexed | 2024-04-13T19:04:03Z |
format | Article |
id | doaj.art-d208ed2541774746af28d49c962933d7 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-13T19:04:03Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-d208ed2541774746af28d49c962933d72022-12-22T02:34:01ZengIEEEIEEE Access2169-35362022-01-0110587525876810.1109/ACCESS.2022.31798039786801An Extensive Empirical Comparison of <italic>k</italic>-means Initialization AlgorithmsSimon Harris0Renato Cordeiro De Amorim1https://orcid.org/0000-0002-6805-4609School of Computer Science and Electronic Engineering, University of Essex, Colchester, U.K.School of Computer Science and Electronic Engineering, University of Essex, Colchester, U.K.The <italic>k</italic>-means clustering algorithm, whilst widely popular, is not without its drawbacks. In this paper, we focus on the sensitivity of <italic>k</italic>-means to its initial set of centroids. Since the cluster recovery performance of <italic>k</italic>-means can be improved by better initialisation, numerous algorithms have been proposed aiming at producing good initial centroids. However, it is still unclear which algorithm should be used in any particular clustering scenario. With this in mind, we compare 17 such algorithms on 6,000 synthetic and 28 real-world data sets. The synthetic data sets were produced under different configurations, allowing us to show which algorithm excels in each scenario. Hence, the results of our experiments can be particularly useful for those considering <italic>k</italic>-means for a non-trivial clustering scenario.https://ieeexplore.ieee.org/document/9786801/<italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means<italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means initialisationclustering |
spellingShingle | Simon Harris Renato Cordeiro De Amorim An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms IEEE Access <italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means <italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means initialisation clustering |
title | An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms |
title_full | An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms |
title_fullStr | An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms |
title_full_unstemmed | An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms |
title_short | An Extensive Empirical Comparison of <italic>k</italic>-means Initialization Algorithms |
title_sort | extensive empirical comparison of italic k italic means initialization algorithms |
topic | <italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means <italic xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">k</italic>-means initialisation clustering |
url | https://ieeexplore.ieee.org/document/9786801/ |
work_keys_str_mv | AT simonharris anextensiveempiricalcomparisonofitalickitalicmeansinitializationalgorithms AT renatocordeirodeamorim anextensiveempiricalcomparisonofitalickitalicmeansinitializationalgorithms AT simonharris extensiveempiricalcomparisonofitalickitalicmeansinitializationalgorithms AT renatocordeirodeamorim extensiveempiricalcomparisonofitalickitalicmeansinitializationalgorithms |