Efficient High-Dimensional Kernel k-Means++ with Random Projection

Using random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. We approximate the kernel matrix and distances in a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inli...

Full description

Bibliographic Details
Main Authors: Jan Y. K. Chan, Alex Po Leung, Yunbo Xie
Format: Article
Language:English
Published: MDPI AG 2021-07-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/11/15/6963
_version_ 1797525809163927552
author Jan Y. K. Chan
Alex Po Leung
Yunbo Xie
author_facet Jan Y. K. Chan
Alex Po Leung
Yunbo Xie
author_sort Jan Y. K. Chan
collection DOAJ
description Using random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. We approximate the kernel matrix and distances in a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> before the kernel k-means clustering motivated by upper error bounds. With random projections, previous work on bounds for dot products and an improved bound for kernel methods are considered for kernel k-means. The complexities for both kernel k-means with Lloyd’s algorithm and centroid initialization with k-means++ are known to be <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>Θ</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula>, respectively, with <i>n</i> being the number of data points, the dimensionality of input feature vectors <i>D</i> and the number of clusters <i>k</i>. The proposed method reduces the computational complexity for the kernel computation of kernel k-means from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mi>d</mi><mo>)</mo></mrow></semantics></math></inline-formula> and the subsequent computation for k-means with Lloyd’s algorithm and centroid initialization from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>d</mi><mo>)</mo></mrow></semantics></math></inline-formula>. Our experiments demonstrate that the speed-up of the clustering method with reduced dimensionality <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>=</mo><mn>200</mn></mrow></semantics></math></inline-formula> is 2 to 26 times with very little performance degradation (less than one percent) in general.
first_indexed 2024-03-10T09:19:12Z
format Article
id doaj.art-cdd09ec274084cf3a4a2058b6880a9df
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T09:19:12Z
publishDate 2021-07-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-cdd09ec274084cf3a4a2058b6880a9df2023-11-22T05:22:18ZengMDPI AGApplied Sciences2076-34172021-07-011115696310.3390/app11156963Efficient High-Dimensional Kernel k-Means++ with Random ProjectionJan Y. K. Chan0Alex Po Leung1Yunbo Xie2Faculty of Information Technology, Macau University of Science and Technology, Taipa 999078, ChinaFaculty of Information Technology, Macau University of Science and Technology, Taipa 999078, ChinaFaculty of Information Technology, Macau University of Science and Technology, Taipa 999078, ChinaUsing random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. We approximate the kernel matrix and distances in a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> before the kernel k-means clustering motivated by upper error bounds. With random projections, previous work on bounds for dot products and an improved bound for kernel methods are considered for kernel k-means. The complexities for both kernel k-means with Lloyd’s algorithm and centroid initialization with k-means++ are known to be <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>Θ</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula>, respectively, with <i>n</i> being the number of data points, the dimensionality of input feature vectors <i>D</i> and the number of clusters <i>k</i>. The proposed method reduces the computational complexity for the kernel computation of kernel k-means from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mi>d</mi><mo>)</mo></mrow></semantics></math></inline-formula> and the subsequent computation for k-means with Lloyd’s algorithm and centroid initialization from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>d</mi><mo>)</mo></mrow></semantics></math></inline-formula>. Our experiments demonstrate that the speed-up of the clustering method with reduced dimensionality <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>=</mo><mn>200</mn></mrow></semantics></math></inline-formula> is 2 to 26 times with very little performance degradation (less than one percent) in general.https://www.mdpi.com/2076-3417/11/15/6963kernel k-meansk-means++random projectiondimensionality reductionhigh dimensional data
spellingShingle Jan Y. K. Chan
Alex Po Leung
Yunbo Xie
Efficient High-Dimensional Kernel k-Means++ with Random Projection
Applied Sciences
kernel k-means
k-means++
random projection
dimensionality reduction
high dimensional data
title Efficient High-Dimensional Kernel k-Means++ with Random Projection
title_full Efficient High-Dimensional Kernel k-Means++ with Random Projection
title_fullStr Efficient High-Dimensional Kernel k-Means++ with Random Projection
title_full_unstemmed Efficient High-Dimensional Kernel k-Means++ with Random Projection
title_short Efficient High-Dimensional Kernel k-Means++ with Random Projection
title_sort efficient high dimensional kernel k means with random projection
topic kernel k-means
k-means++
random projection
dimensionality reduction
high dimensional data
url https://www.mdpi.com/2076-3417/11/15/6963
work_keys_str_mv AT janykchan efficienthighdimensionalkernelkmeanswithrandomprojection
AT alexpoleung efficienthighdimensionalkernelkmeanswithrandomprojection
AT yunboxie efficienthighdimensionalkernelkmeanswithrandomprojection