Efficient High-Dimensional Kernel k-Means++ with Random Projection
Using random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. We approximate the kernel matrix and distances in a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inli...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2021-07-01
|
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/11/15/6963 |
_version_ | 1797525809163927552 |
---|---|
author | Jan Y. K. Chan Alex Po Leung Yunbo Xie |
author_facet | Jan Y. K. Chan Alex Po Leung Yunbo Xie |
author_sort | Jan Y. K. Chan |
collection | DOAJ |
description | Using random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. We approximate the kernel matrix and distances in a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> before the kernel k-means clustering motivated by upper error bounds. With random projections, previous work on bounds for dot products and an improved bound for kernel methods are considered for kernel k-means. The complexities for both kernel k-means with Lloyd’s algorithm and centroid initialization with k-means++ are known to be <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>Θ</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula>, respectively, with <i>n</i> being the number of data points, the dimensionality of input feature vectors <i>D</i> and the number of clusters <i>k</i>. The proposed method reduces the computational complexity for the kernel computation of kernel k-means from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mi>d</mi><mo>)</mo></mrow></semantics></math></inline-formula> and the subsequent computation for k-means with Lloyd’s algorithm and centroid initialization from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>d</mi><mo>)</mo></mrow></semantics></math></inline-formula>. Our experiments demonstrate that the speed-up of the clustering method with reduced dimensionality <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>=</mo><mn>200</mn></mrow></semantics></math></inline-formula> is 2 to 26 times with very little performance degradation (less than one percent) in general. |
first_indexed | 2024-03-10T09:19:12Z |
format | Article |
id | doaj.art-cdd09ec274084cf3a4a2058b6880a9df |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T09:19:12Z |
publishDate | 2021-07-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-cdd09ec274084cf3a4a2058b6880a9df2023-11-22T05:22:18ZengMDPI AGApplied Sciences2076-34172021-07-011115696310.3390/app11156963Efficient High-Dimensional Kernel k-Means++ with Random ProjectionJan Y. K. Chan0Alex Po Leung1Yunbo Xie2Faculty of Information Technology, Macau University of Science and Technology, Taipa 999078, ChinaFaculty of Information Technology, Macau University of Science and Technology, Taipa 999078, ChinaFaculty of Information Technology, Macau University of Science and Technology, Taipa 999078, ChinaUsing random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. We approximate the kernel matrix and distances in a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> before the kernel k-means clustering motivated by upper error bounds. With random projections, previous work on bounds for dot products and an improved bound for kernel methods are considered for kernel k-means. The complexities for both kernel k-means with Lloyd’s algorithm and centroid initialization with k-means++ are known to be <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>Θ</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula>, respectively, with <i>n</i> being the number of data points, the dimensionality of input feature vectors <i>D</i> and the number of clusters <i>k</i>. The proposed method reduces the computational complexity for the kernel computation of kernel k-means from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mi>d</mi><mo>)</mo></mrow></semantics></math></inline-formula> and the subsequent computation for k-means with Lloyd’s algorithm and centroid initialization from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>D</mi><mo>)</mo></mrow></semantics></math></inline-formula> to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mi>k</mi><mi>d</mi><mo>)</mo></mrow></semantics></math></inline-formula>. Our experiments demonstrate that the speed-up of the clustering method with reduced dimensionality <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>=</mo><mn>200</mn></mrow></semantics></math></inline-formula> is 2 to 26 times with very little performance degradation (less than one percent) in general.https://www.mdpi.com/2076-3417/11/15/6963kernel k-meansk-means++random projectiondimensionality reductionhigh dimensional data |
spellingShingle | Jan Y. K. Chan Alex Po Leung Yunbo Xie Efficient High-Dimensional Kernel k-Means++ with Random Projection Applied Sciences kernel k-means k-means++ random projection dimensionality reduction high dimensional data |
title | Efficient High-Dimensional Kernel k-Means++ with Random Projection |
title_full | Efficient High-Dimensional Kernel k-Means++ with Random Projection |
title_fullStr | Efficient High-Dimensional Kernel k-Means++ with Random Projection |
title_full_unstemmed | Efficient High-Dimensional Kernel k-Means++ with Random Projection |
title_short | Efficient High-Dimensional Kernel k-Means++ with Random Projection |
title_sort | efficient high dimensional kernel k means with random projection |
topic | kernel k-means k-means++ random projection dimensionality reduction high dimensional data |
url | https://www.mdpi.com/2076-3417/11/15/6963 |
work_keys_str_mv | AT janykchan efficienthighdimensionalkernelkmeanswithrandomprojection AT alexpoleung efficienthighdimensionalkernelkmeanswithrandomprojection AT yunboxie efficienthighdimensionalkernelkmeanswithrandomprojection |