Well-Separated Pair Decompositions for High-Dimensional Datasets
Well-separated pair decomposition (WSPD) is a well known geometric decomposition used for encoding distances, introduced in a seminal paper by Paul B. Callahan and S. Rao Kosaraju in 1995. WSPD compresses <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display=&qu...
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2023-05-01
|
Series: | Algorithms |
Subjects: | |
Online Access: | https://www.mdpi.com/1999-4893/16/5/254 |
_version_ | 1797601402473676800 |
---|---|
author | Domagoj Matijević |
author_facet | Domagoj Matijević |
author_sort | Domagoj Matijević |
collection | DOAJ |
description | Well-separated pair decomposition (WSPD) is a well known geometric decomposition used for encoding distances, introduced in a seminal paper by Paul B. Callahan and S. Rao Kosaraju in 1995. WSPD compresses <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mo>)</mo></mrow></semantics></math></inline-formula> pairwise distances of <i>n</i> given points from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mo>)</mo></mrow></semantics></math></inline-formula> space for a fixed dimension <i>d</i>. However, the main problem with this remarkable decomposition is the “hidden” dependence on the dimension <i>d</i>, which in practice does not allow for the computation of a WSPD for any dimension <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>></mo><mn>2</mn></mrow></semantics></math></inline-formula> or <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>></mo><mn>3</mn></mrow></semantics></math></inline-formula> at best. In this work, I will show how to compute a WSPD for points in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> and for any dimension <i>d</i>. Instead of computing a WSPD directly in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula>, I propose to learn nonlinear mapping and transform the data to a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><msup><mi>d</mi><mo>′</mo></msup></msup></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>d</mi><mo>′</mo></msup><mo>=</mo><mn>2</mn></mrow></semantics></math></inline-formula> or <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>d</mi><mo>′</mo></msup><mo>=</mo><mn>3</mn></mrow></semantics></math></inline-formula>, since only in such low-dimensional spaces can a WSPD be efficiently computed. Furthermore, I estimate the quality of the computed WSPD in the original <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi>R</mi><mi>d</mi></msup></semantics></math></inline-formula> space. My experiments show that for different synthetic and real-world datasets my approach allows that a WSPD of size <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mo>)</mo></mrow></semantics></math></inline-formula> can still be computed for points in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> for dimensions <i>d</i> much larger than two or three in practice. |
first_indexed | 2024-03-11T04:00:24Z |
format | Article |
id | doaj.art-31d2e4a73cd642a2bc5d9bb066fb18d0 |
institution | Directory Open Access Journal |
issn | 1999-4893 |
language | English |
last_indexed | 2024-03-11T04:00:24Z |
publishDate | 2023-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Algorithms |
spelling | doaj.art-31d2e4a73cd642a2bc5d9bb066fb18d02023-11-18T00:08:57ZengMDPI AGAlgorithms1999-48932023-05-0116525410.3390/a16050254Well-Separated Pair Decompositions for High-Dimensional DatasetsDomagoj Matijević0Department of Mathematics, University of Osijek, 31000 Osijek, CroatiaWell-separated pair decomposition (WSPD) is a well known geometric decomposition used for encoding distances, introduced in a seminal paper by Paul B. Callahan and S. Rao Kosaraju in 1995. WSPD compresses <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mo>)</mo></mrow></semantics></math></inline-formula> pairwise distances of <i>n</i> given points from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mo>)</mo></mrow></semantics></math></inline-formula> space for a fixed dimension <i>d</i>. However, the main problem with this remarkable decomposition is the “hidden” dependence on the dimension <i>d</i>, which in practice does not allow for the computation of a WSPD for any dimension <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>></mo><mn>2</mn></mrow></semantics></math></inline-formula> or <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>></mo><mn>3</mn></mrow></semantics></math></inline-formula> at best. In this work, I will show how to compute a WSPD for points in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> and for any dimension <i>d</i>. Instead of computing a WSPD directly in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula>, I propose to learn nonlinear mapping and transform the data to a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><msup><mi>d</mi><mo>′</mo></msup></msup></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>d</mi><mo>′</mo></msup><mo>=</mo><mn>2</mn></mrow></semantics></math></inline-formula> or <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>d</mi><mo>′</mo></msup><mo>=</mo><mn>3</mn></mrow></semantics></math></inline-formula>, since only in such low-dimensional spaces can a WSPD be efficiently computed. Furthermore, I estimate the quality of the computed WSPD in the original <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi>R</mi><mi>d</mi></msup></semantics></math></inline-formula> space. My experiments show that for different synthetic and real-world datasets my approach allows that a WSPD of size <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mo>)</mo></mrow></semantics></math></inline-formula> can still be computed for points in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> for dimensions <i>d</i> much larger than two or three in practice.https://www.mdpi.com/1999-4893/16/5/254well-separated pair decompositionhigh-dimensional datanonlinear mapping |
spellingShingle | Domagoj Matijević Well-Separated Pair Decompositions for High-Dimensional Datasets Algorithms well-separated pair decomposition high-dimensional data nonlinear mapping |
title | Well-Separated Pair Decompositions for High-Dimensional Datasets |
title_full | Well-Separated Pair Decompositions for High-Dimensional Datasets |
title_fullStr | Well-Separated Pair Decompositions for High-Dimensional Datasets |
title_full_unstemmed | Well-Separated Pair Decompositions for High-Dimensional Datasets |
title_short | Well-Separated Pair Decompositions for High-Dimensional Datasets |
title_sort | well separated pair decompositions for high dimensional datasets |
topic | well-separated pair decomposition high-dimensional data nonlinear mapping |
url | https://www.mdpi.com/1999-4893/16/5/254 |
work_keys_str_mv | AT domagojmatijevic wellseparatedpairdecompositionsforhighdimensionaldatasets |