Well-Separated Pair Decompositions for High-Dimensional Datasets

Well-separated pair decomposition (WSPD) is a well known geometric decomposition used for encoding distances, introduced in a seminal paper by Paul B. Callahan and S. Rao Kosaraju in 1995. WSPD compresses <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display=&qu...

Full description

Bibliographic Details
Main Author: Domagoj Matijević
Format: Article
Language:English
Published: MDPI AG 2023-05-01
Series:Algorithms
Subjects:
Online Access:https://www.mdpi.com/1999-4893/16/5/254
_version_ 1797601402473676800
author Domagoj Matijević
author_facet Domagoj Matijević
author_sort Domagoj Matijević
collection DOAJ
description Well-separated pair decomposition (WSPD) is a well known geometric decomposition used for encoding distances, introduced in a seminal paper by Paul B. Callahan and S. Rao Kosaraju in 1995. WSPD compresses <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mo>)</mo></mrow></semantics></math></inline-formula> pairwise distances of <i>n</i> given points from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mo>)</mo></mrow></semantics></math></inline-formula> space for a fixed dimension <i>d</i>. However, the main problem with this remarkable decomposition is the “hidden” dependence on the dimension <i>d</i>, which in practice does not allow for the computation of a WSPD for any dimension <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>></mo><mn>2</mn></mrow></semantics></math></inline-formula> or <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>></mo><mn>3</mn></mrow></semantics></math></inline-formula> at best. In this work, I will show how to compute a WSPD for points in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> and for any dimension <i>d</i>. Instead of computing a WSPD directly in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula>, I propose to learn nonlinear mapping and transform the data to a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><msup><mi>d</mi><mo>′</mo></msup></msup></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>d</mi><mo>′</mo></msup><mo>=</mo><mn>2</mn></mrow></semantics></math></inline-formula> or <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>d</mi><mo>′</mo></msup><mo>=</mo><mn>3</mn></mrow></semantics></math></inline-formula>, since only in such low-dimensional spaces can a WSPD be efficiently computed. Furthermore, I estimate the quality of the computed WSPD in the original <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi>R</mi><mi>d</mi></msup></semantics></math></inline-formula> space. My experiments show that for different synthetic and real-world datasets my approach allows that a WSPD of size <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mo>)</mo></mrow></semantics></math></inline-formula> can still be computed for points in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> for dimensions <i>d</i> much larger than two or three in practice.
first_indexed 2024-03-11T04:00:24Z
format Article
id doaj.art-31d2e4a73cd642a2bc5d9bb066fb18d0
institution Directory Open Access Journal
issn 1999-4893
language English
last_indexed 2024-03-11T04:00:24Z
publishDate 2023-05-01
publisher MDPI AG
record_format Article
series Algorithms
spelling doaj.art-31d2e4a73cd642a2bc5d9bb066fb18d02023-11-18T00:08:57ZengMDPI AGAlgorithms1999-48932023-05-0116525410.3390/a16050254Well-Separated Pair Decompositions for High-Dimensional DatasetsDomagoj Matijević0Department of Mathematics, University of Osijek, 31000 Osijek, CroatiaWell-separated pair decomposition (WSPD) is a well known geometric decomposition used for encoding distances, introduced in a seminal paper by Paul B. Callahan and S. Rao Kosaraju in 1995. WSPD compresses <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mn>2</mn></msup><mo>)</mo></mrow></semantics></math></inline-formula> pairwise distances of <i>n</i> given points from <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mo>)</mo></mrow></semantics></math></inline-formula> space for a fixed dimension <i>d</i>. However, the main problem with this remarkable decomposition is the “hidden” dependence on the dimension <i>d</i>, which in practice does not allow for the computation of a WSPD for any dimension <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>></mo><mn>2</mn></mrow></semantics></math></inline-formula> or <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>d</mi><mo>></mo><mn>3</mn></mrow></semantics></math></inline-formula> at best. In this work, I will show how to compute a WSPD for points in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> and for any dimension <i>d</i>. Instead of computing a WSPD directly in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula>, I propose to learn nonlinear mapping and transform the data to a lower-dimensional space <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><msup><mi>d</mi><mo>′</mo></msup></msup></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>d</mi><mo>′</mo></msup><mo>=</mo><mn>2</mn></mrow></semantics></math></inline-formula> or <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>d</mi><mo>′</mo></msup><mo>=</mo><mn>3</mn></mrow></semantics></math></inline-formula>, since only in such low-dimensional spaces can a WSPD be efficiently computed. Furthermore, I estimate the quality of the computed WSPD in the original <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi>R</mi><mi>d</mi></msup></semantics></math></inline-formula> space. My experiments show that for different synthetic and real-world datasets my approach allows that a WSPD of size <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><mi>n</mi><mo>)</mo></mrow></semantics></math></inline-formula> can still be computed for points in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></semantics></math></inline-formula> for dimensions <i>d</i> much larger than two or three in practice.https://www.mdpi.com/1999-4893/16/5/254well-separated pair decompositionhigh-dimensional datanonlinear mapping
spellingShingle Domagoj Matijević
Well-Separated Pair Decompositions for High-Dimensional Datasets
Algorithms
well-separated pair decomposition
high-dimensional data
nonlinear mapping
title Well-Separated Pair Decompositions for High-Dimensional Datasets
title_full Well-Separated Pair Decompositions for High-Dimensional Datasets
title_fullStr Well-Separated Pair Decompositions for High-Dimensional Datasets
title_full_unstemmed Well-Separated Pair Decompositions for High-Dimensional Datasets
title_short Well-Separated Pair Decompositions for High-Dimensional Datasets
title_sort well separated pair decompositions for high dimensional datasets
topic well-separated pair decomposition
high-dimensional data
nonlinear mapping
url https://www.mdpi.com/1999-4893/16/5/254
work_keys_str_mv AT domagojmatijevic wellseparatedpairdecompositionsforhighdimensionaldatasets