Parallel Batch-Dynamic 𝑘d-trees

𝑘d-trees are widely used in parallel databases to support efficient neighborhood and similarity queries. Supporting parallel updates to 𝑘d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a 𝑘d-tree that allows for efficient pa...

Full description

Bibliographic Details
Main Author: Yesantharao, Rahul
Other Authors: Shun, Julian
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access:https://hdl.handle.net/1721.1/143277
_version_ 1826217406399250432
author Yesantharao, Rahul
author2 Shun, Julian
author_facet Shun, Julian
Yesantharao, Rahul
author_sort Yesantharao, Rahul
collection MIT
description 𝑘d-trees are widely used in parallel databases to support efficient neighborhood and similarity queries. Supporting parallel updates to 𝑘d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a 𝑘d-tree that allows for efficient parallel 𝑘-NN queries over dynamically changing point sets. BDL-trees consist of a log-structured set of 𝑘d-trees which can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with 𝑛 points, each batch of 𝐵 updates takes 𝑂(𝐵 log2 (𝑛 + 𝐵)) amortized work and 𝑂(log (𝑛 + 𝐵) log log (𝑛 + 𝐵)) depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious 𝑘d-tree construction and parallel bloom filter construction. Our experiments on a 36-core machine with two-way hyper-threading using a variety of synthetic and real-world datasets show that our implementation of BDL-tree achieves a self-relative speedup of up to 34.8× (28.4× on average) for batch insertions, up to 35.5× (27.2× on average) for batch deletions, and up to 46.1× (40.0× on average) for 𝑘-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for 𝑘-NN queries. We compare to two baseline 𝑘d-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between the two baseline options for implementing batch updates.
first_indexed 2024-09-23T17:03:09Z
format Thesis
id mit-1721.1/143277
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T17:03:09Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1432772022-06-16T03:01:25Z Parallel Batch-Dynamic 𝑘d-trees Yesantharao, Rahul Shun, Julian Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science 𝑘d-trees are widely used in parallel databases to support efficient neighborhood and similarity queries. Supporting parallel updates to 𝑘d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a 𝑘d-tree that allows for efficient parallel 𝑘-NN queries over dynamically changing point sets. BDL-trees consist of a log-structured set of 𝑘d-trees which can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with 𝑛 points, each batch of 𝐵 updates takes 𝑂(𝐵 log2 (𝑛 + 𝐵)) amortized work and 𝑂(log (𝑛 + 𝐵) log log (𝑛 + 𝐵)) depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious 𝑘d-tree construction and parallel bloom filter construction. Our experiments on a 36-core machine with two-way hyper-threading using a variety of synthetic and real-world datasets show that our implementation of BDL-tree achieves a self-relative speedup of up to 34.8× (28.4× on average) for batch insertions, up to 35.5× (27.2× on average) for batch deletions, and up to 46.1× (40.0× on average) for 𝑘-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for 𝑘-NN queries. We compare to two baseline 𝑘d-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between the two baseline options for implementing batch updates. M.Eng. 2022-06-15T13:09:13Z 2022-06-15T13:09:13Z 2022-02 2022-02-22T18:32:25.893Z Thesis https://hdl.handle.net/1721.1/143277 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle Yesantharao, Rahul
Parallel Batch-Dynamic 𝑘d-trees
title Parallel Batch-Dynamic 𝑘d-trees
title_full Parallel Batch-Dynamic 𝑘d-trees
title_fullStr Parallel Batch-Dynamic 𝑘d-trees
title_full_unstemmed Parallel Batch-Dynamic 𝑘d-trees
title_short Parallel Batch-Dynamic 𝑘d-trees
title_sort parallel batch dynamic 𝑘d trees
url https://hdl.handle.net/1721.1/143277
work_keys_str_mv AT yesantharaorahul parallelbatchdynamickdtrees