Parallel Batch-Dynamic 𝑘d-trees
𝑘d-trees are widely used in parallel databases to support efficient neighborhood and similarity queries. Supporting parallel updates to 𝑘d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a 𝑘d-tree that allows for efficient pa...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: |
Massachusetts Institute of Technology
2022
|
Online Access: | https://hdl.handle.net/1721.1/143277 |
_version_ | 1826217406399250432 |
---|---|
author | Yesantharao, Rahul |
author2 | Shun, Julian |
author_facet | Shun, Julian Yesantharao, Rahul |
author_sort | Yesantharao, Rahul |
collection | MIT |
description | 𝑘d-trees are widely used in parallel databases to support efficient neighborhood and similarity queries. Supporting parallel updates to 𝑘d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a 𝑘d-tree that allows for efficient parallel 𝑘-NN queries over dynamically changing point sets. BDL-trees consist of a log-structured set of 𝑘d-trees which can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with 𝑛 points, each batch of 𝐵 updates takes 𝑂(𝐵 log2 (𝑛 + 𝐵)) amortized work and 𝑂(log (𝑛 + 𝐵) log log (𝑛 + 𝐵)) depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious 𝑘d-tree construction and parallel bloom filter construction.
Our experiments on a 36-core machine with two-way hyper-threading using a variety of synthetic and real-world datasets show that our implementation of BDL-tree achieves a self-relative speedup of up to 34.8× (28.4× on average) for batch insertions, up to 35.5× (27.2× on average) for batch deletions, and up to 46.1× (40.0× on average) for 𝑘-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for 𝑘-NN queries. We compare to two baseline 𝑘d-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between the two baseline options for implementing batch updates. |
first_indexed | 2024-09-23T17:03:09Z |
format | Thesis |
id | mit-1721.1/143277 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T17:03:09Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1432772022-06-16T03:01:25Z Parallel Batch-Dynamic 𝑘d-trees Yesantharao, Rahul Shun, Julian Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science 𝑘d-trees are widely used in parallel databases to support efficient neighborhood and similarity queries. Supporting parallel updates to 𝑘d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a 𝑘d-tree that allows for efficient parallel 𝑘-NN queries over dynamically changing point sets. BDL-trees consist of a log-structured set of 𝑘d-trees which can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with 𝑛 points, each batch of 𝐵 updates takes 𝑂(𝐵 log2 (𝑛 + 𝐵)) amortized work and 𝑂(log (𝑛 + 𝐵) log log (𝑛 + 𝐵)) depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious 𝑘d-tree construction and parallel bloom filter construction. Our experiments on a 36-core machine with two-way hyper-threading using a variety of synthetic and real-world datasets show that our implementation of BDL-tree achieves a self-relative speedup of up to 34.8× (28.4× on average) for batch insertions, up to 35.5× (27.2× on average) for batch deletions, and up to 46.1× (40.0× on average) for 𝑘-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for 𝑘-NN queries. We compare to two baseline 𝑘d-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between the two baseline options for implementing batch updates. M.Eng. 2022-06-15T13:09:13Z 2022-06-15T13:09:13Z 2022-02 2022-02-22T18:32:25.893Z Thesis https://hdl.handle.net/1721.1/143277 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Yesantharao, Rahul Parallel Batch-Dynamic 𝑘d-trees |
title | Parallel Batch-Dynamic 𝑘d-trees |
title_full | Parallel Batch-Dynamic 𝑘d-trees |
title_fullStr | Parallel Batch-Dynamic 𝑘d-trees |
title_full_unstemmed | Parallel Batch-Dynamic 𝑘d-trees |
title_short | Parallel Batch-Dynamic 𝑘d-trees |
title_sort | parallel batch dynamic 𝑘d trees |
url | https://hdl.handle.net/1721.1/143277 |
work_keys_str_mv | AT yesantharaorahul parallelbatchdynamickdtrees |