Space and time efficient kernel density estimation in high dimensions

Recently, Charikar and Siminelakis (2017) presented a framework for kernel density estimation in provably sublinear query time, for kernels that possess a certain hashing-based property. However, their data structure requires a significantly increased super-linear storage space, as well as super-lin...

Full description

Bibliographic Details
Main Authors: Indyk, Piotr, Wagner, Tal
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language:English
Published: Morgan Kaufmann Publishers 2021
Online Access:https://hdl.handle.net/1721.1/129407
_version_ 1811089339903377408
author Indyk, Piotr
Wagner, Tal
author2 Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Indyk, Piotr
Wagner, Tal
author_sort Indyk, Piotr
collection MIT
description Recently, Charikar and Siminelakis (2017) presented a framework for kernel density estimation in provably sublinear query time, for kernels that possess a certain hashing-based property. However, their data structure requires a significantly increased super-linear storage space, as well as super-linear preprocessing time. These limitations inhibit the practical applicability of their approach on large datasets. In this work, we present an improvement to their framework that retains the same query time, while requiring only linear space and linear preprocessing time. We instantiate our framework with the Laplacian and Exponential kernels, two popular kernels which possess the aforementioned property. Our experiments on various datasets verify that our approach attains accuracy and query time similar to Charikar and Siminelakis (2017), with significantly improved space and preprocessing time.
first_indexed 2024-09-23T14:17:38Z
format Article
id mit-1721.1/129407
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T14:17:38Z
publishDate 2021
publisher Morgan Kaufmann Publishers
record_format dspace
spelling mit-1721.1/1294072022-10-01T20:25:57Z Space and time efficient kernel density estimation in high dimensions Indyk, Piotr Wagner, Tal Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Recently, Charikar and Siminelakis (2017) presented a framework for kernel density estimation in provably sublinear query time, for kernels that possess a certain hashing-based property. However, their data structure requires a significantly increased super-linear storage space, as well as super-linear preprocessing time. These limitations inhibit the practical applicability of their approach on large datasets. In this work, we present an improvement to their framework that retains the same query time, while requiring only linear space and linear preprocessing time. We instantiate our framework with the Laplacian and Exponential kernels, two popular kernels which possess the aforementioned property. Our experiments on various datasets verify that our approach attains accuracy and query time similar to Charikar and Siminelakis (2017), with significantly improved space and preprocessing time. National Science Foundation (U.S.). Transdisciplinary Research in Principles of Data Science (Award 1740751) 2021-01-13T19:01:08Z 2021-01-13T19:01:08Z 2019-12 2020-12-18T16:35:13Z Article http://purl.org/eprint/type/ConferencePaper 1049-5258 https://hdl.handle.net/1721.1/129407 Backurs, Arturs et al. “Space and time efficient kernel density estimation in high dimensions.” Advances in Neural Information Processing Systems, 32 (December 2019) © 2019 The Author(s) en https://papers.nips.cc/paper/2019/hash/a2ce8f1706e52936dfad516c23904e3e-Abstract.html Advances in Neural Information Processing Systems Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Morgan Kaufmann Publishers Neural Information Processing Systems (NIPS)
spellingShingle Indyk, Piotr
Wagner, Tal
Space and time efficient kernel density estimation in high dimensions
title Space and time efficient kernel density estimation in high dimensions
title_full Space and time efficient kernel density estimation in high dimensions
title_fullStr Space and time efficient kernel density estimation in high dimensions
title_full_unstemmed Space and time efficient kernel density estimation in high dimensions
title_short Space and time efficient kernel density estimation in high dimensions
title_sort space and time efficient kernel density estimation in high dimensions
url https://hdl.handle.net/1721.1/129407
work_keys_str_mv AT indykpiotr spaceandtimeefficientkerneldensityestimationinhighdimensions
AT wagnertal spaceandtimeefficientkerneldensityestimationinhighdimensions