Practical data-dependent metric compression with provable guarantees

We introduce a new distance-preserving compact representation of multidimensional point-sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O(dlog(dB/e) + logn) bits per point from which one c...

Full description

Bibliographic Details
Main Authors: Indyk, Piotr, Razenshteyn, Ilya, Wagner, Tal
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language:English
Published: Neural Information Processing Systems Foundation, Inc. 2019
Online Access:https://hdl.handle.net/1721.1/121556
_version_ 1826209509874335744
author Indyk, Piotr
Razenshteyn, Ilya
Wagner, Tal
author2 Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Indyk, Piotr
Razenshteyn, Ilya
Wagner, Tal
author_sort Indyk, Piotr
collection MIT
description We introduce a new distance-preserving compact representation of multidimensional point-sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O(dlog(dB/e) + logn) bits per point from which one can approximate the distances up to a factor of 1 ± e. Our algorithm almost matches the recent bound of [6] while being much simpler. We compare our algorithm to Product Quantization (PQ) [7], a state of the art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT (used in [7]), MNIST [11], New York City taxi time series [4] and a synthetic one-dimensional data set embedded in a high-dimensional space. With appropriately tuned parameters, our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.
first_indexed 2024-09-23T14:23:36Z
format Article
id mit-1721.1/121556
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T14:23:36Z
publishDate 2019
publisher Neural Information Processing Systems Foundation, Inc.
record_format dspace
spelling mit-1721.1/1215562022-09-29T09:13:50Z Practical data-dependent metric compression with provable guarantees Indyk, Piotr Razenshteyn, Ilya Wagner, Tal Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science We introduce a new distance-preserving compact representation of multidimensional point-sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O(dlog(dB/e) + logn) bits per point from which one can approximate the distances up to a factor of 1 ± e. Our algorithm almost matches the recent bound of [6] while being much simpler. We compare our algorithm to Product Quantization (PQ) [7], a state of the art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT (used in [7]), MNIST [11], New York City taxi time series [4] and a synthetic one-dimensional data set embedded in a high-dimensional space. With appropriately tuned parameters, our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance. 2019-07-09T20:08:57Z 2019-07-09T20:08:57Z 2017-12 2019-05-31T14:38:15Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/121556 Indyk, Piotr, Ilya Razenshteyn, and Tal Wagner. "Practical Data-Dependent Metric Compression with Provable Guarantees." 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, California, USA, 4-9 December, 2017, NIPS, 2017. © Neural Information Processing Systems Foundation, Inc. en https://papers.nips.cc/paper/6855-practical-data-dependent-metric-compression-with-provable-guarantees 31st Conference on Neural Information Processing Systems (NIPS 2017) Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Neural Information Processing Systems Foundation, Inc. Neural Information Processing Systems (NIPS)
spellingShingle Indyk, Piotr
Razenshteyn, Ilya
Wagner, Tal
Practical data-dependent metric compression with provable guarantees
title Practical data-dependent metric compression with provable guarantees
title_full Practical data-dependent metric compression with provable guarantees
title_fullStr Practical data-dependent metric compression with provable guarantees
title_full_unstemmed Practical data-dependent metric compression with provable guarantees
title_short Practical data-dependent metric compression with provable guarantees
title_sort practical data dependent metric compression with provable guarantees
url https://hdl.handle.net/1721.1/121556
work_keys_str_mv AT indykpiotr practicaldatadependentmetriccompressionwithprovableguarantees
AT razenshteynilya practicaldatadependentmetriccompressionwithprovableguarantees
AT wagnertal practicaldatadependentmetriccompressionwithprovableguarantees