Nearly-optimal bounds for sparse recovery in generic norms, with applications to k-median sketching

Bibliographic Details
Main Authors: Woodruff, David P., Backurs, Arturs, Indyk, Piotr, Razenshteyn, Ilya
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Association for Computing Machinery 2018
Online Access: http://hdl.handle.net/1721.1/113845
https://orcid.org/0000-0001-7546-6313
https://orcid.org/0000-0002-7983-9524
https://orcid.org/0000-0002-3962-721X
Description
Summary: We initiate the study of trade-offs between sparsity and the number of measurements in sparse recovery schemes for generic norms. Specifically, for a norm ||·||, sparsity parameter k, approximation factor K > 0, and probability of failure P > 0, we ask: what is the minimal value of m so that there is a distribution over m × n matrices A with the property that for any x, given Ax, we can recover a k-sparse approximation to x in the given norm with probability at least 1 − P? We give a partial answer to this problem by showing that, for norms that admit efficient linear sketches, the optimal number of measurements m is closely related to the doubling dimension of the metric induced by the norm ||·|| on the set of all k-sparse vectors. By applying our result to specific norms, we cast known measurement bounds in our general framework (for the ℓ_p norms, p ∈ [1, 2]) and also provide new, measurement-efficient schemes (for the Earth-Mover Distance norm). The latter result directly implies more succinct linear sketches for the well-studied planar k-median clustering problem. Finally, our lower bound for the doubling dimension of the EMD norm enables us to resolve the open question of [Frahling-Sohler, STOC'05] about the space complexity of clustering problems in the dynamic streaming model.
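To make the problem statement concrete, here is a minimal Python sketch of its ingredients for the ℓ_p case only. It is an illustration of the setup, not the authors' recovery scheme. It computes the benchmark k-sparse approximation (for ℓ_p norms, keeping the k largest-magnitude entries minimizes ||x − x'||_p over all k-sparse x'), forms the linear measurements Ax, and shows why linearity matters in the dynamic streaming model: an update x_i ← x_i + δ changes the sketch by δ times column i of A. The Gaussian choice of A and all function names are hypothetical, chosen only for illustration.

```python
import random


def best_k_sparse(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest.
    For l_p norms this minimizes ||x - x'||_p over all k-sparse x'."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def lp_norm(v, p):
    """The l_p norm of a vector, for p >= 1."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def tail_error(x, k, p):
    """Benchmark error: min over k-sparse x' of ||x - x'||_p."""
    xk = best_k_sparse(x, k)
    return lp_norm([a - b for a, b in zip(x, xk)], p)


def random_sketch_matrix(m, n, seed=0):
    """Hypothetical m x n matrix with i.i.d. Gaussian entries.
    This is an illustrative assumption, not the paper's construction."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def sketch(A, x):
    """Linear measurements Ax (m numbers for an n-dimensional x)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def update(sketch_vec, A, i, delta):
    """Apply the stream update x[i] += delta directly to the sketch,
    using linearity: A(x + delta * e_i) = Ax + delta * A[:, i]."""
    return [s + delta * row[i] for s, row in zip(sketch_vec, A)]
```

For example, with x = [5.0, -0.1, 0.0, 3.0, 0.2] and k = 2, best_k_sparse keeps 5.0 and 3.0, and the ℓ_1 tail error is 0.1 + 0.2 ≈ 0.3. In the usual formulation, a recovery scheme must output, from Ax alone, a k-sparse x̂ with ||x − x̂|| at most K times this benchmark error, with probability at least 1 − P; the question above asks how small m can be for that to be possible.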