Overlook: Differentially Private Exploratory Visualization for Big Data

Bibliographic Details
Main Authors: Mihai Budiu, Pratiksha Thaker, Parikshit Gopalan, Udi Wieder, Matei Zaharia
Format: Article
Language: English
Published: Labor Dynamics Institute 2022-07-01
Series: The Journal of Privacy and Confidentiality
Subjects:
Online Access: http://www.journalprivacyconfidentiality.org/index.php/jpc/article/view/779
Description
Summary: Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a set of queries is known ahead of time and the system carefully optimizes a synopsis for it. The synopses that these systems build are costly to compute and may also be costly to store. We introduce Overlook, a system that enables private data exploration at interactive latencies for both data analysts and data curators. The key idea in Overlook is a virtual synopsis that can be evaluated incrementally, without extra storage space or expensive precomputation. Overlook simply executes queries using an existing engine, such as a SQL DBMS, and adds noise to their results. Because Overlook's synopses do not require costly precomputation or storage, data curators can also use Overlook to explore the impact of privacy parameters interactively. Overlook offers a rich visual query interface based on the open source Hillview system. Overlook achieves accuracy comparable to existing synopsis-based systems, while offering better performance and removing the need for extra storage.
ISSN: 2575-8527
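
Illustrative sketch: The summary describes answering queries with an existing engine and adding noise to the results, without storing a precomputed synopsis. The Python sketch below illustrates that general idea for a histogram query. It is not Overlook's actual mechanism, which this record does not specify. The Laplace noise, the per-bucket seeding from a secret key (so that repeating a query returns the same noisy answer), and all names such as noisy_histogram and secret_key are assumptions made for illustration.

```python
import hashlib
import random


def bucket_noise(secret_key, lo, hi, scale):
    """Deterministic Laplace(0, scale) noise for the bucket [lo, hi).

    Hypothetical helper: the seed is derived from a secret key and the
    bucket bounds, so the same bucket always gets the same noise and
    nothing has to be stored. Python's `random` is not cryptographically
    secure; a real system would need a secure generator.
    """
    digest = hashlib.sha256(f"{secret_key}:{lo}:{hi}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(values, edges, epsilon, secret_key):
    """Count values per bucket, then add noise to each count."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    # Adding or removing one record changes one count by 1, so the
    # L1 sensitivity of a single histogram is 1 and the scale is 1/epsilon.
    scale = 1.0 / epsilon
    return [
        c + bucket_noise(secret_key, edges[i], edges[i + 1], scale)
        for i, c in enumerate(counts)
    ]


if __name__ == "__main__":
    data = [1, 2, 2, 3, 7, 8]
    print(noisy_histogram(data, [0, 5, 10], epsilon=1.0, secret_key="demo"))
```

In this sketch the exact counting step stands in for the underlying query engine. The sketch covers only a single fixed bucketing; accounting for privacy loss across many different bucketings is outside its scope.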