Private Similarity Search with Sublinear Communication

Nearest neighbor search is a fundamental building-block for a wide range of applications. A privacy-preserving protocol for nearest neighbor search involves a set of clients who send queries to a remote database. Each client retrieves the nearest neighbor(s) to its query in the database without reve...

全面介绍

书目详细资料
主要作者: Servan-Schreiber, Sacha
其他作者: Devadas, Srinivas
格式: Thesis
出版: Massachusetts Institute of Technology 2022
在线阅读:https://hdl.handle.net/1721.1/140146
实物特征
总结:Nearest neighbor search is a fundamental building-block for a wide range of applications. A privacy-preserving protocol for nearest neighbor search involves a set of clients who send queries to a remote database. Each client retrieves the nearest neighbor(s) to its query in the database without revealing any information about the query. For database privacy, the client must not learn anything beyond the query answer. Existing protocols for private nearest neighbor search require heavy cryptographic tools, resulting in poor practical performance or large client overheads. In this thesis, we present the first lightweight protocol for private nearest neighbor search. Our protocol is instantiated using two non-colluding servers, each holding a replica of the database. The protocol supports an arbitrary number of clients simultaneously querying the database via these servers. Each query is only a single round of communication for the client and does not require any communication between servers. If at least one of the servers is non-colluding, we ensure that (1) no information is revealed on the client’s query, (2) the total communication between the client and the servers is sublinear in the database size, and (3) each query answer only leaks a small and precisely quantified amount of information about the database to the client, even when the client is acting maliciously. We implement our protocol and report its performance on real-world data. Our construction requires between 10 and 30 seconds of server processing per query over large databases of 10M feature vectors. Client overhead remained under 10 µs of processing time per query and typically less than 4 MB of communication, depending on parameters.