Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks

Mobile sensor networks are a great source of data. By collecting data with mobile sensor nodes from individuals in a user community, e.g. using their smartphones, we can learn global information such as traffic congestion patterns in the city, location of key community facilities, and locations of g...

Full description

Bibliographic Details
Main Authors: Feldman, Dan, Xiang, Chongyuan, Zhu, Ruihao, Rus, Daniela
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language:English
Published: ACM 2021
Online Access:https://hdl.handle.net/1721.1/137234
_version_ 1811078322311921664
author Feldman, Dan
Xiang, Chongyuan
Zhu, Ruihao
Rus, Daniela
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Feldman, Dan
Xiang, Chongyuan
Zhu, Ruihao
Rus, Daniela
author_sort Feldman, Dan
collection MIT
description Mobile sensor networks are a great source of data. By collecting data with mobile sensor nodes from individuals in a user community, e.g. using their smartphones, we can learn global information such as traffic congestion patterns in the city, location of key community facilities, and locations of gathering places. Can we publish and run queries on mobile sensor network databases without disclosing information about individual nodes? Differential privacy is a strong notion of privacy which guarantees that very little will be learned about individual records in the database, no matter what the attackers already know or wish to learn. Still, there is no practical system applying differential privacy algorithms for clustering points on real databases. This paper describes the construction of small coresets for computing k-means clustering of a set of points while preserving differential privacy. As a result, we give the first k-means clustering algorithm that is both differentially private, and has an approximation error that depends sub-linearly on the data's dimension d. Previous results introduced errors that are exponential in d. We implemented this algorithm and used it to create differentially private location data from GPS tracks. Specifically our algorithm allows clustering GPS databases generated from mobile nodes, while letting the user control the introduced noise due to privacy. We provide experimental results for the system and algorithms, and compare them to existing techniques. To the best of our knowledge, this is the first practical system that enables differentially private clustering on real data.
first_indexed 2024-09-23T10:57:46Z
format Article
id mit-1721.1/137234
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T10:57:46Z
publishDate 2021
publisher ACM
record_format dspace
spelling mit-1721.1/1372342023-04-11T19:47:27Z Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks Feldman, Dan Xiang, Chongyuan Zhu, Ruihao Rus, Daniela Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Mobile sensor networks are a great source of data. By collecting data with mobile sensor nodes from individuals in a user community, e.g. using their smartphones, we can learn global information such as traffic congestion patterns in the city, location of key community facilities, and locations of gathering places. Can we publish and run queries on mobile sensor network databases without disclosing information about individual nodes? Differential privacy is a strong notion of privacy which guarantees that very little will be learned about individual records in the database, no matter what the attackers already know or wish to learn. Still, there is no practical system applying differential privacy algorithms for clustering points on real databases. This paper describes the construction of small coresets for computing k-means clustering of a set of points while preserving differential privacy. As a result, we give the first k-means clustering algorithm that is both differentially private, and has an approximation error that depends sub-linearly on the data's dimension d. Previous results introduced errors that are exponential in d. We implemented this algorithm and used it to create differentially private location data from GPS tracks. Specifically our algorithm allows clustering GPS databases generated from mobile nodes, while letting the user control the introduced noise due to privacy. We provide experimental results for the system and algorithms, and compare them to existing techniques. To the best of our knowledge, this is the first practical system that enables differentially private clustering on real data. 2021-11-03T16:14:02Z 2021-11-03T16:14:02Z 2017-04-18 2019-07-17T14:45:18Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/137234 Feldman, Dan, Xiang, Chongyuan, Zhu, Ruihao and Rus, Daniela. 2017. "Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks." en 10.1145/3055031.3055090 Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf ACM MIT web domain
spellingShingle Feldman, Dan
Xiang, Chongyuan
Zhu, Ruihao
Rus, Daniela
Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
title Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
title_full Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
title_fullStr Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
title_full_unstemmed Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
title_short Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
title_sort coresets for differentially private k means clustering and applications to privacy in mobile sensor networks
url https://hdl.handle.net/1721.1/137234
work_keys_str_mv AT feldmandan coresetsfordifferentiallyprivatekmeansclusteringandapplicationstoprivacyinmobilesensornetworks
AT xiangchongyuan coresetsfordifferentiallyprivatekmeansclusteringandapplicationstoprivacyinmobilesensornetworks
AT zhuruihao coresetsfordifferentiallyprivatekmeansclusteringandapplicationstoprivacyinmobilesensornetworks
AT rusdaniela coresetsfordifferentiallyprivatekmeansclusteringandapplicationstoprivacyinmobilesensornetworks