K-means clustering with local dᵪ-privacy for privacy-preserving data analysis

Privacy-preserving data analysis is an emerging area that addresses the dilemma of performing data analysis on user data while protecting users' privacy. In this paper, we consider the problem of constructing privacy-preserving K -means clustering protocol for data analysis that provides local...

Full description

Bibliographic Details
Main Authors:	Yang, Mengmeng, Tjuawinata, Ivan, Lam, Kwok-Yan
Other Authors:	School of Computer Science and Engineering
Format:	Journal Article
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering Differential Privacy K-means Clustering
Online Access:	https://hdl.handle.net/10356/168038

_version_	1826112258685534208
author	Yang, Mengmeng Tjuawinata, Ivan Lam, Kwok-Yan
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Yang, Mengmeng Tjuawinata, Ivan Lam, Kwok-Yan
author_sort	Yang, Mengmeng
collection	NTU
description	Privacy-preserving data analysis is an emerging area that addresses the dilemma of performing data analysis on user data while protecting users' privacy. In this paper, we consider the problem of constructing privacy-preserving K -means clustering protocol for data analysis that provides local privacy to users' data. To enable a desirable degree of local privacy guarantee while maintaining high accuracy of the clustering, we adopt a generalized differential privacy definition, dχ -privacy, which quantifies the distinguishability level based on the distance between data records defined by the distance function dχ. In our work, we consider the space of data points as a metric space imbued with Euclidean distance and propose a bounded perturbation mechanism (BPM) with bounded sampling space of the perturbed data points, which is formally shown to achieve dχ -privacy. BPM perturbs the data as a whole instead of treating each dimension independently, which is desirable since the privacy budget is no longer required to be split among different dimensions. Bounded output space also means that we will not get into the case where the report or the statistical result is so far out of the data domain that it is hard to interpret. Furthermore, it can also help in limiting the amount of bandwidth needed to send such report to the server. The design of BPM is based on a probability density function which decreases exponentially as the Euclidean distance with respect to the true value grows. It is also designed with the aim of ensuring that BPM produces perturbed data that provides the claimed privacy guarantee while ensuring high utility response. To guarantee the efficiency of the perturbation method, we propose an efficient algorithm to sample from the proposed distribution and apply BPM to the design of dχ -private K -means clustering algorithms. Lastly, we analyse the privacy and utility guarantee provided by the proposed method and provide its experimental results.
first_indexed	2024-10-01T03:04:03Z
format	Journal Article
id	ntu-10356/168038
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T03:04:03Z
publishDate	2023
record_format	dspace
spelling	ntu-10356/1680382023-05-26T15:36:13Z K-means clustering with local dᵪ-privacy for privacy-preserving data analysis Yang, Mengmeng Tjuawinata, Ivan Lam, Kwok-Yan School of Computer Science and Engineering Strategic Centre for Research in Privacy-Preserving Technologies and Systems Engineering::Computer science and engineering Differential Privacy K-means Clustering Privacy-preserving data analysis is an emerging area that addresses the dilemma of performing data analysis on user data while protecting users' privacy. In this paper, we consider the problem of constructing privacy-preserving K -means clustering protocol for data analysis that provides local privacy to users' data. To enable a desirable degree of local privacy guarantee while maintaining high accuracy of the clustering, we adopt a generalized differential privacy definition, dχ -privacy, which quantifies the distinguishability level based on the distance between data records defined by the distance function dχ. In our work, we consider the space of data points as a metric space imbued with Euclidean distance and propose a bounded perturbation mechanism (BPM) with bounded sampling space of the perturbed data points, which is formally shown to achieve dχ -privacy. BPM perturbs the data as a whole instead of treating each dimension independently, which is desirable since the privacy budget is no longer required to be split among different dimensions. Bounded output space also means that we will not get into the case where the report or the statistical result is so far out of the data domain that it is hard to interpret. Furthermore, it can also help in limiting the amount of bandwidth needed to send such report to the server. The design of BPM is based on a probability density function which decreases exponentially as the Euclidean distance with respect to the true value grows. It is also designed with the aim of ensuring that BPM produces perturbed data that provides the claimed privacy guarantee while ensuring high utility response. To guarantee the efficiency of the perturbation method, we propose an efficient algorithm to sample from the proposed distribution and apply BPM to the design of dχ -private K -means clustering algorithms. Lastly, we analyse the privacy and utility guarantee provided by the proposed method and provide its experimental results. National Research Foundation (NRF) Submitted/Accepted version This work was supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. 2023-05-19T06:14:12Z 2023-05-19T06:14:12Z 2022 Journal Article Yang, M., Tjuawinata, I. & Lam, K. (2022). K-means clustering with local dᵪ-privacy for privacy-preserving data analysis. IEEE Transactions On Information Forensics and Security, 17, 2524-2537. https://dx.doi.org/10.1109/TIFS.2022.3189532 1556-6013 https://hdl.handle.net/10356/168038 10.1109/TIFS.2022.3189532 2-s2.0-85134237951 17 2524 2537 en IEEE Transactions on Information Forensics and Security © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TIFS.2022.3189532. application/pdf application/pdf
spellingShingle	Engineering::Computer science and engineering Differential Privacy K-means Clustering Yang, Mengmeng Tjuawinata, Ivan Lam, Kwok-Yan K-means clustering with local dᵪ-privacy for privacy-preserving data analysis
title	K-means clustering with local dᵪ-privacy for privacy-preserving data analysis
title_full	K-means clustering with local dᵪ-privacy for privacy-preserving data analysis
title_fullStr	K-means clustering with local dᵪ-privacy for privacy-preserving data analysis
title_full_unstemmed	K-means clustering with local dᵪ-privacy for privacy-preserving data analysis
title_short	K-means clustering with local dᵪ-privacy for privacy-preserving data analysis
title_sort	k means clustering with local dᵪ privacy for privacy preserving data analysis
topic	Engineering::Computer science and engineering Differential Privacy K-means Clustering
url	https://hdl.handle.net/10356/168038
work_keys_str_mv	AT yangmengmeng kmeansclusteringwithlocaldχprivacyforprivacypreservingdataanalysis AT tjuawinataivan kmeansclusteringwithlocaldχprivacyforprivacypreservingdataanalysis AT lamkwokyan kmeansclusteringwithlocaldχprivacyforprivacypreservingdataanalysis

K-means clustering with local dᵪ-privacy for privacy-preserving data analysis

Similar Items