K-Means Clustering with Local Distance Privacy

With the development of information technology, a mass of data are generated every day. Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market competition. K-means clustering has been widely used for cluster analysis in real life....

Full description

Bibliographic Details
Main Authors:	Mengmeng Yang, Longxia Huang, Chenghua Tang
Format:	Article
Language:	English
Published:	Tsinghua University Press 2023-12-01
Series:	Big Data Mining and Analytics
Subjects:	k-means clustering local differential privacy data analysis
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2022.9020050

_version_	1826817783090905088
author	Mengmeng Yang Longxia Huang Chenghua Tang
author_facet	Mengmeng Yang Longxia Huang Chenghua Tang
author_sort	Mengmeng Yang
collection	DOAJ
description	With the development of information technology, a mass of data are generated every day. Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market competition. K-means clustering has been widely used for cluster analysis in real life. However, these analyses are based on users’ data, which disclose users’ privacy. Local differential privacy has attracted lots of attention recently due to its strong privacy guarantee and has been applied for clustering analysis. However, existing K-means clustering methods with local differential privacy protection cannot get an ideal clustering result due to the large amount of noise introduced to the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making the users’ records in-distinguish from each other in high-dimensional space, we map the user’s record into a one-dimensional distance space and make the records in such a distance space not be distinguished from each other. To be specific, we generate a noisy distance first and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, which satisfies the local differential privacy guarantee and local dE-privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. Our experimental evaluation results show that our methods outperform the traditional methods significantly.
first_indexed	2024-03-08T21:44:48Z
format	Article
id	doaj.art-33b0b8276f4346eaba69c440509189ac
institution	Directory Open Access Journal
issn	2096-0654
language	English
last_indexed	2025-02-16T05:41:27Z
publishDate	2023-12-01
publisher	Tsinghua University Press
record_format	Article
series	Big Data Mining and Analytics
spelling	doaj.art-33b0b8276f4346eaba69c440509189ac2025-02-03T11:01:40ZengTsinghua University PressBig Data Mining and Analytics2096-06542023-12-016443344210.26599/BDMA.2022.9020050K-Means Clustering with Local Distance PrivacyMengmeng Yang0Longxia Huang1Chenghua Tang2Data61, Commonwealth Scientific and Industrial Research Organization, Melbourne 3168, AustraliaSchool of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, ChinaSchool of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541010, ChinaWith the development of information technology, a mass of data are generated every day. Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market competition. K-means clustering has been widely used for cluster analysis in real life. However, these analyses are based on users’ data, which disclose users’ privacy. Local differential privacy has attracted lots of attention recently due to its strong privacy guarantee and has been applied for clustering analysis. However, existing K-means clustering methods with local differential privacy protection cannot get an ideal clustering result due to the large amount of noise introduced to the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making the users’ records in-distinguish from each other in high-dimensional space, we map the user’s record into a one-dimensional distance space and make the records in such a distance space not be distinguished from each other. To be specific, we generate a noisy distance first and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, which satisfies the local differential privacy guarantee and local dE-privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. Our experimental evaluation results show that our methods outperform the traditional methods significantly.https://www.sciopen.com/article/10.26599/BDMA.2022.9020050k-means clusteringlocal differential privacydata analysis
spellingShingle	Mengmeng Yang Longxia Huang Chenghua Tang K-Means Clustering with Local Distance Privacy Big Data Mining and Analytics k-means clustering local differential privacy data analysis
title	K-Means Clustering with Local Distance Privacy
title_full	K-Means Clustering with Local Distance Privacy
title_fullStr	K-Means Clustering with Local Distance Privacy
title_full_unstemmed	K-Means Clustering with Local Distance Privacy
title_short	K-Means Clustering with Local Distance Privacy
title_sort	k means clustering with local distance privacy
topic	k-means clustering local differential privacy data analysis
url	https://www.sciopen.com/article/10.26599/BDMA.2022.9020050
work_keys_str_mv	AT mengmengyang kmeansclusteringwithlocaldistanceprivacy AT longxiahuang kmeansclusteringwithlocaldistanceprivacy AT chenghuatang kmeansclusteringwithlocaldistanceprivacy

K-Means Clustering with Local Distance Privacy

Similar Items