Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering

Bibliographic Details
Main Authors:	Zhiguang Chu, Jingsha He, Xiaolei Zhang, Xing Zhang, Nafei Zhu
Format:	Article
Language:	English
Published:	MDPI AG 2023-04-01
Series:	Electronics
Subjects:	high-dimensional data; feature selection; random forest; clustering; differential privacy
Online Access:	https://www.mdpi.com/2079-9292/12/9/1959

Abstract

High-dimensional data is a social information product, and its privacy and usability are core issues in the field of privacy protection. Feature selection is a commonly used technique for reducing the dimensionality of high-dimensional data. However, some feature selection methods process only the features the algorithm selects and ignore the information associated with those features, which limits the usability of the final results. This paper proposes a hybrid method based on feature selection and cluster analysis to address the data utility and privacy problems that arise when high-dimensional data are published in practice. The method has three stages: (1) feature screening; (2) cluster analysis of the features; and (3) adaptive noise addition. The experiments use the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the UCI Machine Learning Repository. Using classification accuracy as the evaluation metric, they show that the proposed algorithm protects sensitive information in the original data while preserving the data's contribution to the diagnostic results.
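The three stages named in the abstract can be illustrated with a minimal, hypothetical Python sketch; it is not the authors' implementation. The subject terms list random forest for feature screening, but to stay dependency-free this sketch swaps in absolute Pearson correlation with the class label as the screening score. The clustering rule (attach each unselected feature to the selected feature it correlates with most strongly) and the "adaptive" budget rule (split the privacy budget epsilon across clusters in proportion to each selected feature's score) are assumptions, and every function name (`screen_features`, `cluster_features`, `allocate_budget`, `publish`) is illustrative. It also assumes feature values are scaled to [0, 1], so the Laplace mechanism uses sensitivity 1. Only the final noise step is differentially private here: stages 1 and 2 read the raw data, so a real release would have to spend budget on them as well or run them on public data.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def screen_features(columns, labels, k):
    """Stage 1 (stand-in for random forest importance): score each feature by
    |corr(feature, label)| and keep the indices of the top k."""
    scores = [abs(pearson(col, labels)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


def cluster_features(columns, centers):
    """Stage 2 (hypothetical rule): attach every unselected feature to the
    selected feature it is most strongly correlated with."""
    clusters = {c: [c] for c in centers}
    for j in range(len(columns)):
        if j in clusters:
            continue
        best = max(centers, key=lambda c: abs(pearson(columns[j], columns[c])))
        clusters[best].append(j)
    return clusters


def allocate_budget(clusters, scores, epsilon):
    """Stage 3a (hypothetical 'adaptive' rule): split epsilon across clusters in
    proportion to the score of each cluster's selected feature, then equally
    among the cluster's members. The per-feature budgets sum to epsilon."""
    weights = {c: scores[c] + 1e-6 for c in clusters}  # floor avoids a zero budget
    total = sum(weights.values())
    eps = {}
    for c, members in clusters.items():
        for j in members:
            eps[j] = epsilon * (weights[c] / total) / len(members)
    return eps


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two Exponential(mean=scale) draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def publish(rows, labels, k, epsilon, seed=0):
    """Stage 3b: Laplace mechanism per attribute. Values are clipped to [0, 1],
    so one record changes each attribute by at most 1 (sensitivity 1)."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    centers, scores = screen_features(columns, labels, k)
    clusters = cluster_features(columns, centers)
    eps = allocate_budget(clusters, scores, epsilon)
    # Per-attribute budgets sum to epsilon (sequential composition) and each
    # record is noised independently of the others.
    noisy = [
        [min(max(v, 0.0), 1.0) + laplace(1 / eps[j], rng) for j, v in enumerate(row)]
        for row in rows
    ]
    return noisy, clusters, eps


if __name__ == "__main__":
    # Toy values for illustration only: 3 features scaled to [0, 1], binary label.
    rows = [[0.1, 0.3, 0.5], [0.2, 0.4, 0.6], [0.8, 0.5, 0.4], [0.9, 0.6, 0.5]]
    labels = [0, 0, 1, 1]
    noisy, clusters, eps = publish(rows, labels, k=2, epsilon=1.0)
    print(clusters)
    print({j: round(e, 3) for j, e in eps.items()})
```

In this sketch, clusters whose selected feature scores higher receive a larger share of the total budget, so their attributes get less noise on average; unselected features are still published next to the feature they are associated with rather than being dropped, which mirrors the motivation stated in the abstract.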

Additional Details
DOI:	10.3390/electronics12091959
Citation:	Electronics 2023, 12(9), 1959
ISSN:	2079-9292
Affiliations:	Zhiguang Chu, Jingsha He, Nafei Zhu: School of Software Engineering, Beijing University of Technology, Beijing 100124, China
	Xiaolei Zhang, Xing Zhang: Key Laboratory of Security for Network and Data in Industrial Internet of Liaoning Province, Jinzhou 121000, China
Collection:	Directory of Open Access Journals (DOAJ)

