Non-equal-width histogram publishing method based on differential privacy


Bibliographic Details
Main Authors: YANG Lei, ZHENG Xiao, ZHAO Wei
Format: Article
Language: English
Published: POSTS&TELECOM PRESS Co., LTD 2020-06-01
Series: Chinese Journal of Network and Information Security (网络与信息安全学报)
Subjects:
Online Access: http://www.infocomm-journal.com/cjnis/CN/10.11959/j.issn.2096-109x.2020035
Description
Summary: Existing differentially private histogram publishing techniques can exhibit "retracting" and "zero bucket" phenomena when the histogram is used to reflect the real distribution characteristics of the data, and the published histogram can be "too gentle" when the data volume is large. In addition, when applying privacy protection to the original histogram, existing techniques do not consider that the amount of information differs from group to group. To address these problems, a non-equal-width histogram publishing method based on differential privacy is proposed. First, a non-equal-width histogram that reflects the sparseness of the data is constructed using the empirical distribution function. Second, differential privacy protection is applied to the non-equal-width histogram to protect the privacy of the original histogram. Finally, a privacy budget is set for each group according to the class widths of the non-equal-width histogram to improve the privacy of each group of data. The experimental results show that the proposed method fully accounts for the sparseness of the data distribution when publishing histograms under differential privacy, effectively avoids the "retracting" and "zero bucket" phenomena, and preserves the accuracy with which the published histogram reflects the characteristics of the data distribution. Moreover, when noise is added to each group in line with the Laplace mechanism, setting a reasonable privacy budget for each group according to its class width increases, to some extent, the privacy of different data segments.
ISSN: 2096-109X
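
Illustrative sketch: The three steps named in the summary (bins built from the empirical distribution function, Laplace noise on each group, a per-group privacy budget tied to class width) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: the summary does not say how bin edges are derived from the empirical distribution or how the budget depends on width, so the equal-probability quantile edges, the linear rule eps_i = eps_max * width_i / max_width, and all function names (quantile_edges, width_based_budgets, publish_histogram) are hypothetical choices made here for illustration. The bin edges are computed directly from the data and are not themselves privatized.

```python
import numpy as np


def quantile_edges(data, n_bins):
    """Bin edges at equal-probability points of the empirical distribution.

    Assumption: dense regions get narrow bins, sparse regions get wide bins.
    """
    probs = np.linspace(0.0, 1.0, n_bins + 1)
    edges = np.unique(np.quantile(data, probs))  # ties can collapse edges
    if edges.size < 2:
        raise ValueError("data must contain at least two distinct values")
    return edges


def width_based_budgets(edges, eps_max):
    """Hypothetical rule: budget proportional to class width, capped at eps_max."""
    widths = np.diff(edges)
    return eps_max * widths / widths.max()


def publish_histogram(data, n_bins, eps_max, rng):
    """Return (edges, noisy_counts, per_bin_eps) for a non-equal-width histogram."""
    edges = quantile_edges(data, n_bins)
    counts, _ = np.histogram(data, bins=edges)
    eps = width_based_budgets(edges, eps_max)
    # Laplace mechanism: adding or removing one record changes one count by 1,
    # so each bin receives noise with scale 1 / eps_i.
    noise = rng.laplace(loc=0.0, scale=1.0 / eps)
    return edges, counts + noise, eps
```

A call such as `publish_histogram(np.random.default_rng(0).exponential(size=1000), 8, 1.0, np.random.default_rng(1))` returns the bin edges, the noisy counts, and the budget used for each bin. Because the bins are disjoint, the noisy counts in this sketch satisfy differential privacy with parameter equal to the largest per-bin budget (here eps_max) under the add-or-remove-one-record notion of neighbouring datasets; the data-dependent edges fall outside that guarantee.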