Entropy-based Consensus for Distributed Data Clustering

Bibliographic Details
Main Authors: M. Owhadi-Kareshki, M.R. Akbarzadeh-T.
Format: Article
Language: English
Published: Shahrood University of Technology 2019-11-01
Series: Journal of Artificial Intelligence and Data Mining
Online Access: http://jad.shahroodut.ac.ir/article_1258_ec43357345aac2fb0e4ccd478b59a215.pdf
Description
Summary: The growing scale of available data and increasingly strict privacy requirements are among the main challenges of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with consideration for data confidentiality: the consensus process negotiates only over local cluster centers, so no private data are transferred. Entropy is proposed as an internal measure of consensus clustering validity at each machine, so that the cluster centers of local machines with higher expected clustering validity have more influence on the final consensus centers. The relative cost function of the local Fuzzy C-Means (FCM) and the number of data points in each machine are also employed as measures of a machine's validity relative to other machines and of its reliability, respectively. The utility of the proposed consensus strategy is examined on 18 datasets from the UCI repository in terms of clustering accuracy and speedup against the centralized version of FCM. Several experiments confirm that the proposed approach yields higher speedup and accuracy while maintaining data security through its protected, distributed processing.
ISSN: 2322-5211, 2322-4444
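
Illustrative sketch (not taken from the article): the summary says that local cluster centers are combined into consensus centers, with more influence given to machines whose clustering has higher entropy-based validity and more data points. The Python below shows one way such a weighting could look. Everything specific in it is an assumption made for illustration: the function names, the use of normalized partition entropy of the fuzzy memberships as the validity measure, the weight formula (crispness times number of points), and the premise that local centers are already index-aligned across machines. The FCM cost term mentioned in the summary is omitted for brevity. The actual EC3 formulas are given in the paper itself.

```python
import math


def normalized_partition_entropy(memberships):
    """Mean entropy of the fuzzy membership rows, scaled to [0, 1] by log(k).

    memberships: list of rows, one per data point, each row holding k >= 2
    membership degrees that sum to 1. 0 means a crisp partition, 1 means
    every point belongs equally to every cluster.
    """
    k = len(memberships[0])
    total = 0.0
    for row in memberships:
        total -= sum(u * math.log(u) for u in row if u > 0.0)
    return total / (len(memberships) * math.log(k))


def machine_weight(memberships):
    """Hypothetical validity weight: crisper and larger machines count more."""
    crispness = 1.0 - normalized_partition_entropy(memberships)
    return crispness * len(memberships)


def consensus_centers(local_centers, weights):
    """Weighted average of local cluster centers.

    local_centers: one k-by-d list of centers per machine, assumed to be
    index-aligned (center j on every machine refers to the same cluster).
    Only centers and scalar weights are exchanged, never the raw data.
    """
    total = sum(weights)
    k = len(local_centers[0])
    d = len(local_centers[0][0])
    return [
        [
            sum(w * centers[j][t] for w, centers in zip(weights, local_centers)) / total
            for t in range(d)
        ]
        for j in range(k)
    ]


if __name__ == "__main__":
    # Machine A has a crisp partition, machine B a maximally fuzzy one.
    memberships_a = [[1.0, 0.0], [0.0, 1.0]]
    memberships_b = [[0.5, 0.5], [0.5, 0.5]]
    weights = [machine_weight(memberships_a), machine_weight(memberships_b)]
    centers = consensus_centers([[[0.0], [10.0]], [[4.0], [6.0]]], weights)
    print(weights, centers)
```

In this toy setup machine B's maximally fuzzy memberships give it a weight of essentially zero, so the consensus centers follow machine A. A real deployment would also need a step that matches cluster indices across machines before averaging.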