A Clustered Federated Learning Method of User Behavior Analysis Based on Non-IID Data

Federated learning (FL) is a novel distributed machine learning paradigm. It can protect data privacy in distributed machine learning. Hence, FL provides new ideas for user behavior analysis. User behavior analysis can be modeled using multiple data sources. However, differences between different da...

Bibliographic Details
Main Authors:	Jianfei Zhang, Zhongxin Li
Format:	Article
Language:	English
Published:	MDPI AG 2023-03-01
Series:	Electronics
Subjects:	federated learning; Non-IID; user behavior; user modeling
Online Access:	https://www.mdpi.com/2079-9292/12/7/1660

_version_	1797608090378436608
author	Jianfei Zhang; Zhongxin Li
author_facet	Jianfei Zhang; Zhongxin Li
author_sort	Jianfei Zhang
collection	DOAJ
description	Federated learning (FL) is a novel distributed machine learning paradigm that can protect data privacy, and it therefore offers new ideas for user behavior analysis. User behavior can be modeled using multiple data sources. However, differences between data sources can lead to different data distributions, i.e., the data are non-independent and identically distributed (Non-IID). Non-IID data usually introduce bias into the training of FL models, which affects model accuracy and convergence speed. This paper proposes a new federated learning algorithm, federated learning with a two-tier caching mechanism (FedTCM), to mitigate the impact of Non-IID data on the model. First, FedTCM clusters similar clients based on their data distribution, which reduces the degree of Non-IID between clients within a cluster. Second, FedTCM uses asynchronous communication to alleviate the problem of inconsistent computation speed across clients. Finally, FedTCM sets up a two-tier caching mechanism on the server to mitigate the Non-IID data between different clusters. On multiple simulated datasets, FedTCM outperforms the method without the federated framework by up to 15.8% (12.6% on average). Compared with the typical federated method FedAvg, the accuracy of FedTCM is up to 2.3% higher (1.6% on average). Additionally, FedTCM achieves better communication performance than FedAvg.
first_indexed	2024-03-11T05:38:45Z
format	Article
id	doaj.art-5bac902d694d44ceba27ee2c47833837
institution	Directory of Open Access Journals
issn	2079-9292
language	English
last_indexed	2024-03-11T05:38:45Z
publishDate	2023-03-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-5bac902d694d44ceba27ee2c47833837 | 2023-11-17T16:33:45Z | eng | MDPI AG | Electronics | 2079-9292 | 2023-03-01 | 12 | 7 | 1660 | 10.3390/electronics12071660 | A Clustered Federated Learning Method of User Behavior Analysis Based on Non-IID Data | Jianfei Zhang (School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130000, China) | Zhongxin Li (School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130000, China) | https://www.mdpi.com/2079-9292/12/7/1660 | federated learning; Non-IID; user behavior; user modeling
title	A Clustered Federated Learning Method of User Behavior Analysis Based on Non-IID Data
topic	federated learning; Non-IID; user behavior; user modeling
url	https://www.mdpi.com/2079-9292/12/7/1660
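
The description in this record says that similar clients are grouped by data distribution and that FedAvg is the baseline federated method. The sketch below illustrates those two ideas only; it is not the FedTCM algorithm, and it omits the asynchronous communication and the two-tier cache. The greedy L1-distance clustering rule, the threshold value, and every function name here are assumptions made for illustration. The aggregation step follows the standard FedAvg rule of averaging client parameters weighted by local sample count.

```python
from collections import Counter


def label_distribution(labels, num_classes):
    """Normalized label histogram of one client's local data."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def l1_distance(p, q):
    """L1 distance between two label distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cluster_clients(distributions, threshold):
    """Greedy clustering (an assumed rule, not the paper's): a client joins
    the first cluster whose first member is within `threshold`, otherwise
    it starts a new cluster."""
    clusters = []  # each entry is a list of client indices
    for idx, dist in enumerate(distributions):
        for members in clusters:
            if l1_distance(dist, distributions[members[0]]) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def weighted_average(weights, sizes):
    """FedAvg-style aggregation: parameter vectors weighted by sample count."""
    total = sum(sizes)
    dim = len(weights[0])
    return [sum(w[i] * n for w, n in zip(weights, sizes)) / total
            for i in range(dim)]


# Hypothetical toy data: four clients, three classes.
client_labels = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [2, 2, 2, 1],
    [2, 1, 2, 2],
]
dists = [label_distribution(labels, 3) for labels in client_labels]
clusters = cluster_clients(dists, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]

# Aggregate two hypothetical client models holding 1 and 3 samples.
print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Clients 0 and 1 hold mostly class 0 and clients 2 and 3 hold mostly class 2, so the toy run yields two clusters whose members have identical label distributions. Aggregating within such a cluster is the sense in which clustering can reduce the Non-IID effect among the clients being averaged.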

Similar Items