A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark

Multiobjective clustering algorithms based on particle swarm optimization have been applied successfully in some applications. However, existing algorithms are implemented on a single machine and cannot be directly parallelized on a cluster, which makes it difficult for them to handle larg...


Bibliographic Details
Main Authors:	Huidong Ling, Xinmu Zhu, Tao Zhu, Mingxing Nie, Zhenghai Liu, Zhenyu Liu
Format:	Article
Language:	English
Published:	MDPI AG 2023-01-01
Series:	Entropy
Subjects:	multiobjective clustering; Apache Spark; multiobjective particle swarm optimization (MOPSO)
Online Access:	https://www.mdpi.com/1099-4300/25/2/259

author	Huidong Ling, Xinmu Zhu, Tao Zhu, Mingxing Nie, Zhenghai Liu, Zhenyu Liu
collection	DOAJ
description	Multiobjective clustering algorithms based on particle swarm optimization have been applied successfully in some applications. However, existing algorithms are implemented on a single machine and cannot be directly parallelized on a cluster, which makes it difficult for them to handle large-scale data. With the development of distributed parallel computing frameworks, data parallelism was proposed. However, increasing the degree of parallelism leads to unbalanced data distribution, which degrades the clustering result. In this paper, we propose a parallel multiobjective PSO weighted average clustering algorithm based on Apache Spark (Spark-MOPSO-Avg). First, the entire data set is divided into multiple partitions and cached in memory using the distributed, memory-based computing of Apache Spark. The local fitness value of each particle is calculated in parallel from the data in each partition. After the calculation is completed, only particle information is transmitted; large numbers of data objects do not need to be transmitted between nodes, which reduces network communication and thus the algorithm's running time. Second, a weighted average of the local fitness values is computed to mitigate the effect of unbalanced data distribution on the results. Experimental results show that the Spark-MOPSO-Avg algorithm achieves lower information loss under data parallelism, at a cost of about 1% to 9% in accuracy, while effectively reducing the algorithm's time overhead. It shows good execution efficiency and parallel computing capability on a Spark distributed cluster.
first_indexed	2024-03-11T08:51:39Z
format	Article
id	doaj.art-3ef881af8619410f9b15c53e8c6aeeda
institution	Directory Open Access Journal
issn	1099-4300
language	English
last_indexed	2024-03-11T08:51:39Z
publishDate	2023-01-01
publisher	MDPI AG
record_format	Article
series	Entropy
spelling	doaj.art-3ef881af8619410f9b15c53e8c6aeeda; 2023-11-16T20:23:02Z; eng; MDPI AG; Entropy; ISSN 1099-4300; 2023-01-01; vol. 25, no. 2, art. 259; DOI 10.3390/e25020259; Huidong Ling, Xinmu Zhu, Tao Zhu, Mingxing Nie, Zhenghai Liu, Zhenyu Liu (all: School of Computer Science, University of South China, Hengyang 421200, China)
title	A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark
topic	multiobjective clustering; Apache Spark; multiobjective particle swarm optimization (MOPSO)
url	https://www.mdpi.com/1099-4300/25/2/259
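
The abstract in this record describes computing a local fitness value per data partition and then combining the local values with a weighted average. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. It makes several assumptions that the record does not state: the fitness is taken to be the mean distance from each point to its nearest centroid, the weights are taken to be the partition sizes, every partition is non-empty, and in-memory lists stand in for Spark partitions. All function and variable names are hypothetical.

```python
from math import dist


def local_fitness(points, centroids):
    """Mean distance from each point in one partition to its nearest centroid.

    Assumed objective for illustration only; the record does not define it.
    """
    return sum(min(dist(p, c) for c in centroids) for p in points) / len(points)


def weighted_average_fitness(partitions, centroids):
    """Combine local fitness values, weighting each by its partition size (assumed weights)."""
    total = sum(len(part) for part in partitions)
    return sum(
        len(part) / total * local_fitness(part, centroids) for part in partitions
    )


def plain_average_fitness(partitions, centroids):
    """Unweighted average of local fitness values, shown for comparison."""
    return sum(local_fitness(part, centroids) for part in partitions) / len(partitions)


# Deliberately unbalanced partitions: six points versus one point.
partitions = [
    [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (1.0, 1.0), (1.0, 3.0)],
    [(10.0, 10.0)],
]
# One particle's candidate cluster centers (hypothetical values).
centroids = [(1.0, 1.0), (9.0, 9.0)]

all_points = [p for part in partitions for p in part]
global_fitness = local_fitness(all_points, centroids)
weighted = weighted_average_fitness(partitions, centroids)
plain = plain_average_fitness(partitions, centroids)

print(f"global={global_fitness:.4f} weighted={weighted:.4f} plain={plain:.4f}")
```

For this particular mean-based objective, the size-weighted average reproduces the value computed over the whole data set, whereas the plain average lets the single-point partition count as much as the six-point one. Only the per-partition numbers need to be combined, which matches the abstract's point that particle information, not data objects, is transmitted. The objective functions used in the paper may not decompose this cleanly.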


Similar Items