RCM: A Remote Cache Management Framework for Spark

With the rapid growth of Internet data, the performance of big data processing platforms is attracting more and more attention. In Spark, cache data are replaced by the Least Recently Used (LRU) Algorithm. LRU cannot identify the cost of cache data, which leads to replacing some important cache data...

Bibliographic Details
Main Authors:	Yixin Song, Junyang Yu, Bohan Li, Han Li, Xin He, Jinjiang Wang, Rui Zhai
Format:	Article
Language:	English
Published:	MDPI AG 2022-11-01
Series:	Applied Sciences
Subjects:	cache weight generation; cache replacement; cache placement; cache management framework; Spark
Online Access:	https://www.mdpi.com/2076-3417/12/22/11491

_version_	1797465995244208128
author	Yixin Song, Junyang Yu, Bohan Li, Han Li, Xin He, Jinjiang Wang, Rui Zhai
author_facet	Yixin Song, Junyang Yu, Bohan Li, Han Li, Xin He, Jinjiang Wang, Rui Zhai
author_sort	Yixin Song
collection	DOAJ
description	With the rapid growth of Internet data, the performance of big data processing platforms is attracting more and more attention. In Spark, cache data are replaced by the Least Recently Used (LRU) Algorithm. LRU cannot identify the cost of cache data, which leads to replacing some important cache data. In addition, cache data are placed randomly, with no measure for identifying efficient cache servers. To address these problems, a remote cache management framework (RCM) for the Spark platform is proposed, comprising a cache weight generation module (CWG), a cache replacement module (CREP), and a cache placement module (CPL). CWG establishes initial weights from three main factors: the response time of the database query, the number of queries, and the data size. CWG then reduces the weight of old data through a time loss function. CREP uses a greedy strategy to maximize the sum of cache data weights. CPL assigns each data item to the best cache server using the Kuhn-Munkres matching algorithm to improve cooperation efficiency. To verify its effectiveness, RCM is implemented on Redis and deployed on eight computing nodes and four cache servers. Three groups of benchmark jobs, PageRank, K-means and WordCount, are tested. The experimental results show that, compared with MCM, SACM and DMAOM, RCM reduces execution time by up to 42.1%.
first_indexed	2024-03-09T18:30:31Z
format	Article
id	doaj.art-9e46333d02f84e4d8828581761292ddc
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-09T18:30:31Z
publishDate	2022-11-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-9e46333d02f84e4d8828581761292ddc; 2023-11-24T07:36:11Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-11-01; volume 12, issue 22, article 11491; 10.3390/app122211491; RCM: A Remote Cache Management Framework for Spark; Yixin Song, Junyang Yu, Bohan Li, Han Li, Xin He, Jinjiang Wang, Rui Zhai (all: School of Software, Henan University, Kaifeng 475001, China); https://www.mdpi.com/2076-3417/12/22/11491; cache weight generation; cache replacement; cache placement; cache management framework; Spark
spellingShingle	Yixin Song, Junyang Yu, Bohan Li, Han Li, Xin He, Jinjiang Wang, Rui Zhai; RCM: A Remote Cache Management Framework for Spark; Applied Sciences; cache weight generation; cache replacement; cache placement; cache management framework; Spark
title	RCM: A Remote Cache Management Framework for Spark
title_full	RCM: A Remote Cache Management Framework for Spark
title_fullStr	RCM: A Remote Cache Management Framework for Spark
title_full_unstemmed	RCM: A Remote Cache Management Framework for Spark
title_short	RCM: A Remote Cache Management Framework for Spark
title_sort	rcm a remote cache management framework for spark
topic	cache weight generation; cache replacement; cache placement; cache management framework; Spark
url	https://www.mdpi.com/2076-3417/12/22/11491
work_keys_str_mv	AT yixinsong rcmaremotecachemanagementframeworkforspark AT junyangyu rcmaremotecachemanagementframeworkforspark AT bohanli rcmaremotecachemanagementframeworkforspark AT hanli rcmaremotecachemanagementframeworkforspark AT xinhe rcmaremotecachemanagementframeworkforspark AT jinjiangwang rcmaremotecachemanagementframeworkforspark AT ruizhai rcmaremotecachemanagementframeworkforspark
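
Illustrative note: the description field names three weight factors (database response time, number of queries, data size), a time loss function, and a greedy replacement strategy, but the record does not give the formulas. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The weight formula, the exponential half-life decay, and every name here (CacheEntry, initial_weight, decayed_weight, greedy_keep) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    response_time_s: float  # time to re-fetch the data from the database
    query_count: int        # how often the data has been queried
    size_mb: float          # space the data occupies in the cache
    age_s: float            # seconds since the entry was last refreshed


def initial_weight(e: CacheEntry) -> float:
    # Assumed form: costly, frequently queried, small entries weigh more.
    return e.response_time_s * e.query_count / e.size_mb


def decayed_weight(e: CacheEntry, half_life_s: float = 600.0) -> float:
    # Assumed time loss function: the weight halves every half_life_s seconds.
    return initial_weight(e) * 0.5 ** (e.age_s / half_life_s)


def greedy_keep(entries: list[CacheEntry], capacity_mb: float) -> list[CacheEntry]:
    # Greedy heuristic: visit entries from heaviest to lightest and keep
    # each one that still fits; the rest are candidates for eviction.
    kept, used = [], 0.0
    for e in sorted(entries, key=decayed_weight, reverse=True):
        if used + e.size_mb <= capacity_mb:
            kept.append(e)
            used += e.size_mb
    return kept


entries = [
    CacheEntry("a", response_time_s=2.0, query_count=10, size_mb=10.0, age_s=0.0),
    CacheEntry("b", response_time_s=0.5, query_count=4, size_mb=20.0, age_s=0.0),
    CacheEntry("c", response_time_s=4.0, query_count=5, size_mb=10.0, age_s=600.0),
]
kept = greedy_keep(entries, capacity_mb=20.0)
print([e.key for e in kept])
```

Unlike LRU, this ordering depends on how expensive an entry is to rebuild rather than only on when it was last touched. A greedy pass of this kind is a heuristic and does not guarantee the maximum total weight in every case.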
