Efficient compute-intensive job allocation in data centers via deep reinforcement learning

Reducing the energy consumption of the servers in a data center via proper job allocation is desirable. Existing advanced job allocation algorithms, based on constrained optimization formulations capturing servers' complex power consumption and thermal dynamics, often scale poorly with the data...


Bibliographic Details
Main Authors:	Yi, Deliang; Zhou, Xin; Wen, Yonggang; Tan, Rui
Other Authors:	School of Computer Science and Engineering
Format:	Journal Article
Language:	English
Published:	2022
Subjects:	Engineering::Computer science and engineering; Job Allocation; Data Center
Online Access:	https://hdl.handle.net/10356/161048

_version_	1826123415528931328
author	Yi, Deliang; Zhou, Xin; Wen, Yonggang; Tan, Rui
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering; Yi, Deliang; Zhou, Xin; Wen, Yonggang; Tan, Rui
author_sort	Yi, Deliang
collection	NTU
description	Reducing the energy consumption of the servers in a data center via proper job allocation is desirable. Existing advanced job allocation algorithms, based on constrained optimization formulations capturing servers' complex power consumption and thermal dynamics, often scale poorly with the data center size and optimization horizon. This article applies deep reinforcement learning to build an allocation algorithm for long-lasting and compute-intensive jobs that are increasingly seen among today's computation demands. Specifically, a deep Q-network is trained to allocate jobs, aiming to maximize a cumulative reward over long horizons. The training is performed offline using a computational model based on long short-term memory networks that capture the servers' power and thermal dynamics. This offline training approach avoids slow online convergence, low energy efficiency, and potential server overheating during the agent's extensive state-action space exploration if it directly interacts with the physical data center in the usually adopted online learning scheme. At run time, the trained Q-network is forward-propagated with little computation to allocate jobs. Evaluation based on eight months' physical state and job arrival records from a national supercomputing data center hosting 1,152 processors shows that our solution reduces computing power consumption by more than 10 percent and processor temperature by more than 4°C without sacrificing job processing throughput.
first_indexed	2024-10-01T06:04:17Z
format	Journal Article
id	ntu-10356/161048
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T06:04:17Z
publishDate	2022
record_format	dspace
spelling	ntu-10356/161048 2022-08-12T07:01:23Z. Efficient compute-intensive job allocation in data centers via deep reinforcement learning. Yi, Deliang; Zhou, Xin; Wen, Yonggang; Tan, Rui. School of Computer Science and Engineering. Funding: National Research Foundation (NRF). This research was supported in part by the National Research Foundation, Prime Minister's Office, Singapore, under its Green Buildings Innovation Cluster (GBIC Award No. NRF2015ENC-GBICRD001-012) and Green Data Centre Research (GDCR Award No. NRF2015ENC-GDCR01001-003), and by the Alibaba Group under its project (Ref. No. M4062352). 2022-08-12T07:01:22Z. 2020. Journal Article. Citation: Yi, D., Zhou, X., Wen, Y. & Tan, R. (2020). Efficient compute-intensive job allocation in data centers via deep reinforcement learning. IEEE Transactions on Parallel and Distributed Systems, 31(6), 1474-1485. https://dx.doi.org/10.1109/TPDS.2020.2968427. ISSN: 1045-9219. Handle: https://hdl.handle.net/10356/161048. DOI: 10.1109/TPDS.2020.2968427. Scopus EID: 2-s2.0-85080911615. Language: en. © 2020 IEEE. All rights reserved.
title	Efficient compute-intensive job allocation in data centers via deep reinforcement learning
topic	Engineering::Computer science and engineering; Job Allocation; Data Center
url	https://hdl.handle.net/10356/161048


Similar Items