Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems

Burst Buffer is widely used in supercomputer centers to bridge the performance gap between computational power and the I/O systems. The primary role of Burst Buffer is to temporarily absorb bursty I/O and reduce the heavy access load on the Parallel File System (PFS). However, job resource managers on High-Performance Computing (HPC) systems prefer a dedicated Burst Buffer allocation approach, which ultimately leaves the Burst Buffer resource severely underutilized. To use this expensive resource more efficiently, we analyze the I/O patterns on Burst Buffer in depth. We propose the Burst Buffer Over-Subscription (BBOS) allocation method, which improves Burst Buffer utilization by allowing each job to access Burst Buffer only during its I/O phases, so that the jobs' Burst Buffer usage can overlap. Furthermore, we develop a new I/O congestion-aware scheduler and a transparent data management system between Burst Buffer and PFS. By adopting persistent memory, our approach also reduces the memory overhead and improves the data persistence of the data management system. With the proposed approach, not only can Burst Buffer utilization be improved, but HPC applications can also achieve high I/O performance by exploiting the capabilities of the Burst Buffer hardware. Experimental results show that BBOS improves Burst Buffer utilization by up to 120% while guaranteeing more stable and higher checkpoint performance, even under high I/O loads, compared to other state-of-the-art schedulers. In addition, our approach improves the hit ratio of restart requests by up to 96.4% and provides up to 210% higher restart throughput on Burst Buffer.

Bibliographic Details
Main Authors: Jiwoo Bang, Alexander Sim, Glenn K. Lockwood, Hyeonsang Eom, Hanul Sung
Affiliations:
Jiwoo Bang (ORCID 0000-0002-3556-2535): Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
Alexander Sim (ORCID 0000-0002-6295-1982): Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA, USA
Glenn K. Lockwood: Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA, USA
Hyeonsang Eom: Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
Hanul Sung (ORCID 0000-0002-1103-8755): Department of Game Design and Development, Sangmyung University, Seoul, South Korea
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Volume: 11
Pages: 3386-3401
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3233829
Subjects: Burst buffer; checkpoint; demotion; over-subscription; parallel file system; restart
Collection: Directory of Open Access Journals (DOAJ)
Online Access: https://ieeexplore.ieee.org/document/10005121/
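To make the over-subscription idea in the description above concrete, the following Python sketch is a toy model, not the authors' BBOS scheduler. It assumes that each job's I/O phases are known in advance and strictly periodic, and that a job's data is demoted to the PFS as soon as its I/O phase ends, so the job occupies Burst Buffer space only during that phase. `Job`, `demand`, `admit_dedicated`, `admit_oversubscribed` and `utilization` are hypothetical names introduced for illustration. The sketch contrasts dedicated allocation, where a job holds its full request for its whole run, with greedy admission based on peak concurrent I/O demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job model: every `period` steps (shifted by `offset`) the job
    writes `size` units to the Burst Buffer for `io_len` steps, then computes."""
    name: str
    size: int
    period: int
    io_len: int
    offset: int = 0

    def in_io_phase(self, t: int) -> bool:
        return (t - self.offset) % self.period < self.io_len


def demand(jobs, t):
    """Burst Buffer capacity in active use at step t.

    Assumption: data is demoted to the PFS as soon as an I/O phase ends,
    so a job occupies Burst Buffer space only while it is in an I/O phase."""
    return sum(j.size for j in jobs if j.in_io_phase(t))


def admit_dedicated(jobs, capacity):
    """Dedicated allocation: an admitted job reserves its full request for its whole run."""
    admitted, reserved = [], 0
    for j in jobs:
        if reserved + j.size <= capacity:
            admitted.append(j)
            reserved += j.size
    return admitted


def admit_oversubscribed(jobs, capacity, horizon):
    """Over-subscription sketch: greedily admit a job if the peak concurrent
    I/O demand over the horizon still fits. The horizon should span a common
    multiple of the job periods so that the true peak is not missed."""
    admitted = []
    for j in jobs:
        trial = admitted + [j]
        if max(demand(trial, t) for t in range(horizon)) <= capacity:
            admitted = trial
    return admitted


def utilization(jobs, capacity, horizon):
    """Mean fraction of Burst Buffer capacity in active use over the horizon."""
    return sum(demand(jobs, t) for t in range(horizon)) / (capacity * horizon)


if __name__ == "__main__":
    CAPACITY, HORIZON = 100, 12
    # Four jobs with staggered, non-overlapping I/O phases, plus one larger job
    # whose I/O phase collides with job A's.
    jobs = [Job(name, 50, 4, 1, off) for name, off in zip("ABCD", range(4))]
    jobs.append(Job("E", 60, 4, 1, 0))
    for label, admitted in (
        ("dedicated", admit_dedicated(jobs, CAPACITY)),
        ("over-subscribed", admit_oversubscribed(jobs, CAPACITY, HORIZON)),
    ):
        print(label, [j.name for j in admitted], utilization(admitted, CAPACITY, HORIZON))
```

Run as a script, this prints `dedicated ['A', 'B'] 0.25` and `over-subscribed ['A', 'B', 'C', 'D'] 0.5`: the over-subscribed set reserves 200 units against 100 units of capacity yet never exceeds it, while job E is rejected because its I/O phase collides with A's. This only works because the toy model knows every phase in advance. A real system must cope with I/O that is less predictable, which is the role the description assigns to the I/O congestion-aware scheduler and the data management system between Burst Buffer and PFS; neither is modeled here.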