Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems
Main Authors: | Jiwoo Bang, Alexander Sim, Glenn K. Lockwood, Hyeonsang Eom, Hanul Sung |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10005121/ |
author | Jiwoo Bang, Alexander Sim, Glenn K. Lockwood, Hyeonsang Eom, Hanul Sung |
collection | DOAJ |
description | Burst Buffer is widely used in supercomputer centers to bridge the performance gap between computational power and high-performance I/O systems. The primary role of Burst Buffer is to temporarily absorb bursty I/O and reduce the heavy load on the Parallel File System (PFS). However, the job resource manager on High-Performance Computing (HPC) systems prefers a dedicated Burst Buffer allocation approach, which leaves the Burst Buffer resource severely underutilized. To use the expensive Burst Buffer resource more efficiently, we analyze the I/O patterns on Burst Buffer in depth. We propose a Burst Buffer Over-Subscription (BBOS) allocation method, which improves Burst Buffer utilization by allowing each job to access Burst Buffer only during its I/O phases, so that jobs can overlap one another on the Burst Buffer. Furthermore, we develop a new I/O congestion-aware scheduler and a transparent data management system between Burst Buffer and PFS. Our approach also reduces the memory overhead and improves the data persistence of the data management system by adopting persistent memory. With the proposed approach, not only can Burst Buffer utilization be improved, but HPC applications can also achieve high I/O performance by exploiting the powerful Burst Buffer hardware capabilities. Experimental results show that BBOS can improve Burst Buffer utilization by up to 120% while guaranteeing more stable and higher checkpoint performance, even under high I/O loads, compared to other state-of-the-art schedulers. In addition, our approach can improve the hit ratio of restart requests by up to 96.4% and provides up to 210% higher restart throughput on Burst Buffer. |
first_indexed | 2024-04-10T09:14:52Z |
format | Article |
id | doaj.art-f5b1ea64edb8453d997627d7424bfc2f |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-10T09:14:52Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
volume | 11 |
pages | 3386-3401 |
doi | 10.1109/ACCESS.2022.3233829 |
affiliations | Jiwoo Bang, Hyeonsang Eom: Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea; Alexander Sim, Glenn K. Lockwood: Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA, USA; Hanul Sung: Department of Game Design and Development, Sangmyung University, Seoul, South Korea |
orcid | Jiwoo Bang: https://orcid.org/0000-0002-3556-2535; Alexander Sim: https://orcid.org/0000-0002-6295-1982; Hanul Sung: https://orcid.org/0000-0002-1103-8755 |
title | Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems |
topic | Burst buffer, checkpoint, demotion, over-subscription, parallel file system, restart |
url | https://ieeexplore.ieee.org/document/10005121/ |
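The over-subscription idea in the description (each job occupies Burst Buffer space only during its I/O phases rather than for its whole lifetime) can be illustrated with a toy capacity model. This is a hypothetical sketch, not the authors' BBOS implementation: the `Job` class and the `held_intervals`, `peak_demand`, and `busy_fraction` functions are illustrative names, time is in arbitrary units, and the model assumes data is demoted to the PFS the moment an I/O phase ends, so space is released immediately. It compares peak concurrent Burst Buffer demand, and the share of held capacity-time actually spent absorbing I/O, under dedicated versus over-subscribed allocation.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job model (illustration only, not the paper's data structures)."""
    name: str
    start: int          # job start time, arbitrary units
    end: int            # job end time (exclusive)
    capacity_gb: int    # Burst Buffer space the job needs while it holds an allocation
    io_phases: list = field(default_factory=list)  # [(start, end), ...] bursty I/O windows


def held_intervals(job: Job, oversubscribe: bool) -> list:
    """Time intervals during which the job holds Burst Buffer space."""
    if oversubscribe:
        # Over-subscription: hold space only during I/O phases (assumes data is
        # demoted to the PFS as soon as each phase ends).
        return list(job.io_phases)
    # Dedicated allocation: hold space for the job's whole lifetime.
    return [(job.start, job.end)]


def peak_demand(jobs: list, oversubscribe: bool) -> int:
    """Maximum Burst Buffer space held concurrently, via an endpoint sweep."""
    events = []
    for job in jobs:
        for s, e in held_intervals(job, oversubscribe):
            events.append((s, job.capacity_gb))   # acquire
            events.append((e, -job.capacity_gb))  # release
    # Intervals are half-open, so at equal timestamps releases (negative) sort first.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def busy_fraction(jobs: list, oversubscribe: bool) -> float:
    """Share of held capacity-time that is actually spent absorbing I/O."""
    held = sum(j.capacity_gb * (e - s) for j in jobs for s, e in held_intervals(j, oversubscribe))
    used = sum(j.capacity_gb * (e - s) for j in jobs for s, e in j.io_phases)
    return used / held


if __name__ == "__main__":
    # Three jobs run concurrently for 100 time units with staggered checkpoint phases.
    jobs = [
        Job("A", 0, 100, 100, [(0, 10), (50, 60)]),
        Job("B", 0, 100, 100, [(15, 25), (65, 75)]),
        Job("C", 0, 100, 100, [(30, 40), (80, 90)]),
    ]
    for mode in (False, True):
        label = "over-subscribed" if mode else "dedicated"
        print(label, peak_demand(jobs, mode), "GB peak,",
              f"{busy_fraction(jobs, mode):.0%} of held capacity-time used")
```

With the staggered phases above, the three 100 GB jobs need 300 GB of Burst Buffer under dedicated allocation but only 100 GB when over-subscribed, and only 20% of the dedicated capacity-time is spent absorbing I/O. If the jobs' I/O phases overlapped, peak demand would rise again; according to the abstract, handling that contention is the role of the paper's I/O congestion-aware scheduler, which this toy model does not attempt.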