Improvements to Supercomputing Service Availability Based on Data Analysis
As the demand for high-performance computing (HPC) resources has increased in the field of computational science, an inevitable consideration is service availability in large cluster systems such as supercomputers. In particular, the factor that most affects availability in supercomputing services i...
Main Authors: | Jae-Kook Lee, Min-Woo Kwon, Do-Sik An, Junweon Yoon, Taeyoung Hong, Joon Woo, Sung-Jun Kim, Guohua Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-07-01 |
Series: | Applied Sciences |
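Subjects: | high-performance computing, supercomputing service, data analysis, service availability, resource scheduler, resource utilization |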
Online Access: | https://www.mdpi.com/2076-3417/11/13/6166 |
_version_ | 1797528084128202752 |
---|---|
author | Jae-Kook Lee Min-Woo Kwon Do-Sik An Junweon Yoon Taeyoung Hong Joon Woo Sung-Jun Kim Guohua Li |
author_sort | Jae-Kook Lee |
collection | DOAJ |
description | As demand for high-performance computing (HPC) resources grows in computational science, service availability becomes an unavoidable concern for large cluster systems such as supercomputers. The factor that most affects availability in supercomputing services is the job scheduler used to allocate resources. An analysis of the job data that users submitted through the job scheduler showed that 25.6% of jobs failed because of program errors, scheduler errors, or I/O errors. Based on this analysis, we propose a K-hook method for scheduling that increases the success rate of job submissions and improves the availability of supercomputing services. Applying this method improved the job-submission success rate by 15% without negatively affecting users’ waiting time. We also achieved a mean time between interrupts (MTBI) of 24.3 days and maintained average system availability at 97%. Because the method was verified on the Nurion supercomputer in a real service environment, it is expected to yield significant service improvements in practice. |
first_indexed | 2024-03-10T09:53:04Z |
format | Article |
id | doaj.art-a3fe5ad3b6be4a309b8d7b2d77a7cf3f |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T09:53:04Z |
publishDate | 2021-07-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-a3fe5ad3b6be4a309b8d7b2d77a7cf3f; 2023-11-22T02:34:34Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2021-07-01; vol. 11, no. 13, art. 6166; 10.3390/app11136166; Improvements to Supercomputing Service Availability Based on Data Analysis; Jae-Kook Lee, Min-Woo Kwon, Do-Sik An, Junweon Yoon, Taeyoung Hong, Joon Woo, Sung-Jun Kim, Guohua Li; affiliation (all authors): National Supercomputing Center, Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea; https://www.mdpi.com/2076-3417/11/13/6166; high-performance computing; supercomputing service; data analysis; service availability; resource scheduler; resource utilization |
title | Improvements to Supercomputing Service Availability Based on Data Analysis |
topic | high-performance computing supercomputing service data analysis service availability resource scheduler resource utilization |
url | https://www.mdpi.com/2076-3417/11/13/6166 |