CLUTCH: A Clustering-Driven Runtime Estimation Scheme for Scientific Simulations

Efficient scheduling among simultaneous simulation jobs is of critical importance in the allocation of limited computing and I/O resources. The difficulty of predicting when a job will complete can cause nontrivial problems for system administrators and users, e.g., squandered resources, long waiting...


Bibliographic Details
Main Authors:	Young-Kyoon Suh, Seounghyeon Kim, Jeeyoung Kim
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Simulation runtime estimation; ensemble machine learning; pre-processing; simulation provenance; clustering; classification
Online Access:	https://ieeexplore.ieee.org/document/9281033/

_version_	1818429536862208000
author	Young-Kyoon Suh; Seounghyeon Kim; Jeeyoung Kim
author_sort	Young-Kyoon Suh
collection	DOAJ
description	Efficient scheduling among simultaneous simulation jobs is of critical importance in the allocation of limited computing and I/O resources. The difficulty of predicting when a job will complete can cause nontrivial problems for system administrators and users, e.g., squandered resources, long waiting times, and simulation plan delays. To alleviate these problems, we propose a novel simulation runtime estimation scheme termed CLUTCH, which employs a well-orchestrated ensemble of clustering, classification, and regression techniques. The proposed scheme trains a runtime estimation model in three steps: (i) grouping past simulation provenance records by clustering, (ii) labeling each of the grouped records by classification, and (iii) performing regression on the execution times in each group. Given a simulation and its external arguments, the trained model predicts the simulation's runtime with high accuracy in a black-box fashion, using only basic external arguments and no extra information. We additionally propose two optimization algorithms that significantly reduce training overhead without sacrificing estimation quality. In experiments with real datasets, our model improved estimation accuracy by approximately 14.2% over the most recent state-of-the-art method; with our optimizations applied, the model was trained 16 times faster while still retaining accuracy.
first_indexed	2024-12-14T15:19:05Z
format	Article
id	doaj.art-ca7fcbcd899f40b49a16688db23cb2b3
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-14T15:19:05Z
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-ca7fcbcd899f40b49a16688db23cb2b3; 2022-12-21T22:56:13Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8; pp. 220710-220722; DOI 10.1109/ACCESS.2020.3042596; 9281033; CLUTCH: A Clustering-Driven Runtime Estimation Scheme for Scientific Simulations; Young-Kyoon Suh (https://orcid.org/0000-0003-3124-2566), Seounghyeon Kim (https://orcid.org/0000-0002-7910-7884), Jeeyoung Kim (https://orcid.org/0000-0001-9380-948X); School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea (all three authors); https://ieeexplore.ieee.org/document/9281033/
title	CLUTCH: A Clustering-Driven Runtime Estimation Scheme for Scientific Simulations
topic	Simulation runtime estimation; ensemble machine learning; pre-processing; simulation provenance; clustering; classification
url	https://ieeexplore.ieee.org/document/9281033/
work_keys_str_mv	AT youngkyoonsuh clutchaclusteringdrivenruntimeestimationschemeforscientificsimulations AT seounghyeonkim clutchaclusteringdrivenruntimeestimationschemeforscientificsimulations AT jeeyoungkim clutchaclusteringdrivenruntimeestimationschemeforscientificsimulations
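
The three training steps named in the description (clustering, classification, per-group regression) can be illustrated with a toy sketch. This is not the authors' implementation: the record does not say which features are clustered or which algorithms CLUTCH uses. The sketch assumes k-means over observed runtimes, a nearest-centroid classifier over the arguments, and a one-variable least-squares line on the first argument. All function names and the sample data are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k, iters=50):
    """Step (i): group runtimes with plain 1-D k-means (assumed technique)."""
    s = sorted(values)
    # Deterministic start: centers spread evenly over the sorted runtimes.
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def fit_centroids(args, labels, k):
    """Step (ii): nearest-centroid classifier from arguments to cluster label."""
    cents = {}
    for c in range(k):
        rows = [a for a, lab in zip(args, labels) if lab == c]
        if rows:
            cents[c] = [mean(col) for col in zip(*rows)]
    return cents


def classify(cents, x):
    """Return the label whose centroid is closest to argument vector x."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


def fit_line(xs, ys):
    """Step (iii): ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def train(records, k):
    """records: list of (argument tuple, runtime). Returns (centroids, lines)."""
    args = [a for a, _ in records]
    labels = kmeans_1d([t for _, t in records], k)
    cents = fit_centroids(args, labels, k)
    lines = {}
    for c in cents:
        rows = [(a[0], t) for (a, t), lab in zip(records, labels) if lab == c]
        # Toy simplification: regress runtime on the first argument only.
        lines[c] = fit_line([x for x, _ in rows], [y for _, y in rows])
    return cents, lines


def predict(model, x):
    """Classify the arguments, then apply that group's regression line."""
    cents, lines = model
    slope, intercept = lines[classify(cents, x)]
    return slope * x[0] + intercept


# Hypothetical provenance: arguments are (problem size, solver mode).
records = [((s, 0), 2 * s + 1) for s in range(1, 11)]
records += [((s, 1), 50 * s + 1000) for s in range(1, 11)]
model = train(records, k=2)
print(predict(model, (12, 0)), predict(model, (4, 1)))
```

Prediction needs only the external arguments, which mirrors the black-box property the description claims. The sketch works here because the two made-up runtime regimes are far apart; real provenance data would likely need a stronger classifier and regressor.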


Similar Items