Zeus: interpretable ML-based job scheduling in GPU datacentres
Hardware accelerators such as GPUs are essential for developing Deep Learning (DL) models, as their training process is compute-intensive. A growing number of organisations have deployed expensive multi-tenant GPU clusters to run distributed DL training jobs. Efficient job schedulers are re...
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/156566 |