Monkey: A Distributed Orchestrator for a Virtual Pseudo-Homogenous Computational Cluster Consisting of Heterogeneous Sources


Detailed Description

Bibliographic Details
First author: Stallone, Matthew J.
Other authors: Agrawal, Pulkit
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online access: https://hdl.handle.net/1721.1/144903
Other bibliographic details
Abstract: As machine learning research becomes increasingly ubiquitous, novel algorithms and state-of-the-art models are growing considerably more complex and involved. To achieve groundbreaking results in such a climate, a researcher increasingly depends on immense computational resources to develop, train, and evaluate these algorithms. As a result, research labs face the challenge of providing ample computational resources, and researchers are drawn away from their core research in order to design, code, and configure experiments for the disparate computational resources provided. The framework proposed herein therefore strives to bridge the gaps between research labs, researchers, and computational resources by abstracting and automating the standard process of designing, training, and evaluating an algorithm. Built upon the preexisting Monkey framework, it provides a fault-tolerant, decentralized system capable of scheduling and reproducing research training jobs, and it maintains a virtual pseudo-homogenous cluster on top of existing heterogeneous computational clusters. Designed to be flexible and cost-effective, the framework also prioritizes user accessibility by providing an integrated machine learning toolkit with hyperparameter optimizers and a visualization dashboard.
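The central idea in the abstract, presenting heterogeneous compute backends behind a single homogeneous job interface, can be illustrated with a minimal sketch. This is a hypothetical illustration of the concept, not the actual Monkey API: the `Job`, `Provider`, and `schedule` names are assumptions introduced here, and the cost-based placement rule is one plausible policy for a flexible, cost-effective scheduler.

```python
# Hypothetical sketch (NOT the actual Monkey API): a scheduler exposes one
# provider-agnostic job description and dispatches it onto whichever
# heterogeneous backend (lab cluster, cloud, etc.) can satisfy it cheapest.
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int          # resources requested in provider-agnostic terms
    attempts: int = 0  # retry counter, enabling fault-tolerant rescheduling


@dataclass
class Provider:
    name: str
    free_gpus: int
    cost_per_gpu: float  # assumed relative cost unit (0.0 for owned hardware)


def schedule(job: Job, providers: list) -> str:
    """Place the job on the cheapest provider with enough free capacity."""
    candidates = [p for p in providers if p.free_gpus >= job.gpus]
    if not candidates:
        raise RuntimeError(f"no capacity for {job.name}")
    best = min(candidates, key=lambda p: p.cost_per_gpu)
    best.free_gpus -= job.gpus  # reserve the resources for this job
    return best.name


providers = [
    Provider("lab-cluster", free_gpus=2, cost_per_gpu=0.0),
    Provider("cloud", free_gpus=8, cost_per_gpu=1.5),
]
print(schedule(Job("train-resnet", gpus=2), providers))  # prefers free lab GPUs
print(schedule(Job("train-gpt", gpus=4), providers))     # overflows to the cloud
```

The user describes only the job; the choice of backend, and any rescheduling after a failure, stays inside the orchestrator, which is the abstraction the thesis argues frees researchers from per-cluster configuration.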