Abstract: As machine learning research becomes increasingly ubiquitous, novel algorithms and state-of-the-art models rely on increasingly complex and involved procedures. To achieve groundbreaking results in such a climate, a researcher depends on substantial computational resources to develop, train, and evaluate these algorithms. As a result, research labs face the challenge of providing ample computational resources, and researchers are drawn away from their core research in order to design, code, and configure experiments for the disparate computational resources provided.
The framework proposed herein therefore strives to bridge the gaps between research labs, researchers, and computational resources by abstracting and automating the standard process of designing, training, and evaluating an algorithm. Built upon the preexisting Monkey framework, it will provide a fault-tolerant, decentralized system capable of scheduling and reproducing research training jobs. The framework maintains a virtual, pseudo-homogeneous cluster on top of existing heterogeneous computational clusters. Designed to be flexible and cost-effective, it also prioritizes user accessibility by providing an integrated machine learning toolkit with hyperparameter optimizers and a visualization dashboard.