Congestion Control in Machine Learning Clusters

This paper argues that fair-sharing, the holy grail of congestion control algorithms for decades, is not necessarily a desirable property in Machine Learning (ML) training clusters. We demonstrate that for a specific combination of jobs, introducing unfairness improves the training time for all comp...


Bibliographic details
Main author: Rajasekaran, Sudarsanan
Other authors: Ghobadi, Manya
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online access: https://hdl.handle.net/1721.1/156313
