A Communication Efficient ADMM-based Distributed Algorithm Using Two-Dimensional Torus Grouping AllReduce

Abstract: Large-scale distributed training mainly consists of sub-model parallel training and parameter synchronization. As the number of training workers grows, the efficiency of parameter synchronization degrades. To tackle this problem, we first propose 2D-TGA, a grouping AllReduce method b...
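
The title and the truncated abstract name 2D-TGA, a two-dimensional torus grouping AllReduce. The following is a minimal sketch of the general grouped two-step AllReduce pattern (reduce within rows of a worker grid, then across columns), written with mpi4py; the 2x2 grid shape, variable names, and use of MPI are illustrative assumptions and do not reproduce the authors' implementation.

    # Sketch of a grouped 2D AllReduce: workers form a ROWS x COLS grid;
    # gradients are summed along each row, then the row sums are combined
    # along each column, so every worker ends with the global sum while
    # each step only communicates inside a small group.
    import numpy as np
    from mpi4py import MPI

    ROWS, COLS = 2, 2                 # assumed grid; world size must be ROWS * COLS

    world = MPI.COMM_WORLD
    rank = world.Get_rank()
    row_id, col_id = divmod(rank, COLS)

    # Sub-communicators for the two grouping dimensions.
    row_comm = world.Split(color=row_id, key=col_id)   # workers sharing a row
    col_comm = world.Split(color=col_id, key=row_id)   # workers sharing a column

    local = np.full(4, float(rank))   # stand-in for a local gradient / ADMM update
    partial = np.empty_like(local)
    total = np.empty_like(local)

    row_comm.Allreduce(local, partial, op=MPI.SUM)     # step 1: reduce within the row
    col_comm.Allreduce(partial, total, op=MPI.SUM)     # step 2: reduce across rows

    # 'total' now holds the sum over all ROWS * COLS workers on every rank.

Run, for example, with "mpiexec -n 4 python sketch.py"; the two small group reductions replace one large flat AllReduce, which is the communication-saving idea the abstract alludes to.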


Bibliographic Details
Main Authors: Guozheng Wang, Yongmei Lei, Zeyu Zhang, Cunlu Peng
Format: Article
Language: English
Published: SpringerOpen 2023-01-01
Series: Data Science and Engineering
Online Access: https://doi.org/10.1007/s41019-022-00202-7