H-PS: A Heterogeneous-Aware Parameter Server With Distributed Neural Network Training

Deep neural networks have become one of the most popular techniques in many research and application areas, including computer vision and natural language processing. As the complexity of neural networks continuously increases, the training process takes much longer and requires more computation resources. To speed up training, a centralized distributed training structure named Parameter Server (PS) is widely used to assign training tasks to different workers/nodes. Most existing studies assume that all workers have the same computation resources. However, in a heterogeneous environment, fast workers (i.e., workers with more computation resources) complete their tasks earlier than slow workers, so the system does not fully utilize the resources of the fast workers. In this paper, we propose a PS model with heterogeneous types of workers/nodes, called H-PS, which can fully utilize the resources of each worker by dynamically scheduling tasks based on the current status of the workers (e.g., available memory). By doing so, all workers complete their tasks at the same time and stragglers (i.e., workers that fall behind the others) are avoided. In addition, a pipeline scheme is proposed to further improve the effectiveness of workers by utilizing their resources while parameters are being transmitted between the PS and the workers. Moreover, a flexible quantization scheme is proposed to reduce the communication overhead between the PS and the workers. Finally, H-PS is implemented using containers, an emerging lightweight technology. Experimental results indicate that the proposed H-PS can reduce the overall training time by 1.4x to 3.5x compared with existing methods.
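
The mechanisms above are only sketched in the abstract; the full paper is behind the link below. As a rough illustration of heterogeneity-aware scheduling (a minimal sketch under our own assumptions, with illustrative names such as assign_batches, not the authors' algorithm), one way to keep fast and slow workers finishing together is to size each worker's share of the global mini-batch by its measured throughput:

```python
# Minimal sketch (illustrative only): each worker gets a share of the global
# mini-batch proportional to the throughput it achieved in the previous
# iteration, so fast and slow workers finish an iteration at roughly the
# same time and stragglers are avoided.

def assign_batches(throughputs, global_batch):
    """Split `global_batch` samples across workers proportionally to
    `throughputs` (samples/second measured per worker)."""
    total = sum(throughputs)
    shares = [int(global_batch * r / total) for r in throughputs]
    # Integer truncation may leave a few samples unassigned; hand the
    # remainder to the fastest worker.
    fastest = max(range(len(throughputs)), key=throughputs.__getitem__)
    shares[fastest] += global_batch - sum(shares)
    return shares

# Example: one fast GPU worker, one mid-range worker, one slow CPU worker.
print(assign_batches([300.0, 150.0, 50.0], 512))  # -> [308, 153, 51]
```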

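Likewise, the pipeline scheme overlaps parameter transmission with computation. One common way to realize such overlap (again a hedged sketch; compute_gradients and push_and_pull are hypothetical stand-ins for the worker's local computation and the PS round trip, not the paper's API) is to run the exchange for one batch in a background thread while the next batch is computed, accepting one step of parameter staleness:

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined_training_loop(batches, compute_gradients, push_and_pull, params):
    """Overlap the gradient push / parameter pull for one batch with the
    gradient computation for the next batch.

    compute_gradients(params, batch) -> grads   (local computation)
    push_and_pull(grads) -> new params          (blocking PS round trip)
    Both callables are hypothetical stand-ins for the real worker code.
    """
    pending = None  # in-flight PS exchange from the previous batch
    with ThreadPoolExecutor(max_workers=1) as pool:
        for batch in batches:
            # Compute while the previous exchange is still in flight; this
            # hides the communication time at the cost of one step of
            # parameter staleness.
            grads = compute_gradients(params, batch)
            if pending is not None:
                params = pending.result()  # wait for the previous round trip
            pending = pool.submit(push_and_pull, grads)
        if pending is not None:
            params = pending.result()
    return params
```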

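Finally, the flexible quantization scheme is specified only in the full paper; as a generic stand-in (our own assumption, not the paper's exact scheme), the sketch below uses uniform linear quantization at an adjustable bit width to shrink each transmitted tensor from 32-bit floats to num_bits-bit integers plus one scale factor:

```python
import numpy as np

def quantize(tensor, num_bits=8):
    """Uniform linear quantization: map float32 values to `num_bits`-bit
    integers plus one float scale, shrinking the payload ~32/num_bits x."""
    levels = 2 ** (num_bits - 1) - 1
    scale = float(np.abs(tensor).max()) / levels or 1.0  # avoid divide-by-zero
    dtype = np.int8 if num_bits <= 8 else np.int16
    return np.round(tensor / scale).astype(dtype), scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor on the receiving side."""
    return q.astype(np.float32) * scale

# Example: an 8-bit payload is ~4x smaller than the float32 gradients.
grads = np.random.randn(1000).astype(np.float32)
q, s = quantize(grads, num_bits=8)
print(q.nbytes, grads.nbytes)  # 1000 vs 4000 bytes
```
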
Bibliographic Details
Main Authors: Lintao Xian (ORCID: 0000-0003-1289-7647), Bingzhe Li, Jing Liu, Zhongwen Guo, David H. C. Du
Affiliations: Department of Computer Science and Technology, Ocean University of China, Qingdao, China (Xian, Guo); School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK, USA (Li); College of Science and Information, Qingdao Agricultural University, Qingdao, China (Liu); College of Science and Engineering, University of Minnesota, Twin Cities, Minneapolis, MN, USA (Du)
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access, Vol. 9 (2021), pp. 44049-44058
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3060154
Subjects: Distributed machine learning (DML); heterogeneous environments; dynamically scheduling tasks; pipeline communication and computation; dynamic quantization parameter
Online Access: https://ieeexplore.ieee.org/document/9356607/