GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Bibliographic Details
Main Author: Kim, Sunghyun
Other Authors: Alizadeh, Mohammad
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: S.M.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156292
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)

Description
Deep neural networks (DNNs) continue to grow rapidly in size, making it infeasible to train them on a single device. To address this challenge, current DNN training systems apply pipeline-parallel techniques: they split a DNN into multiple stages, construct a pipeline of those stages, and assign each stage to a distinct device. The devices, each storing a segment of the DNN, perform their respective operations in sequence to train the model as a whole. Pipeline parallelism makes it feasible to train large-scale DNNs, yet there is still room for improvement. Existing approaches consider only sequential pipeline stages and thus ignore the inherent topology of the DNN being trained. For example, when a DNN's architecture has computationally independent parallel branches, the serial execution mandated by sequential pipeline stages unnecessarily lengthens the time to process the training data. This shortcoming leaves model-parallel opportunities untapped, resulting in suboptimal training throughput. In this paper, we develop graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes current sequential pipeline stages. By constructing the pipeline based on the DNN topology, GPP enables concurrent execution of computationally independent DNN segments. GPP then optimizes micro-batch schedules for these stages and parallelizes large-scale DNN training across multiple devices. We show that GPP achieves reduced memory consumption and improved training throughput. We also develop GraphPipe, a distributed system that leverages GPP strategies to enable performant and scalable DNN training. Evaluation on a variety of DNNs demonstrates that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6×. Although GPP involves a much larger search space of parallelization strategies, GraphPipe reduces the search time by 9–21× compared to PipeDream and Piper.
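
The following Python sketch is purely illustrative and is not code from the thesis; the stage names, costs, and the critical_path helper are hypothetical. It shows the core intuition summarized in the abstract: when a model has computationally independent branches, treating the pipeline stages as a directed acyclic graph (rather than a sequential chain) shortens the per-micro-batch critical path.

# Illustrative sketch only; not code from the GraphPipe thesis. Stage names
# and costs are hypothetical. It contrasts a purely sequential chain of
# pipeline stages with a DAG of stages whose independent branches may run
# concurrently on different devices.

# Stage dependency graph for a toy DNN with two computationally independent
# branches (branch_a, branch_b) that join before the final stage.
deps = {
    "embed":    [],                        # input stage
    "branch_a": ["embed"],                 # independent branch 1
    "branch_b": ["embed"],                 # independent branch 2
    "head":     ["branch_a", "branch_b"],  # joins both branches
}
cost = {"embed": 1.0, "branch_a": 3.0, "branch_b": 3.0, "head": 1.0}

def critical_path(deps, cost):
    """Earliest finish time over all stages, assuming each stage runs on its
    own device and starts as soon as every predecessor has finished."""
    finish = {}
    def visit(stage):
        if stage not in finish:
            finish[stage] = cost[stage] + max(
                (visit(p) for p in deps[stage]), default=0.0)
        return finish[stage]
    return max(visit(s) for s in deps)

# Sequential pipeline stages impose one total order over all stages, so a
# micro-batch's latency is the sum of every stage cost.
sequential_latency = sum(cost.values())      # 8.0

# Graph pipeline stages let branch_a and branch_b execute concurrently, so
# the latency is bounded by the critical path of the DAG instead.
graph_latency = critical_path(deps, cost)    # 5.0

print(f"sequential stages: {sequential_latency:.1f}")
print(f"graph (DAG) stages: {graph_latency:.1f}")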