GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Bibliographic Details
Main Author: Kim, Sunghyun
Other Authors: Alizadeh, Mohammad
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: S.M.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156292
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)

Description
Deep neural networks (DNNs) continue to grow rapidly in size, making it infeasible to train them on a single device. To address this challenge, current DNN training systems apply pipeline-parallel techniques: they split a DNN into multiple stages, construct a pipeline of those stages, and assign each stage to a distinct device. The devices, each storing a segment of the DNN, perform their respective operations in sequence to train the model as a whole. Pipeline parallelism makes it feasible to train large-scale DNNs, yet there is still room for improvement. Existing approaches consider only sequential pipeline stages and thus ignore the inherent topology of the DNN being trained. For example, when a DNN's architecture has computationally independent parallel branches, the serial execution mandated by sequential pipeline stages unnecessarily lengthens the time to process the training data. This shortcoming leaves model-parallel opportunities untapped, resulting in suboptimal training throughput. In this paper, we develop graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes current sequential pipeline stages. By constructing the pipeline based on the DNN topology, GPP enables concurrent execution of computationally independent DNN segments. GPP then optimizes micro-batch schedules for these stages and parallelizes large-scale DNN training across multiple devices. We show that GPP achieves reduced memory consumption and improved training throughput. We also develop GraphPipe, a distributed system that leverages GPP strategies to enable performant and scalable DNN training. Evaluation on a variety of DNNs demonstrates that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6×. Although GPP involves a much larger search space of parallelization strategies, GraphPipe reduces the search time by 9–21× compared to PipeDream and Piper.
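
The following Python sketch is purely illustrative and is not code from the thesis; the stage names, costs, and the critical_path helper are hypothetical. It shows the core intuition summarized in the abstract: when a model has computationally independent branches, treating the pipeline stages as a directed acyclic graph (rather than a sequential chain) shortens the per-micro-batch critical path.

# Illustrative sketch only; not code from the GraphPipe thesis. Stage names
# and costs are hypothetical. It contrasts a purely sequential chain of
# pipeline stages with a DAG of stages whose independent branches may run
# concurrently on different devices.

# Stage dependency graph for a toy DNN with two computationally independent
# branches (branch_a, branch_b) that join before the final stage.
deps = {
    "embed":    [],                        # input stage
    "branch_a": ["embed"],                 # independent branch 1
    "branch_b": ["embed"],                 # independent branch 2
    "head":     ["branch_a", "branch_b"],  # joins both branches
}
cost = {"embed": 1.0, "branch_a": 3.0, "branch_b": 3.0, "head": 1.0}

def critical_path(deps, cost):
    """Earliest finish time over all stages, assuming each stage runs on its
    own device and starts as soon as every predecessor has finished."""
    finish = {}
    def visit(stage):
        if stage not in finish:
            finish[stage] = cost[stage] + max(
                (visit(p) for p in deps[stage]), default=0.0)
        return finish[stage]
    return max(visit(s) for s in deps)

# Sequential pipeline stages impose one total order over all stages, so a
# micro-batch's latency is the sum of every stage cost.
sequential_latency = sum(cost.values())      # 8.0

# Graph pipeline stages let branch_a and branch_b execute concurrently, so
# the latency is bounded by the critical path of the DAG instead.
graph_latency = critical_path(deps, cost)    # 5.0

print(f"sequential stages: {sequential_latency:.1f}")
print(f"graph (DAG) stages: {graph_latency:.1f}")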