GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Deep neural networks (DNNs) continue to grow rapidly in size, thus it is infeasible to train them on a single device. To address this challenge, current DNN training systems apply pipeline-parallel techniques. They split a DNN into multiple stages, construct a pipeline of them, and assign to each stage a distinct device. Multiple devices, each storing a partial segment of the DNN, perform their respective operations in sequence to train the whole. Applying pipeline-parallel techniques makes it feasible to train large-scale DNNs, yet there is still room for improvement. Existing approaches only consider sequential pipeline stages and thus ignore the inherent topology of a DNN to train. For example, when the architecture of a DNN has computationally-independent parallel branches, serial execution of them mandated by sequential pipeline stages unnecessarily lengthens the processing time of training data. This shortcoming leaves model-parallel opportunities untapped, resulting in suboptimal training throughput. In this paper, we develop graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes current sequential pipeline stages. By constructing the pipeline based on the DNN topology, GPP enables concurrent execution of computationally independent DNN segments. GPP then optimizes micro-batch schedules for these stages, and parallelizes large-scale DNN training across multiple devices. We show that GPP achieves reduced memory consumption and improved training throughput. We also develop GraphPipe, a distributed system that leverages GPP strategies to enable performant and scalable DNN training. Evaluation on a variety of DNNs demonstrates that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6×. Despite the fact that GPP involves a much larger search space of parallelization strategies, GraphPipe reduces the search time by 9–21× compared to PipeDream and Piper.
Main Author: | Kim, Sunghyun |
---|---|
Other Authors: | Alizadeh, Mohammad |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/156292 |
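
The abstract above contrasts conventional sequential pipeline stages with graph pipeline parallelism (GPP), in which stage dependencies form a directed acyclic graph that mirrors the DNN's topology. The Python sketch below is only a minimal illustration of that idea under assumed names: the `Stage` class, the branch and device assignments, and the `ready_sets` helper are hypothetical and are not taken from GraphPipe's implementation.

```python
# Minimal, hypothetical sketch of the idea behind graph pipeline parallelism (GPP):
# pipeline stages form a DAG rather than a chain, so computationally independent
# branches of a DNN can be assigned to different devices and run at the same time.
# All names here are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Stage:
    name: str                                   # DNN segment assigned to this stage
    device: int                                 # device that owns the stage
    deps: list = field(default_factory=list)    # upstream stages (DAG edges)

# Sequential pipeline parallelism: stages form a chain, so the two independent
# branches are forced into consecutive stages and execute one after the other.
sequential = [
    Stage("embed",    device=0),
    Stage("branch_a", device=1, deps=["embed"]),
    Stage("branch_b", device=2, deps=["branch_a"]),   # artificial dependency
    Stage("head",     device=3, deps=["branch_b"]),
]

# Graph pipeline parallelism: dependencies follow the DNN topology, so both
# branches depend only on "embed" and can execute concurrently.
graph = [
    Stage("embed",    device=0),
    Stage("branch_a", device=1, deps=["embed"]),
    Stage("branch_b", device=2, deps=["embed"]),      # no false serialization
    Stage("head",     device=3, deps=["branch_a", "branch_b"]),
]

def ready_sets(stages):
    """Yield waves of stages whose dependencies are satisfied (a topological
    schedule); stages in the same wave could process a micro-batch in parallel."""
    done, remaining = set(), {s.name: s for s in stages}
    while remaining:
        wave = [n for n, s in remaining.items() if all(d in done for d in s.deps)]
        yield wave
        done.update(wave)
        for n in wave:
            del remaining[n]

print(list(ready_sets(sequential)))  # [['embed'], ['branch_a'], ['branch_b'], ['head']]
print(list(ready_sets(graph)))       # [['embed'], ['branch_a', 'branch_b'], ['head']]
```

The two printed schedules make the abstract's point concrete: a chain of stages serializes the independent branches, whereas DAG-shaped stages place them in the same wave, so they can process micro-batches concurrently on their own devices.
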
_version_ | 1811072182243033088 |
---|---|
author | Kim, Sunghyun |
author2 | Alizadeh, Mohammad |
author_facet | Alizadeh, Mohammad; Kim, Sunghyun |
author_sort | Kim, Sunghyun |
collection | MIT |
description | Deep neural networks (DNNs) continue to grow rapidly in size, thus it is infeasible to train them on a single device. To address this challenge, current DNN training systems apply pipeline-parallel techniques. They split a DNN into multiple stages, construct a pipeline of them, and assign to each stage a distinct device. Multiple devices, each storing a partial segment of the DNN, perform their respective operations in sequence to train the whole. Applying pipeline-parallel techniques makes it feasible to train large-scale DNNs, yet there is still room for improvement. Existing approaches only consider sequential pipeline stages and thus ignore the inherent topology of a DNN to train. For example, when the architecture of a DNN has computationally-independent parallel branches, serial execution of them mandated by sequential pipeline stages unnecessarily lengthens the processing time of training data. This shortcoming leaves model-parallel opportunities untapped, resulting in suboptimal training throughput. In this paper, we develop graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes current sequential pipeline stages. By constructing the pipeline based on the DNN topology, GPP enables concurrent execution of computationally independent DNN segments. GPP then optimizes micro-batch schedules for these stages, and parallelizes large-scale DNN training across multiple devices. We show that GPP achieves reduced memory consumption and improved training throughput. We also develop GraphPipe, a distributed system that leverages GPP strategies to enable performant and scalable DNN training. Evaluation on a variety of DNNs demonstrates that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6×. Despite the fact that GPP involves a much larger search space of parallelization strategies, GraphPipe reduces the search time by 9–21× compared to PipeDream and Piper. |
first_indexed | 2024-09-23T09:01:55Z |
format | Thesis |
id | mit-1721.1/156292 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T09:01:55Z |
publishDate | 2024 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/156292 2024-08-22T03:57:08Z GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism Kim, Sunghyun Alizadeh, Mohammad Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Deep neural networks (DNNs) continue to grow rapidly in size, thus it is infeasible to train them on a single device. To address this challenge, current DNN training systems apply pipeline-parallel techniques. They split a DNN into multiple stages, construct a pipeline of them, and assign to each stage a distinct device. Multiple devices, each storing a partial segment of the DNN, perform their respective operations in sequence to train the whole. Applying pipeline-parallel techniques makes it feasible to train large-scale DNNs, yet there is still room for improvement. Existing approaches only consider sequential pipeline stages and thus ignore the inherent topology of a DNN to train. For example, when the architecture of a DNN has computationally-independent parallel branches, serial execution of them mandated by sequential pipeline stages unnecessarily lengthens the processing time of training data. This shortcoming leaves model-parallel opportunities untapped, resulting in suboptimal training throughput. In this paper, we develop graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes current sequential pipeline stages. By constructing the pipeline based on the DNN topology, GPP enables concurrent execution of computationally independent DNN segments. GPP then optimizes micro-batch schedules for these stages, and parallelizes large-scale DNN training across multiple devices. We show that GPP achieves reduced memory consumption and improved training throughput. We also develop GraphPipe, a distributed system that leverages GPP strategies to enable performant and scalable DNN training. Evaluation on a variety of DNNs demonstrates that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6×. Despite the fact that GPP involves a much larger search space of parallelization strategies, GraphPipe reduces the search time by 9–21× compared to PipeDream and Piper. S.M. 2024-08-21T18:54:28Z 2024-08-21T18:54:28Z 2024-05 2024-07-10T12:59:41.829Z Thesis https://hdl.handle.net/1721.1/156292 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Kim, Sunghyun GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism |
title | GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism |
title_full | GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism |
title_fullStr | GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism |
title_full_unstemmed | GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism |
title_short | GraphPipe: Improving the Performance and Scalability of DNN Training with Graph Pipeline Parallelism |
title_sort | graphpipe improving the performance and scalability of dnn training with graph pipeline parallelism |
url | https://hdl.handle.net/1721.1/156292 |
work_keys_str_mv | AT kimsunghyun graphpipeimprovingtheperformanceandscalabilityofdnntrainingwithgraphpipelineparallelism |