Accelerating Irregular Applications with Pipeline Parallelism
Main author: | Nguyen, Quan Minh |
---|---|
Other authors: | Sanchez, Daniel |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online access: | https://hdl.handle.net/1721.1/144589 |
author | Nguyen, Quan Minh |
---|---|
author2 | Sanchez, Daniel |
collection | MIT |
description | Irregular applications have frequent data-dependent memory accesses and control flow. They arise in many emerging and important domains, including sparse deep learning, graph analytics, and database processing. Conventional architectures cannot handle irregular applications efficiently because their techniques for improving performance, like exploiting instruction-level or data-level parallelism, are not tailored to them. Thus, continued progress in these crucial domains depends on exploring new avenues of parallelism.
Fortunately, irregular applications contain abundant but untapped pipeline parallelism: they can be divided into networks of stages. Pipelining not only exposes parallelism but also enables decoupling, which hides the latency of long events by allowing producer stages to run ahead of consumer stages. To properly decouple these applications, though, this pipeline parallelism must be exploited at a fine grain, with few operations per stage. Prior work has proposed architectures, compilers, and languages, but these focus on regular pipelines and thus cannot overcome several challenges of irregular applications. First, architectures need to support the efficient execution of many fine-grain pipeline stages. Second, such irregular pipelines suffer from load imbalance, as the amount of work in each stage varies rapidly as the program runs. Finally, these stages must communicate and coordinate changes in control flow.
This thesis demonstrates that exploiting fine-grain pipeline parallelism in irregular applications is effective and practical. To this end, this thesis proposes two hardware architectures and a compiler: Pipette, the first architecture, reuses existing structures in modern out-of-order cores to implement load-balanced decoupled communication between stages; and Fifer, the second architecture, makes the acceleration benefits of coarse-grain reconfigurable arrays available to irregular applications. Pipette achieves a gmean speedup of 1.9x over a data-parallel implementation, and Fifer achieves up to 47x speedup over an out-of-order multicore while using considerably less area. Both architectures also further accelerate challenging memory accesses and resolve the load balancing and control flow challenges that are ubiquitous in irregular applications. Finally, Phloem is a compiler that makes it easy for programmers to use these architectures by producing high-performance pipeline-parallel implementations of irregular applications from serial code. Phloem automatically achieves 85% of the performance of manually pipelined versions. |
first_indexed | 2024-09-23T12:49:13Z |
format | Thesis |
id | mit-1721.1/144589 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T12:49:13Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/144589 2022-08-30T03:32:59Z Accelerating Irregular Applications with Pipeline Parallelism Nguyen, Quan Minh Sanchez, Daniel Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D. 2022-08-29T15:57:43Z 2022-08-29T15:57:43Z 2022-05 2022-06-21T19:15:27.606Z Thesis https://hdl.handle.net/1721.1/144589 0000-0002-3820-6421 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
title | Accelerating Irregular Applications with Pipeline Parallelism |
url | https://hdl.handle.net/1721.1/144589 |