Accelerating Irregular Applications with Pipeline Parallelism

Irregular applications have frequent data-dependent memory accesses and control flow. They arise in many emerging and important domains, including sparse deep learning, graph analytics, and database processing. Conventional architectures cannot handle irregular applications efficiently because their...

Full description

Bibliographic details
Main author:	Nguyen, Quan Minh
Other authors:	Sanchez, Daniel
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online access:	https://hdl.handle.net/1721.1/144589

author	Nguyen, Quan Minh
author2	Sanchez, Daniel
collection	MIT
description	Irregular applications have frequent data-dependent memory accesses and control flow. They arise in many emerging and important domains, including sparse deep learning, graph analytics, and database processing. Conventional architectures cannot handle irregular applications efficiently because their techniques for improving performance, like exploiting instruction-level or data-level parallelism, are not tailored to them. Thus, continued progress in these crucial domains depends on exploring new avenues of parallelism. Fortunately, irregular applications contain abundant but untapped pipeline parallelism: they can be divided into networks of stages. Pipelining not only exposes parallelism but also enables decoupling, which hides the latency of long events by allowing producer stages to run ahead of consumer stages. To properly decouple these applications, though, this pipeline parallelism must be exploited at fine grain, with few operations per stage. Prior work has proposed architectures, compilers, and languages, but these focus on regular pipelines and are thus unable to overcome several challenges of irregular applications. First, architectures need to support the efficient execution of many fine-grain pipeline stages. Second, such irregular pipelines suffer from load imbalance, as the amount of work in each stage varies rapidly as the program runs. Finally, these stages must communicate and coordinate changes in control flow. This thesis demonstrates that exploiting fine-grain pipeline parallelism in irregular applications is effective and practical. To this end, this thesis proposes two hardware architectures and a compiler. Pipette, the first architecture, reuses existing structures in modern out-of-order cores to implement load-balanced decoupled communication between stages. Fifer, the second architecture, makes the acceleration benefits of coarse-grain reconfigurable arrays available to irregular applications. Pipette achieves a geometric-mean speedup of 1.9x over a data-parallel implementation, and Fifer achieves a speedup of up to 47x over an out-of-order multicore while using considerably less area. Both architectures also further accelerate challenging memory accesses and resolve the load balancing and control flow challenges that are ubiquitous in irregular applications. Finally, Phloem is a compiler that makes it easy for programmers to use these architectures by producing high-performance pipeline-parallel implementations of irregular applications from serial code. Phloem automatically achieves 85% of the performance of manually pipelined versions.
first_indexed	2024-09-23T12:49:13Z
format	Thesis
id	mit-1721.1/144589
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T12:49:13Z
publishDate	2022
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/144589 2022-08-30T03:32:59Z Accelerating Irregular Applications with Pipeline Parallelism Nguyen, Quan Minh Sanchez, Daniel Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D. 2022-08-29T15:57:43Z 2022-08-29T15:57:43Z 2022-05 2022-06-21T19:15:27.606Z Thesis https://hdl.handle.net/1721.1/144589 0000-0002-3820-6421 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title	Accelerating Irregular Applications with Pipeline Parallelism
url	https://hdl.handle.net/1721.1/144589
