Distributed Halide

Many image processing tasks are naturally expressed as a pipeline of small computational kernels known as stencils. Halide is a popular domain-specific language and compiler designed to implement image processing algorithms. Halide uses simple language constructs to express what to compute and a sep...

Bibliographic Details
Main Authors: Denniston, Tyler; Kamil, Shoaib; Amarasinghe, Saman P
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Association for Computing Machinery 2017
Online Access: http://hdl.handle.net/1721.1/110762
https://orcid.org/0000-0003-4400-8947
https://orcid.org/0000-0002-7231-7643
_version_ 1826215889656086528
author Denniston, Tyler
Kamil, Shoaib
Amarasinghe, Saman P
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Denniston, Tyler
Kamil, Shoaib
Amarasinghe, Saman P
author_sort Denniston, Tyler
collection MIT
description Many image processing tasks are naturally expressed as a pipeline of small computational kernels known as stencils. Halide is a popular domain-specific language and compiler designed to implement image processing algorithms. Halide uses simple language constructs to express what to compute and a separate scheduling co-language for expressing when and where to perform the computation. This approach has demonstrated performance comparable to or better than hand-optimized code. Until now, however, Halide has been restricted to parallel shared-memory execution, limiting its performance for memory-bandwidth-bound pipelines or large-scale image processing tasks. We present an extension to Halide to support distributed-memory parallel execution of complex stencil pipelines. These extensions compose with the existing scheduling constructs in Halide, allowing expression of complex computation and communication strategies. Existing Halide applications can be distributed with minimal changes, allowing programmers to explore the tradeoff between recomputation and communication with little effort. Approximately 10 new lines of code are needed even for a 200-line, 99-stage application. On nine image processing benchmarks, our extensions give up to a 1.4× speedup on a single node over regular multithreaded execution with the same number of cores, by mitigating the effects of non-uniform memory access. The distributed benchmarks achieve up to 18× speedup on a 16-node testing machine and up to 57× speedup on 64 nodes of the NERSC Cori supercomputer.
first_indexed 2024-09-23T16:38:39Z
format Article
id mit-1721.1/110762
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T16:38:39Z
publishDate 2017
publisher Association for Computing Machinery
record_format dspace
spelling mit-1721.1/110762 2022-09-29T20:33:07Z Distributed Halide Denniston, Tyler Kamil, Shoaib Amarasinghe, Saman P Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science United States. Department of Energy (award DE-SC0005288) United States. Department of Energy (award DE-SC0008923) National Science Foundation (U.S.) (XPS-1533753) 2017-07-18T16:00:20Z 2017-07-18T16:00:20Z 2016-08 Article http://purl.org/eprint/type/ConferencePaper 9781450340922 http://hdl.handle.net/1721.1/110762 Denniston, Tyler, Shoaib Kamil, and Saman Amarasinghe. “Distributed Halide.” Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP ’16 (2016). https://orcid.org/0000-0003-4400-8947 https://orcid.org/0000-0002-7231-7643 en_US http://dx.doi.org/10.1145/2851141.2851157 Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16 Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/ application/pdf Association for Computing Machinery ACM
title Distributed Halide
url http://hdl.handle.net/1721.1/110762
https://orcid.org/0000-0003-4400-8947
https://orcid.org/0000-0002-7231-7643