Decoupling algorithms from the organization of computation for high performance image processing

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.

Bibliographic Details
Main Author: Ragan-Kelley, Jonathan Millard
Other Authors: Frédo Durand and Saman Amarashinghe.
Format: Thesis
Language:eng
Published: Massachusetts Institute of Technology 2014
Subjects:
Online Access:http://hdl.handle.net/1721.1/89996
_version_ 1811080675304931328
author Ragan-Kelley, Jonathan Millard
author2 Frédo Durand and Saman Amarashinghe.
author_facet Frédo Durand and Saman Amarashinghe.
Ragan-Kelley, Jonathan Millard
author_sort Ragan-Kelley, Jonathan Millard
collection MIT
description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.
first_indexed 2024-09-23T11:34:59Z
format Thesis
id mit-1721.1/89996
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T11:34:59Z
publishDate 2014
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/899962019-04-12T15:24:16Z Decoupling algorithms from the organization of computation for high performance image processing Ragan-Kelley, Jonathan Millard Frédo Durand and Saman Amarashinghe. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. "June 2014." Includes bibliographical references (pages 127-133). Future graphics and imaging applications-from self-driving cards, to 4D light field cameras, to pervasive sensing-demand orders of magnitude more computation than we currently have. This thesis argues that the efficiency and performance of an application are determined not only by the algorithm and the hardware architecture on which it runs, but critically also by the organization of computations and data on that architecture. Real graphics and imaging applications appear embarrassingly parallel, but have complex dependencies, and are limited by locality (the distance over which data has to move, e.g., from nearby caches or far away main memory) and synchronization. Increasingly, the cost of communication-both within a chip and over a network-dominates computation and power consumption, and limits the gains realized from shrinking transistors. Driven by these trends, writing high-performance processing code is challenging because it requires global reorganization of computations and data, not simply local optimization of an inner loop. Existing programming languages make it difficult for clear and composable code to express optimized organizations because they conflate the intrinsic algorithms being defined with their organization. To address the challenge of productively building efficient, high-performance programs, this thesis presents the Halide language and compiler for image processing. Halide explicitly separates what computations define an algorithm from the choices of execution structure which determine parallelism, locality, memory footprint, and synchronization. For image processing algorithms with the same complexity-even the exact same set of arithmetic operations and data-executing on the same hardware, the order and granularity of execution and placement of data can easily change performance by an order of magnitude because of locality and parallelism. I will show that, for data-parallel pipelines common in graphics, imaging, and other data-intensive applications, the organization of computations and data for a given algorithm is constrained by a fundamental tension between parallelism, locality, and redundant computation of shared values. I will present a systematic model of "schedules" which explicitly trade off these pressures by globally reorganizing the computations and data for an entire pipeline, and an optimizing compiler that synthesizes high performance implementations from a Halide algorithm and a schedule. The end result is much simpler programs, delivering performance often many times faster than the best prior hand-tuned C, assembly, and CUDA implementations, while scaling across radically different architectures, from ARM mobile processors to massively parallel GPUs. by Jonathan Ragan-Kelley. Ph. D. 2014-09-19T21:33:21Z 2014-09-19T21:33:21Z 2014 Thesis http://hdl.handle.net/1721.1/89996 890132208 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 179 pages application/pdf Massachusetts Institute of Technology
spellingShingle Electrical Engineering and Computer Science.
Ragan-Kelley, Jonathan Millard
Decoupling algorithms from the organization of computation for high performance image processing
title Decoupling algorithms from the organization of computation for high performance image processing
title_full Decoupling algorithms from the organization of computation for high performance image processing
title_fullStr Decoupling algorithms from the organization of computation for high performance image processing
title_full_unstemmed Decoupling algorithms from the organization of computation for high performance image processing
title_short Decoupling algorithms from the organization of computation for high performance image processing
title_sort decoupling algorithms from the organization of computation for high performance image processing
topic Electrical Engineering and Computer Science.
url http://hdl.handle.net/1721.1/89996
work_keys_str_mv AT ragankelleyjonathanmillard decouplingalgorithmsfromtheorganizationofcomputationforhighperformanceimageprocessing