A Parallel Butterfly Algorithm
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simpl...
Main Authors: | , , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: |
Society for Industrial and Applied Mathematics
2014
|
Online Access: | http://hdl.handle.net/1721.1/88176 https://orcid.org/0000-0001-7052-5097 |
_version_ | 1826206196415070208 |
---|---|
author | Poulson, Jack Demanet, Laurent Maxwell, Nicholas Ying, Lexing |
author2 | Massachusetts Institute of Technology. Department of Mathematics |
author_facet | Massachusetts Institute of Technology. Department of Mathematics Poulson, Jack Demanet, Laurent Maxwell, Nicholas Ying, Lexing |
author_sort | Poulson, Jack |
collection | MIT |
description | The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O(r^2 \frac{N^d}{p} \log N + (\beta r\frac{N^d}{p}+\alpha)\log p)$ time using $p$ processes. This parallel algorithm was then instantiated in the form of the open-source \textttDistButterfly library for the special case where $K(x,y)=\exp(i \Phi(x,y))$, where $\Phi(x,y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. |
first_indexed | 2024-09-23T13:25:37Z |
format | Article |
id | mit-1721.1/88176 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T13:25:37Z |
publishDate | 2014 |
publisher | Society for Industrial and Applied Mathematics |
record_format | dspace |
spelling | mit-1721.1/881762022-10-01T15:13:18Z A Parallel Butterfly Algorithm Poulson, Jack Demanet, Laurent Maxwell, Nicholas Ying, Lexing Massachusetts Institute of Technology. Department of Mathematics Demanet, Laurent The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O(r^2 \frac{N^d}{p} \log N + (\beta r\frac{N^d}{p}+\alpha)\log p)$ time using $p$ processes. This parallel algorithm was then instantiated in the form of the open-source \textttDistButterfly library for the special case where $K(x,y)=\exp(i \Phi(x,y))$, where $\Phi(x,y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. National Science Foundation (U.S.) (NSF CAREER grant 0846501) United States. Dept. of Energy (DOE grant DE-SC0009409) King Abdullah University of Science and Technology 2014-07-01T21:10:24Z 2014-07-01T21:10:24Z 2014-02 Article http://purl.org/eprint/type/JournalArticle 1064-8275 1095-7197 http://hdl.handle.net/1721.1/88176 Poulson, Jack, Laurent Demanet, Nicholas Maxwell, and Lexing Ying. “A Parallel Butterfly Algorithm.” SIAM Journal on Scientific Computing 36, no. 1 (February 4, 2014): C49–C65.© 2014, Society for Industrial and Applied Mathematics. https://orcid.org/0000-0001-7052-5097 en_US http://dx.doi.org/10.1137/130921544 SIAM Journal on Scientific Computing Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Society for Industrial and Applied Mathematics Society for Industrial and Applied Mathematics |
spellingShingle | Poulson, Jack Demanet, Laurent Maxwell, Nicholas Ying, Lexing A Parallel Butterfly Algorithm |
title | A Parallel Butterfly Algorithm |
title_full | A Parallel Butterfly Algorithm |
title_fullStr | A Parallel Butterfly Algorithm |
title_full_unstemmed | A Parallel Butterfly Algorithm |
title_short | A Parallel Butterfly Algorithm |
title_sort | parallel butterfly algorithm |
url | http://hdl.handle.net/1721.1/88176 https://orcid.org/0000-0001-7052-5097 |
work_keys_str_mv | AT poulsonjack aparallelbutterflyalgorithm AT demanetlaurent aparallelbutterflyalgorithm AT maxwellnicholas aparallelbutterflyalgorithm AT yinglexing aparallelbutterflyalgorithm AT poulsonjack parallelbutterflyalgorithm AT demanetlaurent parallelbutterflyalgorithm AT maxwellnicholas parallelbutterflyalgorithm AT yinglexing parallelbutterflyalgorithm |