A Parallel Butterfly Algorithm

The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simpl...

Full description

Bibliographic Details
Main Authors: Poulson, Jack, Demanet, Laurent, Maxwell, Nicholas, Ying, Lexing
Other Authors: Massachusetts Institute of Technology. Department of Mathematics
Format: Article
Language:en_US
Published: Society for Industrial and Applied Mathematics 2014
Online Access:http://hdl.handle.net/1721.1/88176
https://orcid.org/0000-0001-7052-5097
_version_ 1826206196415070208
author Poulson, Jack
Demanet, Laurent
Maxwell, Nicholas
Ying, Lexing
author2 Massachusetts Institute of Technology. Department of Mathematics
author_facet Massachusetts Institute of Technology. Department of Mathematics
Poulson, Jack
Demanet, Laurent
Maxwell, Nicholas
Ying, Lexing
author_sort Poulson, Jack
collection MIT
description The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O(r^2 \frac{N^d}{p} \log N + (\beta r\frac{N^d}{p}+\alpha)\log p)$ time using $p$ processes. This parallel algorithm was then instantiated in the form of the open-source \textttDistButterfly library for the special case where $K(x,y)=\exp(i \Phi(x,y))$, where $\Phi(x,y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively.
first_indexed 2024-09-23T13:25:37Z
format Article
id mit-1721.1/88176
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T13:25:37Z
publishDate 2014
publisher Society for Industrial and Applied Mathematics
record_format dspace
spelling mit-1721.1/881762022-10-01T15:13:18Z A Parallel Butterfly Algorithm Poulson, Jack Demanet, Laurent Maxwell, Nicholas Ying, Lexing Massachusetts Institute of Technology. Department of Mathematics Demanet, Laurent The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O(r^2 \frac{N^d}{p} \log N + (\beta r\frac{N^d}{p}+\alpha)\log p)$ time using $p$ processes. This parallel algorithm was then instantiated in the form of the open-source \textttDistButterfly library for the special case where $K(x,y)=\exp(i \Phi(x,y))$, where $\Phi(x,y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. National Science Foundation (U.S.) (NSF CAREER grant 0846501) United States. Dept. of Energy (DOE grant DE-SC0009409) King Abdullah University of Science and Technology 2014-07-01T21:10:24Z 2014-07-01T21:10:24Z 2014-02 Article http://purl.org/eprint/type/JournalArticle 1064-8275 1095-7197 http://hdl.handle.net/1721.1/88176 Poulson, Jack, Laurent Demanet, Nicholas Maxwell, and Lexing Ying. “A Parallel Butterfly Algorithm.” SIAM Journal on Scientific Computing 36, no. 1 (February 4, 2014): C49–C65.© 2014, Society for Industrial and Applied Mathematics. https://orcid.org/0000-0001-7052-5097 en_US http://dx.doi.org/10.1137/130921544 SIAM Journal on Scientific Computing Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Society for Industrial and Applied Mathematics Society for Industrial and Applied Mathematics
spellingShingle Poulson, Jack
Demanet, Laurent
Maxwell, Nicholas
Ying, Lexing
A Parallel Butterfly Algorithm
title A Parallel Butterfly Algorithm
title_full A Parallel Butterfly Algorithm
title_fullStr A Parallel Butterfly Algorithm
title_full_unstemmed A Parallel Butterfly Algorithm
title_short A Parallel Butterfly Algorithm
title_sort parallel butterfly algorithm
url http://hdl.handle.net/1721.1/88176
https://orcid.org/0000-0001-7052-5097
work_keys_str_mv AT poulsonjack aparallelbutterflyalgorithm
AT demanetlaurent aparallelbutterflyalgorithm
AT maxwellnicholas aparallelbutterflyalgorithm
AT yinglexing aparallelbutterflyalgorithm
AT poulsonjack parallelbutterflyalgorithm
AT demanetlaurent parallelbutterflyalgorithm
AT maxwellnicholas parallelbutterflyalgorithm
AT yinglexing parallelbutterflyalgorithm