Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums

© 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accu...

Full description

Bibliographic Details
Main Authors: Fraser, Sean, Xu, Helen, Leiserson, Charles E
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language:English
Published: Institute of Electrical and Electronics Engineers (IEEE) 2022
Online Access:https://hdl.handle.net/1721.1/143740
_version_ 1826204997962956800
author Fraser, Sean
Xu, Helen
Leiserson, Charles E
author2 Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Fraser, Sean
Xu, Helen
Leiserson, Charles E
author_sort Fraser, Sean
collection MIT
description © 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accuracy. We have designed and implemented two new algorithms, called CAST _BLK and PAIR_BLK, whose accuracy is significantly higher than that of the high-performing prefix-sum algorithm from the Problem Based Benchmark Suite, while running with comparable performance on modern multicore machines. Specifically, the root mean squared error of the PBBS code on a large array of uniformly distributed 64-bit floating-point numbers is 8 times higher than that of CAST _BLK and 5.8 times higher than that of PAIR_BLK. These two codes employ the PBBS three-stage strategy for performance, but they are designed to achieve high accuracy, both theoretically and in practice. A vectorization enhancement to these two scalar codes trades off a small amount of accuracy to match or outperform the PBBS code while still maintaining lower error.
first_indexed 2024-09-23T13:05:08Z
format Article
id mit-1721.1/143740
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T13:05:08Z
publishDate 2022
publisher Institute of Electrical and Electronics Engineers (IEEE)
record_format dspace
spelling mit-1721.1/1437402023-01-26T21:50:39Z Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums Fraser, Sean Xu, Helen Leiserson, Charles E Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory © 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accuracy. We have designed and implemented two new algorithms, called CAST _BLK and PAIR_BLK, whose accuracy is significantly higher than that of the high-performing prefix-sum algorithm from the Problem Based Benchmark Suite, while running with comparable performance on modern multicore machines. Specifically, the root mean squared error of the PBBS code on a large array of uniformly distributed 64-bit floating-point numbers is 8 times higher than that of CAST _BLK and 5.8 times higher than that of PAIR_BLK. These two codes employ the PBBS three-stage strategy for performance, but they are designed to achieve high accuracy, both theoretically and in practice. A vectorization enhancement to these two scalar codes trades off a small amount of accuracy to match or outperform the PBBS code while still maintaining lower error. 2022-07-14T18:27:54Z 2022-07-14T18:27:54Z 2020 2022-07-14T17:57:19Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/143740 Fraser, Sean, Xu, Helen and Leiserson, Charles E. 2020. "Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums." 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020. en 10.1109/HPEC43674.2020.9286240 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020 Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) MIT web domain
spellingShingle Fraser, Sean
Xu, Helen
Leiserson, Charles E
Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_full Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_fullStr Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_full_unstemmed Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_short Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_sort work efficient parallel algorithms for accurate floating point prefix sums
url https://hdl.handle.net/1721.1/143740
work_keys_str_mv AT frasersean workefficientparallelalgorithmsforaccuratefloatingpointprefixsums
AT xuhelen workefficientparallelalgorithmsforaccuratefloatingpointprefixsums
AT leisersoncharlese workefficientparallelalgorithmsforaccuratefloatingpointprefixsums