Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
© 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accu...
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: |
Institute of Electrical and Electronics Engineers (IEEE)
2022
|
Online Access: | https://hdl.handle.net/1721.1/143740 |
_version_ | 1826204997962956800 |
---|---|
author | Fraser, Sean Xu, Helen Leiserson, Charles E |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
author_facet | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Fraser, Sean Xu, Helen Leiserson, Charles E |
author_sort | Fraser, Sean |
collection | MIT |
description | © 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accuracy. We have designed and implemented two new algorithms, called CAST _BLK and PAIR_BLK, whose accuracy is significantly higher than that of the high-performing prefix-sum algorithm from the Problem Based Benchmark Suite, while running with comparable performance on modern multicore machines. Specifically, the root mean squared error of the PBBS code on a large array of uniformly distributed 64-bit floating-point numbers is 8 times higher than that of CAST _BLK and 5.8 times higher than that of PAIR_BLK. These two codes employ the PBBS three-stage strategy for performance, but they are designed to achieve high accuracy, both theoretically and in practice. A vectorization enhancement to these two scalar codes trades off a small amount of accuracy to match or outperform the PBBS code while still maintaining lower error. |
first_indexed | 2024-09-23T13:05:08Z |
format | Article |
id | mit-1721.1/143740 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T13:05:08Z |
publishDate | 2022 |
publisher | Institute of Electrical and Electronics Engineers (IEEE) |
record_format | dspace |
spelling | mit-1721.1/1437402023-01-26T21:50:39Z Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums Fraser, Sean Xu, Helen Leiserson, Charles E Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory © 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accuracy. We have designed and implemented two new algorithms, called CAST _BLK and PAIR_BLK, whose accuracy is significantly higher than that of the high-performing prefix-sum algorithm from the Problem Based Benchmark Suite, while running with comparable performance on modern multicore machines. Specifically, the root mean squared error of the PBBS code on a large array of uniformly distributed 64-bit floating-point numbers is 8 times higher than that of CAST _BLK and 5.8 times higher than that of PAIR_BLK. These two codes employ the PBBS three-stage strategy for performance, but they are designed to achieve high accuracy, both theoretically and in practice. A vectorization enhancement to these two scalar codes trades off a small amount of accuracy to match or outperform the PBBS code while still maintaining lower error. 2022-07-14T18:27:54Z 2022-07-14T18:27:54Z 2020 2022-07-14T17:57:19Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/143740 Fraser, Sean, Xu, Helen and Leiserson, Charles E. 2020. "Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums." 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020. en 10.1109/HPEC43674.2020.9286240 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020 Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) MIT web domain |
spellingShingle | Fraser, Sean Xu, Helen Leiserson, Charles E Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums |
title | Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums |
title_full | Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums |
title_fullStr | Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums |
title_full_unstemmed | Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums |
title_short | Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums |
title_sort | work efficient parallel algorithms for accurate floating point prefix sums |
url | https://hdl.handle.net/1721.1/143740 |
work_keys_str_mv | AT frasersean workefficientparallelalgorithmsforaccuratefloatingpointprefixsums AT xuhelen workefficientparallelalgorithmsforaccuratefloatingpointprefixsums AT leisersoncharlese workefficientparallelalgorithmsforaccuratefloatingpointprefixsums |