Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums

© 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accu...

Full description

Bibliographic Details
Main Authors:	Fraser, Sean, Xu, Helen, Leiserson, Charles E
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	English
Published:	Institute of Electrical and Electronics Engineers (IEEE) 2022
Online Access:	https://hdl.handle.net/1721.1/143740

_version_	1826204997962956800
author	Fraser, Sean Xu, Helen Leiserson, Charles E
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Fraser, Sean Xu, Helen Leiserson, Charles E
author_sort	Fraser, Sean
collection	MIT
description	© 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accuracy. We have designed and implemented two new algorithms, called CAST _BLK and PAIR_BLK, whose accuracy is significantly higher than that of the high-performing prefix-sum algorithm from the Problem Based Benchmark Suite, while running with comparable performance on modern multicore machines. Specifically, the root mean squared error of the PBBS code on a large array of uniformly distributed 64-bit floating-point numbers is 8 times higher than that of CAST _BLK and 5.8 times higher than that of PAIR_BLK. These two codes employ the PBBS three-stage strategy for performance, but they are designed to achieve high accuracy, both theoretically and in practice. A vectorization enhancement to these two scalar codes trades off a small amount of accuracy to match or outperform the PBBS code while still maintaining lower error.
first_indexed	2024-09-23T13:05:08Z
format	Article
id	mit-1721.1/143740
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T13:05:08Z
publishDate	2022
publisher	Institute of Electrical and Electronics Engineers (IEEE)
record_format	dspace
spelling	mit-1721.1/1437402023-01-26T21:50:39Z Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums Fraser, Sean Xu, Helen Leiserson, Charles E Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory © 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accuracy. We have designed and implemented two new algorithms, called CAST _BLK and PAIR_BLK, whose accuracy is significantly higher than that of the high-performing prefix-sum algorithm from the Problem Based Benchmark Suite, while running with comparable performance on modern multicore machines. Specifically, the root mean squared error of the PBBS code on a large array of uniformly distributed 64-bit floating-point numbers is 8 times higher than that of CAST _BLK and 5.8 times higher than that of PAIR_BLK. These two codes employ the PBBS three-stage strategy for performance, but they are designed to achieve high accuracy, both theoretically and in practice. A vectorization enhancement to these two scalar codes trades off a small amount of accuracy to match or outperform the PBBS code while still maintaining lower error. 2022-07-14T18:27:54Z 2022-07-14T18:27:54Z 2020 2022-07-14T17:57:19Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/143740 Fraser, Sean, Xu, Helen and Leiserson, Charles E. 2020. "Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums." 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020. en 10.1109/HPEC43674.2020.9286240 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020 Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) MIT web domain
spellingShingle	Fraser, Sean Xu, Helen Leiserson, Charles E Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title	Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_full	Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_fullStr	Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_full_unstemmed	Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_short	Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums
title_sort	work efficient parallel algorithms for accurate floating point prefix sums
url	https://hdl.handle.net/1721.1/143740
work_keys_str_mv	AT frasersean workefficientparallelalgorithmsforaccuratefloatingpointprefixsums AT xuhelen workefficientparallelalgorithmsforaccuratefloatingpointprefixsums AT leisersoncharlese workefficientparallelalgorithmsforaccuratefloatingpointprefixsums

Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums

Similar Items