Adaptive Cache-Oblivious All-to-All Operation
Main Authors: Chung, Shin Yee; Hsu, Wen Jing
Format: Article
Language: en_US
Published: 2003
Subjects: cache-oblivious algorithms; all-to-all operations; cache misses
Online Access: http://hdl.handle.net/1721.1/3848
author | Chung, Shin Yee Hsu, Wen Jing |
collection | MIT |
description | Modern processors rely on cache memories to reduce the latency of data accesses, so extensive cache misses would compromise the benefit of the scheme. Cache-aware algorithms use knowledge of cache parameters, such as the cache line size, L, and the cache size, Z, to achieve cache efficiency; however, these parameters must be carefully re-tuned for each hardware platform. Cache-oblivious (CO) algorithms, first introduced by Frigo, Leiserson, Prokop, and Ramachandran, work without knowledge of these cache parameters yet still achieve optimal work complexity and optimal cache complexity. Here we present CO algorithms for all-to-all operations (analogous to the cross-product operation), whose applications include convolution, polynomial arithmetic, multiple sequence alignment, and N-body simulation. Given two lists of n elements each, a naive implementation of the all-to-all operation incurs O(n²/L) cache misses; our CO version incurs only O(n²/L²√Z) cache misses. Preliminary experiments on a 1.4 GHz Opteron and a 250 MHz MIPS processor show that the CO implementation runs about twice as fast, and profiling confirms that it incurs significantly fewer cache misses. We also consider situations where (a) the elements have non-uniform sizes, (b) an element cannot fit into the cache, (c) the lists have different lengths, and (d) an element is a linked list. In addition, we study the extension to the K-list all-to-all operation and its applications. Finally, we present empirical results and compare them with cache-aware algorithms. |
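The divide-and-conquer idea behind the CO all-to-all described above can be sketched as follows (a hypothetical illustration, not the authors' code): recursively halve the longer of the two lists until both sublists are small enough to fit in cache together. Because the recursion keeps halving, some level of the recursion matches every cache level's size automatically, which is what makes the scheme cache-oblivious.

```python
def all_to_all(a, b, visit, cutoff=4):
    """Cache-oblivious all-to-all: apply `visit` to every pair (x, y)
    with x from `a` and y from `b`, by divide and conquer.

    `cutoff` is only a base-case size for the demonstration; correctness
    does not depend on it, and no cache parameter (L or Z) is used.
    """
    if len(a) <= cutoff and len(b) <= cutoff:
        # Base case: both sublists are tiny, so once halving has reached
        # this point the double loop touches data already in cache.
        for x in a:
            for y in b:
                visit(x, y)
    elif len(a) >= len(b):
        # Split the longer list and recurse on each half.
        mid = len(a) // 2
        all_to_all(a[:mid], b, visit, cutoff)
        all_to_all(a[mid:], b, visit, cutoff)
    else:
        mid = len(b) // 2
        all_to_all(a, b[:mid], visit, cutoff)
        all_to_all(a, b[mid:], visit, cutoff)
```

Because each recursive call covers a disjoint block of the n² pairs, `visit` fires exactly once per pair regardless of the cutoff; only the traversal order, and hence the cache behavior, changes.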
format | Article |
id | mit-1721.1/3848 |
institution | Massachusetts Institute of Technology |
language | en_US |
publishDate | 2003 |
record_format | dspace |
sponsor | Singapore-MIT Alliance (SMA)
dates | 2003-12-13; 2004-01
series | Computer Science (CS)
file | application/pdf, 48735 bytes
title | Adaptive Cache-Oblivious All-to-All Operation |
topic | cache-oblivious algorithms all-to-all operations cache misses |
url | http://hdl.handle.net/1721.1/3848 |