BOOM: Broadcast Optimizations for On-chip Meshes

Future many-core chips will require an on-chip network that can support broadcasts and multicasts at good power-performance. A vanilla on-chip network would send multiple unicast packets for each broadcast packet, resulting in latency, throughput and power overheads. Recent research in on-chip multi...

Bibliographic Details
Main Authors:	Krishna, Tushar, Beckmann, Bradford M., Peh, Li-Shiuan, Reinhardt, Steven K.
Other Authors:	Li-Shiuan Peh
Published:	2011
Subjects:	multicore
Online Access:	http://hdl.handle.net/1721.1/61695

author	Krishna, Tushar Beckmann, Bradford M. Peh, Li-Shiuan Reinhardt, Steven K.
author2	Li-Shiuan Peh
collection	MIT
description	Future many-core chips will require an on-chip network that can support broadcasts and multicasts at good power-performance. A vanilla on-chip network would send multiple unicast packets for each broadcast packet, resulting in latency, throughput and power overheads. Recent research in on-chip multicast support has proposed forking of broadcast/multicast packets within the network at the router buffers, but these techniques are far from ideal: they increase buffer occupancy, which lowers throughput, and packets incur delay and power penalties at each router. In this work, we analyze an ideal broadcast mesh; show the substantial gaps between state-of-the-art multicast NoCs and the ideal; then propose BOOM, which comprises a WHIRL routing protocol that ideally load balances broadcast traffic, an mXbar multicast crossbar circuit that enables multicast traversal at an energy-delay similar to that of unicasts, and speculative bypassing of buffering for multicast flits. Together, they enable broadcast packets to approach the delay, energy, and throughput of the ideal fabric. Our simulations show BOOM realizing an average network latency that is 5% off ideal, attaining 96% of ideal throughput, with energy consumption that is 9% above ideal. Evaluations using synthetic traffic show BOOM achieving a latency reduction of 61%, throughput improvement of 63%, and buffer power reduction of 80% as compared to a baseline broadcast. Simulations with PARSEC benchmarks show BOOM reducing average request and network latency by 40% and 15%, respectively.
first_indexed	2024-09-23T10:43:17Z
id	mit-1721.1/61695
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T10:43:17Z
publishDate	2011
record_format	dspace
spelling	mit-1721.1/61695 2019-04-11T10:31:06Z Computer Architecture 2011-03-14T19:45:24Z 2011-03-14T19:45:24Z 2011-03-14 http://hdl.handle.net/1721.1/61695 MIT-CSAIL-TR-2011-013 Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported http://creativecommons.org/licenses/by-nc-nd/3.0/ 12 p.
application/pdf
title	BOOM: Broadcast Optimizations for On-chip Meshes
topic	multicore
url	http://hdl.handle.net/1721.1/61695

Similar Items