Combined Distributed Shared-Buffered and Diagonally-Linked Mesh Topology for High-Performance Interconnect

Bibliographic Details
Main Authors: Charles Effiong, Gilles Sassatelli, Abdoulaye Gamatié
Format: Article
Language: English
Published: MDPI AG 2022-12-01
Series: Micromachines
Online Access: https://www.mdpi.com/2072-666X/13/12/2246
Description
Summary: Networks-on-Chip (NoCs) have become the de facto on-chip interconnect for multi/manycore systems. A typical NoC router contains buffers that store packets unable to advance toward their destination. However, buffers consume significant power and area and are often underutilized, especially for applications with non-uniform traffic patterns, which degrades performance for such applications. To improve network performance, the Roundabout NoC (R-NoC) concept is considered. R-NoC is inspired by real-life multi-lane traffic roundabouts and consists of lanes that are shared by multiple input/output ports to maximize the utilization of buffering resources. R-NoC relies on router-internal adaptive routing that chooses the lane path based on back pressure, which makes it possible to assess lane utilization and route packets accordingly. This is enabled by the use of elastic buffers for flow control, a form of handshaking similar to that used in asynchronous circuits. Another prominent feature of R-NoC is that internal routing and arbitration are completely distributed, which allows significant freedom in choosing the internal router topology and parameters. This work leverages this property and proposes novel, previously unexplored configurations, for which an in-depth evaluation of the corresponding implementations in 45 nm CMOS technology is given. Each configuration is evaluated in terms of performance and power on both synthetic and real application traffic. Several R-NoC configurations are identified and demonstrated to provide very significant performance improvements over standard mesh configurations and a typical input-buffered router, without compromising area or power consumption. Exploiting the distributed nature of R-NoC routers, a diagonally-linked configuration is then proposed, which incurs moderate area overhead and delivers still better performance and energy efficiency.
ISSN: 2072-666X
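
Illustrative sketch: The back-pressure-based lane selection mentioned in the summary can be pictured with a small Python model. This is only a sketch under stated assumptions, not the authors' router design: the names ElasticBuffer, select_lane, and inject, the capacity of two flits per stage, and the "first ready lane" policy are all hypothetical choices made for illustration. Each stage exposes a ready signal, and a deasserted ready signal plays the role of back pressure.

```python
from collections import deque


class ElasticBuffer:
    """Hypothetical model of one buffer stage with a valid/ready handshake."""

    def __init__(self, capacity=2):
        self.capacity = capacity  # assumed stage depth, not taken from the article
        self.slots = deque()

    @property
    def ready(self):
        # When ready is False, the stage exerts back pressure on its upstream.
        return len(self.slots) < self.capacity

    @property
    def valid(self):
        # The stage holds at least one flit to offer downstream.
        return len(self.slots) > 0

    def push(self, flit):
        if not self.ready:
            raise RuntimeError("push into a stage that is not ready")
        self.slots.append(flit)

    def pop(self):
        return self.slots.popleft()


def select_lane(lanes):
    """Return the index of the first lane whose entry stage is ready.

    Returns None when every lane exerts back pressure (assumed policy).
    """
    for index, stage in enumerate(lanes):
        if stage.ready:
            return index
    return None


def inject(lanes, flit):
    """Place a flit in the selected lane; return the lane index or None if stalled."""
    lane = select_lane(lanes)
    if lane is not None:
        lanes[lane].push(flit)
    return lane
```

With two lanes of capacity two, five consecutive calls to inject place the first two flits in lane 0 and the next two in lane 1, and the fifth stalls until a stage drains through pop.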