SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering

In the many-core era, scalable coherence and on-chip interconnects are crucial for shared memory processors. While snoopy coherence is common in small multicore systems, directory-based coherence is the de facto choice for scalability to many cores, as snoopy relies on ordered interconnects which do...


Bibliographic Details
Main Authors:	Daya, Bhavya Kishor, Chen, Chia-Hsin, Subramanian, Suvinay, Kwon, Woo-Cheol, Park, Sunghyun, Krishna, Tushar, Holt, Jim, Chandrakasan, Anantha P, Peh, Li-Shiuan
Format:	Article
Language:	English
Published:	Association for Computing Machinery (ACM) 2021
Online Access:	https://hdl.handle.net/1721.1/130036

_version_	1811083913599123456
author	Daya, Bhavya Kishor; Chen, Chia-Hsin; Subramanian, Suvinay; Kwon, Woo-Cheol; Park, Sunghyun; Krishna, Tushar; Holt, Jim; Chandrakasan, Anantha P; Peh, Li-Shiuan
author_facet	Daya, Bhavya Kishor; Chen, Chia-Hsin; Subramanian, Suvinay; Kwon, Woo-Cheol; Park, Sunghyun; Krishna, Tushar; Holt, Jim; Chandrakasan, Anantha P; Peh, Li-Shiuan
author_sort	Daya, Bhavya Kishor
collection	MIT
description	In the many-core era, scalable coherence and on-chip interconnects are crucial for shared memory processors. While snoopy coherence is common in small multicore systems, directory-based coherence is the de facto choice for scalability to many cores, as snoopy relies on ordered interconnects which do not scale. However, directory-based coherence does not scale beyond tens of cores due to excessive directory area overhead or inaccurate sharer tracking. Prior techniques supporting ordering on arbitrary unordered networks are impractical for full multicore chip designs. We present SCORPIO, an ordered mesh Network-on-Chip (NoC) architecture with a separate fixed-latency, bufferless network to achieve distributed global ordering. Message delivery is decoupled from the ordering, allowing messages to arrive in any order and at any time, and still be correctly ordered. The architecture is designed to plug-and-play with existing multicore IP and with practicality, timing, area, and power as top concerns. Full-system 36- and 64-core simulations on SPLASH-2 and PARSEC benchmarks show an average application runtime reduction of 24.1% and 12.9%, in comparison to distributed directory and AMD HyperTransport coherence protocols, respectively. The SCORPIO architecture is incorporated in an 11 mm-by-13 mm chip prototype, fabricated in IBM 45 nm SOI technology, comprising 36 Freescale e200 Power Architecture™ cores with private L1 and L2 caches interfacing with the NoC via ARM AMBA, along with two Cadence on-chip DDR2 controllers. The chip prototype achieves a post-synthesis operating frequency of 1 GHz (833 MHz post-layout) with an estimated power of 28.8 W (768 mW per tile), while the network consumes only 10% of tile area and 19% of tile power.
first_indexed	2024-09-23T12:41:37Z
format	Article
id	mit-1721.1/130036
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T12:41:37Z
publishDate	2021
publisher	Association for Computing Machinery (ACM)
record_format	dspace
spelling	mit-1721.1/1300362021-03-03T03:18:34Z SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering Daya, Bhavya Kishor Chen, Chia-Hsin Subramanian, Suvinay Kwon, Woo-Cheol Park, Sunghyun Krishna, Tushar Holt, Jim Chandrakasan, Anantha P Peh, Li-Shiuan In the many-core era, scalable coherence and on-chip interconnects are crucial for shared memory processors. While snoopy coherence is common in small multicore systems, directory-based coherence is the de facto choice for scalability to many cores, as snoopy relies on ordered interconnects which do not scale. However, directory-based coherence does not scale beyond tens of cores due to excessive directory area overhead or inaccurate sharer tracking. Prior techniques supporting ordering on arbitrary unordered networks are impractical for full multicore chip designs. We present SCORPIO, an ordered mesh Network-on-Chip (NoC) architecture with a separate fixed-latency, bufferless network to achieve distributed global ordering. Message delivery is decoupled from the ordering, allowing messages to arrive in any order and at any time, and still be correctly ordered. The architecture is designed to plug-and-play with existing multicore IP and with practicality, timing, area, and power as top concerns. Full-system 36- and 64-core simulations on SPLASH-2 and PARSEC benchmarks show an average application runtime reduction of 24.1% and 12.9%, in comparison to distributed directory and AMD HyperTransport coherence protocols, respectively. The SCORPIO architecture is incorporated in an 11 mm-by-13 mm chip prototype, fabricated in IBM 45 nm SOI technology, comprising 36 Freescale e200 Power Architecture™ cores with private L1 and L2 caches interfacing with the NoC via ARM AMBA, along with two Cadence on-chip DDR2 controllers.
The chip prototype achieves a post-synthesis operating frequency of 1 GHz (833 MHz post-layout) with an estimated power of 28.8 W (768 mW per tile), while the network consumes only 10% of tile area and 19% of tile power. 2021-03-02T15:08:30Z 2021-03-02T15:08:30Z 2014-06 2020-12-04T17:34:42Z Article http://purl.org/eprint/type/JournalArticle 978-1-4799-4394-4/14 0163-5964 https://hdl.handle.net/1721.1/130036 Daya, Bhavya K. et al. “SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering.” Paper in the SIGARCH computer architecture news, 42, 2, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 14-18 June 2014, Association for Computing Machinery (ACM) © 2014 The Author(s) en 10.1145/2678373.2665680 SIGARCH computer architecture news Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Association for Computing Machinery (ACM) MIT web domain
spellingShingle	Daya, Bhavya Kishor Chen, Chia-Hsin Subramanian, Suvinay Kwon, Woo-Cheol Park, Sunghyun Krishna, Tushar Holt, Jim Chandrakasan, Anantha P Peh, Li-Shiuan SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering
title	SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering
title_full	SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering
title_fullStr	SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering
title_full_unstemmed	SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering
title_short	SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering
title_sort	scorpio a 36 core research chip demonstrating snoopy coherence on a scalable mesh noc with in network ordering
url	https://hdl.handle.net/1721.1/130036


Similar Items