Brief announcement: Distributed shared memory based on computation migration

Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor designers have resorted to increasing the number of cores on a single chip, and pundits expect 1000-core designs to materialize in the next few years [1]. But how will memory architectures scale, and how will these next-generation multicores be programmed? One barrier to scaling current memory architectures is the off-chip memory bandwidth wall [1,2]: off-chip bandwidth grows with package pin density, which scales much more slowly than on-die transistor density [3]. To reduce reliance on external memories and keep data on chip, today's multicores integrate very large shared last-level caches [4]; the interconnects used with such shared caches, however, do not scale beyond relatively few cores, and the power requirements and access latencies of large caches exclude their use in chips on a 1000-core scale. For massive-scale multicores, then, we are left with relatively small per-core caches. Per-core caches on a 1000-core scale, in turn, raise the question of memory coherence. On the one hand, a shared memory abstraction is a practical necessity for general-purpose programming, and most programmers prefer a shared memory model [5]. On the other hand, ensuring coherence among private caches is an expensive proposition: bus-based and snoopy protocols do not scale beyond relatively few cores, and the directories used in cache-coherence protocols must amount to a significant fraction of the combined size of the per-core caches, since otherwise directory evictions will limit performance [6]. Moreover, directory-based coherence protocols are notoriously difficult to implement and verify [7].
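
The description above ends with the motivation and does not spell out the proposed design, but the title names the key idea: rather than replicating data across private caches and keeping the copies coherent, the computation moves to the core where the data lives. The sketch below is a minimal, hypothetical illustration of that general idea only, not the authors' protocol; the core count, the cache-line-striped home mapping, and the Thread/Machine classes are assumptions made purely for this example.

# Minimal, hypothetical sketch of shared memory via computation migration.
# This illustrates the general idea named in the title, NOT the protocol
# proposed in the paper. Assumptions made for the example:
#   * every address has a unique "home" core (striped by cache line here),
#   * only the home core ever caches an address, so no copies exist and
#     no coherence protocol is needed,
#   * a thread that touches a remote address migrates its context to that
#     address's home core instead of fetching the data.

NUM_CORES = 16     # assumed core count for the example
LINE_SIZE = 64     # assumed cache-line size in bytes


def home_core(addr: int) -> int:
    """Static address-to-home-core mapping (assumed: striped by cache line)."""
    return (addr // LINE_SIZE) % NUM_CORES


class Thread:
    """A thread context that can migrate between cores."""
    def __init__(self, core: int):
        self.core = core        # core currently executing this thread
        self.migrations = 0     # number of context migrations so far


class Machine:
    """Per-core private caches; data never leaves its home core."""
    def __init__(self):
        self.caches = [dict() for _ in range(NUM_CORES)]  # addr -> value

    def access(self, thread: Thread, addr: int, value=None):
        """Load (value is None) or store `value` at `addr` for `thread`."""
        home = home_core(addr)
        if home != thread.core:
            # Computation migration: move the thread to the data's home core
            # rather than moving (and replicating) the data.
            thread.core = home
            thread.migrations += 1
        cache = self.caches[home]
        if value is None:
            return cache.get(addr, 0)   # load; 0 if never written
        cache[addr] = value             # store


if __name__ == "__main__":
    m = Machine()
    t = Thread(core=0)
    m.access(t, 0x1040, value=42)        # 0x1040 is homed on core 1: migrate
    print(m.access(t, 0x1040))           # subsequent access is now local -> 42
    print("migrations:", t.migrations)   # -> 1

Under this simplified model every address is cached in exactly one place, so the directory and snooping machinery discussed in the description is unnecessary; the cost shifts to moving thread contexts across the on-chip interconnect, which is presumably the trade-off such a design must manage.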

Bibliographic Details
Main Authors: Lis, Mieszko; Shim, Keun Sup; Cho, Myong Hyon; Fletcher, Christopher Wardlaw; Kinsy, Michel A.; Lebedev, Ilia A.; Khan, Omer; Devadas, Srinivas
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article (conference paper)
Language: en_US
Published: Association for Computing Machinery (ACM), 2012
Published in: Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11)
DOI: http://dx.doi.org/10.1145/1989493.1989530
ISBN: 978-1-4503-0743-7
Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
Citation: Mieszko Lis, Keun Sup Shim, Myong Hyon Cho, Christopher W. Fletcher, Michel Kinsy, Ilia Lebedev, Omer Khan, and Srinivas Devadas. 2011. Brief announcement: distributed shared memory based on computation migration. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11). ACM, New York, NY, USA, 253-256.
Online Access: http://hdl.handle.net/1721.1/72358
Author ORCIDs: https://orcid.org/0000-0001-8253-7714; https://orcid.org/0000-0003-4301-1159; https://orcid.org/0000-0001-5490-2323; https://orcid.org/0000-0003-1467-2150