Brief announcement: Distributed shared memory based on computation migration

Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor designers have resorted to increasing the number of cores on a single chip, and pundits expect 1000-core designs to materialize in the next few years [1]. But how will memory architectures scale, and how will these next-generation multicores be programmed? One barrier to scaling current memory architectures is the off-chip memory bandwidth wall [1,2]: off-chip bandwidth grows with package pin density, which scales much more slowly than on-die transistor density [3]. To reduce reliance on external memories and keep data on chip, today's multicores integrate very large shared last-level caches [4]; the interconnects used with such shared caches, however, do not scale beyond relatively few cores, and the power requirements and access latencies of large caches exclude their use in chips on a 1000-core scale. For massive-scale multicores, then, we are left with relatively small per-core caches. Per-core caches on a 1000-core scale, in turn, raise the question of memory coherence. On the one hand, a shared memory abstraction is a practical necessity for general-purpose programming, and most programmers prefer a shared memory model [5]. On the other hand, ensuring coherence among private caches is an expensive proposition: bus-based and snoopy protocols do not scale beyond relatively few cores, and the directories used in cache-coherence protocols must amount to a significant fraction of the combined size of the per-core caches, since otherwise directory evictions will limit performance [6]. Moreover, directory-based coherence protocols are notoriously difficult to implement and verify [7].
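
The description above ends with the motivation and does not spell out the proposed design, but the title names the key idea: rather than replicating data across private caches and keeping the copies coherent, the computation moves to the core where the data lives. The sketch below is a minimal, hypothetical illustration of that general idea only, not the authors' protocol; the core count, the cache-line-striped home mapping, and the Thread/Machine classes are assumptions made purely for this example.

# Minimal, hypothetical sketch of shared memory via computation migration.
# This illustrates the general idea named in the title, NOT the protocol
# proposed in the paper. Assumptions made for the example:
#   * every address has a unique "home" core (striped by cache line here),
#   * only the home core ever caches an address, so no copies exist and
#     no coherence protocol is needed,
#   * a thread that touches a remote address migrates its context to that
#     address's home core instead of fetching the data.

NUM_CORES = 16     # assumed core count for the example
LINE_SIZE = 64     # assumed cache-line size in bytes


def home_core(addr: int) -> int:
    """Static address-to-home-core mapping (assumed: striped by cache line)."""
    return (addr // LINE_SIZE) % NUM_CORES


class Thread:
    """A thread context that can migrate between cores."""
    def __init__(self, core: int):
        self.core = core        # core currently executing this thread
        self.migrations = 0     # number of context migrations so far


class Machine:
    """Per-core private caches; data never leaves its home core."""
    def __init__(self):
        self.caches = [dict() for _ in range(NUM_CORES)]  # addr -> value

    def access(self, thread: Thread, addr: int, value=None):
        """Load (value is None) or store `value` at `addr` for `thread`."""
        home = home_core(addr)
        if home != thread.core:
            # Computation migration: move the thread to the data's home core
            # rather than moving (and replicating) the data.
            thread.core = home
            thread.migrations += 1
        cache = self.caches[home]
        if value is None:
            return cache.get(addr, 0)   # load; 0 if never written
        cache[addr] = value             # store


if __name__ == "__main__":
    m = Machine()
    t = Thread(core=0)
    m.access(t, 0x1040, value=42)        # 0x1040 is homed on core 1: migrate
    print(m.access(t, 0x1040))           # subsequent access is now local -> 42
    print("migrations:", t.migrations)   # -> 1

Under this simplified model every address is cached in exactly one place, so the directory and snooping machinery discussed in the description is unnecessary; the cost shifts to moving thread contexts across the on-chip interconnect, which is presumably the trade-off such a design must manage.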

Bibliographic Details
Main Authors: Lis, Mieszko; Shim, Keun Sup; Cho, Myong Hyon; Fletcher, Christopher Wardlaw; Kinsy, Michel A.; Lebedev, Ilia A.; Khan, Omer; Devadas, Srinivas
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article (conference paper)
Language: en_US
Published: Association for Computing Machinery (ACM), 2012
Published in: Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11)
DOI: http://dx.doi.org/10.1145/1989493.1989530
ISBN: 978-1-4503-0743-7
Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
Citation: Mieszko Lis, Keun Sup Shim, Myong Hyon Cho, Christopher W. Fletcher, Michel Kinsy, Ilia Lebedev, Omer Khan, and Srinivas Devadas. 2011. Brief announcement: distributed shared memory based on computation migration. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11). ACM, New York, NY, USA, 253-256.
Online Access: http://hdl.handle.net/1721.1/72358
Author ORCIDs: https://orcid.org/0000-0001-8253-7714; https://orcid.org/0000-0003-4301-1159; https://orcid.org/0000-0001-5490-2323; https://orcid.org/0000-0003-1467-2150