Brief announcement: Distributed shared memory based on computation migration
Main Authors: | Lis, Mieszko; Shim, Keun Sup; Cho, Myong Hyon; Fletcher, Christopher Wardlaw; Kinsy, Michel A.; Lebedev, Ilia A.; Khan, Omer; Devadas, Srinivas |
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Language: | en_US |
Published: | Association for Computing Machinery (ACM), 2012 |
Online Access: | http://hdl.handle.net/1721.1/72358 https://orcid.org/0000-0001-8253-7714 https://orcid.org/0000-0003-4301-1159 https://orcid.org/0000-0001-5490-2323 https://orcid.org/0000-0003-1467-2150 |
author | Lis, Mieszko Shim, Keun Sup Cho, Myong Hyon Fletcher, Christopher Wardlaw Kinsy, Michel A. Lebedev, Ilia A. Khan, Omer Devadas, Srinivas |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_sort | Lis, Mieszko |
collection | MIT |
description | Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor designers have resorted to increasing the number of cores on a single chip, and pundits expect 1000-core designs to materialize in the next few years [1]. But how will memory architectures scale, and how will these next-generation multicores be programmed?

One barrier to scaling current memory architectures is the off-chip memory bandwidth wall [1,2]: off-chip bandwidth grows with package pin density, which scales much more slowly than on-die transistor density [3]. To reduce reliance on external memories and keep data on-chip, today's multicores integrate very large shared last-level caches on chip [4]; the interconnects used with such shared caches, however, do not scale beyond relatively few cores, and the power requirements and access latencies of large caches preclude their use in chips on a 1000-core scale. For massive-scale multicores, then, we are left with relatively small per-core caches.

Per-core caches on a 1000-core scale, in turn, raise the question of memory coherence. On the one hand, a shared memory abstraction is a practical necessity for general-purpose programming, and most programmers prefer a shared memory model [5]. On the other hand, ensuring coherence among private caches is an expensive proposition: bus-based and snoopy protocols do not scale beyond relatively few cores, and the directory sizes needed in cache-coherence protocols must equal a significant portion of the combined size of the per-core caches, as otherwise directory evictions will limit performance [6]. Moreover, directory-based coherence protocols are notoriously difficult to implement and verify [7]. |
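The directory-sizing argument in the abstract can be made concrete with a back-of-envelope sketch. All parameters below (cache size, line size, entry overhead) are illustrative assumptions, not figures from the paper; the point is only that a classic full-map directory tracking every cached line on a 1000-core chip can rival or exceed the combined size of the caches it covers.

```python
# Back-of-envelope estimate of full-map directory storage for a
# hypothetical 1000-core chip with small private per-core caches.

CORES = 1000
L1_BYTES = 64 * 1024        # assumed 64 KiB private cache per core
LINE_BYTES = 64             # assumed 64-byte cache lines

lines_per_core = L1_BYTES // LINE_BYTES      # lines each cache can hold
total_lines = CORES * lines_per_core         # lines the directory may track

# A full-map directory entry holds one presence bit per core
# plus some tag/state overhead (assumed 48 bits here).
sharer_bits = CORES
overhead_bits = 48
entry_bytes = (sharer_bits + overhead_bits) / 8

directory_bytes = total_lines * entry_bytes
combined_cache_bytes = CORES * L1_BYTES

print(f"combined private caches: {combined_cache_bytes / 2**20:.1f} MiB")
print(f"full-map directory:      {directory_bytes / 2**20:.1f} MiB")
```

Under these assumptions the directory alone is larger than all the private caches combined, which is why the abstract notes that directory storage must be a significant fraction of total cache capacity to avoid performance-limiting evictions [6].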
first_indexed | 2024-09-23T16:37:56Z |
format | Article |
id | mit-1721.1/72358 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T16:37:56Z |
publishDate | 2012 |
publisher | Association for Computing Machinery (ACM) |
record_format | dspace |
spelling | mit-1721.1/72358 (last modified 2022-10-02T08:34:37Z). Brief announcement: Distributed shared memory based on computation migration. Authors: Lis, Mieszko; Shim, Keun Sup; Cho, Myong Hyon; Fletcher, Christopher Wardlaw; Kinsy, Michel A.; Lebedev, Ilia A.; Khan, Omer; Devadas, Srinivas. Affiliations: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Citation: Mieszko Lis, Keun Sup Shim, Myong Hyon Cho, Christopher W. Fletcher, Michel Kinsy, Ilia Lebedev, Omer Khan, and Srinivas Devadas. 2011. Brief announcement: distributed shared memory based on computation migration. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11). ACM, New York, NY, USA, 253-256. Type: Article, http://purl.org/eprint/type/ConferencePaper. ISBN: 978-1-4503-0743-7. DOI: http://dx.doi.org/10.1145/1989493.1989530. Handle: http://hdl.handle.net/1721.1/72358. ORCIDs: https://orcid.org/0000-0001-8253-7714 https://orcid.org/0000-0003-4301-1159 https://orcid.org/0000-0001-5490-2323 https://orcid.org/0000-0003-1467-2150. Published 2011-06; deposited 2012-08-27T20:39:51Z. License: Creative Commons Attribution-Noncommercial-Share Alike 3.0, http://creativecommons.org/licenses/by-nc-sa/3.0/. Format: application/pdf. Publisher: Association for Computing Machinery (ACM). Source: MIT web domain. |
title | Brief announcement: Distributed shared memory based on computation migration |
url | http://hdl.handle.net/1721.1/72358 https://orcid.org/0000-0001-8253-7714 https://orcid.org/0000-0003-4301-1159 https://orcid.org/0000-0001-5490-2323 https://orcid.org/0000-0003-1467-2150 |