Thread Migration Prediction for Distributed Shared Caches

Chip-multiprocessors (CMPs) have become the mainstream parallel architecture in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Access (NUCA) design, where on-chip access latencies depend on the physical distances between requesting cores and the home cores where the data is cached. Improving data locality is thus key to performance, and several studies have addressed this problem using data replication and data migration. In this paper, we consider another mechanism, hardware-level thread migration. This approach, we argue, can better exploit shared data locality for NUCA designs by effectively replacing multiple round-trip remote cache accesses with a smaller number of migrations. High migration costs, however, make it crucial to use thread migrations judiciously; we therefore propose a novel, on-line prediction scheme which decides at the instruction level whether to perform a remote access (as in traditional NUCA designs) or a thread migration. For a set of parallel benchmarks, our thread migration predictor improves performance by 24% on average over a shared-NUCA design that uses only remote accesses.

Bibliographic Details

Main Authors: Shim, Keun Sup; Lis, Mieszko; Khan, Omer; Devadas, Srinivas
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers (IEEE), 2015
Published in: IEEE Computer Architecture Letters
ISSN: 1556-6056
DOI: http://dx.doi.org/10.1109/l-ca.2012.30
Online Access: http://hdl.handle.net/1721.1/100003
https://orcid.org/0000-0001-8253-7714
Citation: Shim, Keun Sup, Mieszko Lis, Omer Khan, and Srinivas Devadas. “Thread Migration Prediction for Distributed Shared Caches.” IEEE Computer Architecture Letters 13, no. 1 (January 14, 2014): 53–56.
Rights: Creative Commons Attribution-Noncommercial-Share Alike 4.0 (http://creativecommons.org/licenses/by-nc-sa/4.0/)
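The core mechanism the abstract describes — deciding, per memory instruction, between a round-trip remote cache access and migrating the thread to the data's home core — can be illustrated in software. The sketch below is a simplified model of the general idea, not the authors' actual hardware design: the run-length threshold, the table size, and the policy of marking the PC that starts a long same-home access run are all illustrative assumptions.

```python
# Illustrative model of an instruction-level migration predictor for a
# NUCA CMP. Assumption: a long run of consecutive accesses to the same
# remote home core is cheaper to serve by migrating the thread once than
# by many round-trip remote accesses, so the PC that started such a run
# is marked "migratory" and predicted to migrate on future executions.

MIGRATE, REMOTE = "migrate", "remote"

class MigrationPredictor:
    def __init__(self, threshold=3, table_size=16):
        self.threshold = threshold      # run length that tips the cost balance
        self.table_size = table_size    # capacity of the migratory-PC table
        self.migratory_pcs = set()      # PCs predicted to begin long runs
        self.run_home = None            # home core of the current access run
        self.run_length = 0
        self.run_start_pc = None        # PC that began the current run

    def predict(self, pc):
        """Decide how to service a memory access issued at this PC."""
        return MIGRATE if pc in self.migratory_pcs else REMOTE

    def update(self, pc, home_core):
        """Observe an access (pc, home core) and learn run lengths."""
        if home_core == self.run_home:
            self.run_length += 1
        else:                           # run broken: start a new one
            self.run_home = home_core
            self.run_length = 1
            self.run_start_pc = pc
        if self.run_length == self.threshold:
            if len(self.migratory_pcs) < self.table_size:
                self.migratory_pcs.add(self.run_start_pc)
```

For example, after observing three consecutive accesses to home core 5 issued by PCs 0x40, 0x44, and 0x48, the predictor marks 0x40 (the run's first instruction) as migratory, so the next time the thread reaches 0x40 it would migrate to core 5 rather than issue a remote access.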