Judicious Thread Migration When Accessing Distributed Shared Caches

Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Architecture (NUCA) design, where on-chip access l...


Bibliographic Details
Main Authors:	Shim, Keun Sup, Lis, Mieszko, Khan, Omer, Devadas, Srinivas
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	en_US
Published:	2012
Online Access:	http://hdl.handle.net/1721.1/73130 https://orcid.org/0000-0001-8253-7714 https://orcid.org/0000-0001-5490-2323

collection	MIT
description	Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Architecture (NUCA) design, where on-chip access latencies depend on the physical distances between requesting cores and the home cores where the data is cached. Improving data locality is thus key to performance, and several studies have addressed this problem using data replication and data migration. In this paper, we consider another mechanism, hardware-level thread migration. This approach, we argue, can better exploit shared data locality for NUCA designs by effectively replacing multiple round-trip remote cache accesses with a smaller number of migrations. High migration costs, however, make it crucial to use thread migrations judiciously; we therefore propose a novel, on-line prediction scheme that decides, at the instruction level, whether to perform a remote access (as in traditional NUCA designs) or a thread migration. For a set of parallel benchmarks, our thread migration predictor improves performance by 18% on average and at best by 2.3X over the standard NUCA design that only uses remote accesses.
first_indexed	2024-09-23T13:22:19Z
format	Article
id	mit-1721.1/73130
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T13:22:19Z
publishDate	2012
record_format	dspace
spelling	mit-1721.1/73130 2022-09-28T13:43:58Z Judicious Thread Migration When Accessing Distributed Shared Caches Shim, Keun Sup Lis, Mieszko Khan, Omer Devadas, Srinivas Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science 2012-09-24T18:49:52Z 2012-01 Article http://purl.org/eprint/type/ConferencePaper http://hdl.handle.net/1721.1/73130 Shim, Keun Sup, Mieszko Lis, Omer Khan, and Srinivas Devadas. "Judicious Thread Migration When Accessing Distributed Shared Caches." In Proceedings of the Third Workshop on Computer Architecture and Operating System Co-design (CAOS), 2012, January 25, 2012, Paris, France. https://orcid.org/0000-0001-8253-7714 https://orcid.org/0000-0001-5490-2323 en_US http://projects.csail.mit.edu/caos/caos_2012.pdf Proceedings of the Third Workshop on Computer Architecture and Operating System Co-design (CAOS), 2012 Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/ application/pdf MIT web domain
title	Judicious Thread Migration When Accessing Distributed Shared Caches


Similar Items