Locality-aware data replication in the Last-Level Cache

Next generation multicores will process massive data with varying degree of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LL...


Bibliographic Details
Main Authors:	Kurian, George, Devadas, Srinivas, Khan, Omer
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Institute of Electrical and Electronics Engineers (IEEE) 2015
Online Access:	http://hdl.handle.net/1721.1/100001 https://orcid.org/0000-0001-8253-7714

_version_	1826207453170106368
collection	MIT
description	Next generation multicores will process massive data with varying degree of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). Our goal is to lower memory access latency and energy by replicating only high locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low. Our approach relies on low overhead yet highly accurate in-hardware run-time classification of data locality at the cache line granularity, and only allows replication for cache lines with high reuse. Furthermore, our classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. The locality tracking mechanism is decoupled from the sharer tracking structures that cause scalability concerns in traditional coherence protocols. Moreover, the complexity of our protocol is low since no additional coherence states are created. On a set of parallel benchmarks, our protocol reduces the overall energy by 16%, 14%, 13% and 21% and the completion time by 4%, 9%, 6% and 13% when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA and Static-NUCA LLC management schemes.
first_indexed	2024-09-23T13:50:12Z
format	Article
id	mit-1721.1/100001
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T13:50:12Z
publishDate	2015
publisher	Institute of Electrical and Electronics Engineers (IEEE)
record_format	dspace
spelling	mit-1721.1/100001 2022-10-01T17:26:31Z Locality-aware data replication in the Last-Level Cache Kurian, George Devadas, Srinivas Khan, Omer Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Kurian, George Devadas, Srinivas 2015-11-23T16:39:27Z 2015-11-23T16:39:27Z 2014-02 Article http://purl.org/eprint/type/ConferencePaper 978-1-4799-3097-5 http://hdl.handle.net/1721.1/100001 Kurian, George, Srinivas Devadas, and Omer Khan. “Locality-Aware Data Replication in the Last-Level Cache.” 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) (February 2014). https://orcid.org/0000-0001-8253-7714 en_US http://dx.doi.org/10.1109/HPCA.2014.6835921 Proceedings of the 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) MIT web domain


Similar Items