Locality-aware data replication in the Last-Level Cache

Next-generation multicores will process massive data with varying degrees of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LL...

Bibliographic Details
Main Authors: Kurian, George, Devadas, Srinivas, Khan, Omer
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers (IEEE) 2015
Online Access: http://hdl.handle.net/1721.1/100001
https://orcid.org/0000-0001-8253-7714
Description: Next-generation multicores will process massive data with varying degrees of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). Our goal is to lower memory access latency and energy by replicating only high-locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low. Our approach relies on low-overhead yet highly accurate in-hardware run-time classification of data locality at cache-line granularity, and only allows replication for cache lines with high reuse. Furthermore, our classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. The locality tracking mechanism is decoupled from the sharer tracking structures that cause scalability concerns in traditional coherence protocols. Moreover, the complexity of our protocol is low since no additional coherence states are created. On a set of parallel benchmarks, our protocol reduces the overall energy by 16%, 14%, 13% and 21% and the completion time by 4%, 9%, 6% and 13% when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA and Static-NUCA LLC management schemes, respectively.
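The replication decision described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' hardware design: the class `LocalityClassifier`, the per-(line, core) reuse counter, the threshold value of 3, and the eviction rule are all hypothetical names and choices introduced here for illustration. The record itself only states that lines with high reuse are replicated and that the classifier adapts to LLC pressure at replica locations.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the catalog record does not give a value.
REPLICATION_THRESHOLD = 3


@dataclass
class LocalityClassifier:
    """Toy reuse tracker kept at a cache line's home LLC slice."""

    threshold: int = REPLICATION_THRESHOLD
    # Maps (line_address, core_id) to the reuse count observed so far.
    reuse: dict = field(default_factory=dict)

    def record_access(self, line: int, core: int) -> bool:
        """Count one access by `core` to `line`.

        Returns True when the reuse count has reached the threshold,
        meaning a replica may be placed in the requesting core's
        local LLC slice.
        """
        key = (line, core)
        self.reuse[key] = self.reuse.get(key, 0) + 1
        return self.reuse[key] >= self.threshold

    def record_eviction(self, line: int, core: int, replica_reuse: int) -> None:
        """Update the classification when a replica is evicted.

        Assumed rule: a replica that was evicted before seeing enough
        reuse (for example because its slice is under pressure) is
        demoted and must earn replication again; a well-used replica
        keeps its high-locality status.
        """
        key = (line, core)
        if replica_reuse >= self.threshold:
            self.reuse[key] = self.threshold
        else:
            self.reuse[key] = 0


if __name__ == "__main__":
    classifier = LocalityClassifier()
    for i in range(4):
        print(i, classifier.record_access(line=0x40, core=1))
```

In this sketch the counters are keyed by requesting core only to keep the example short; how locality state is actually stored, and how it is decoupled from sharer tracking, is described in the paper itself rather than in this record.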
Citation: Kurian, George, Srinivas Devadas, and Omer Khan. “Locality-Aware Data Replication in the Last-Level Cache.” 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) (February 2014).
Published in: Proceedings of the 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
Date issued: 2014-02
Date available: 2015-11-23T16:39:27Z
Type: http://purl.org/eprint/type/ConferencePaper
ISBN: 978-1-4799-3097-5
DOI: http://dx.doi.org/10.1109/HPCA.2014.6835921
License: Creative Commons Attribution-Noncommercial-Share Alike, http://creativecommons.org/licenses/by-nc-sa/4.0/
File format: application/pdf
Source: MIT web domain