Locality-aware data replication in the Last-Level Cache
Main Authors: | Kurian, George; Devadas, Srinivas; Khan, Omer |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | en_US |
Published: | Institute of Electrical and Electronics Engineers (IEEE), 2015 |
Online Access: | http://hdl.handle.net/1721.1/100001 https://orcid.org/0000-0001-8253-7714 |
author | Kurian, George; Devadas, Srinivas; Khan, Omer |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
collection | MIT |
description | Next-generation multicores will process massive data with varying degrees of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). Our goal is to lower memory access latency and energy by replicating only high-locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low. Our approach relies on low-overhead yet highly accurate in-hardware run-time classification of data locality at cache-line granularity, and only allows replication for cache lines with high reuse. Furthermore, our classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. The locality tracking mechanism is decoupled from the sharer tracking structures that cause scalability concerns in traditional coherence protocols. Moreover, the complexity of our protocol is low since no additional coherence states are created. On a set of parallel benchmarks, our protocol reduces the overall energy by 16%, 14%, 13% and 21% and the completion time by 4%, 9%, 6% and 13% when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA and Static-NUCA LLC management schemes, respectively. |
first_indexed | 2024-09-23T13:50:12Z |
format | Article |
id | mit-1721.1/100001 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T13:50:12Z |
publishDate | 2015 |
publisher | Institute of Electrical and Electronics Engineers (IEEE) |
record_format | dspace |
date_issued | 2014-02 |
citation | Kurian, George, Srinivas Devadas, and Omer Khan. “Locality-Aware Data Replication in the Last-Level Cache.” 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) (February 2014). |
venue | Proceedings of the 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) |
type | Conference paper (http://purl.org/eprint/type/ConferencePaper) |
isbn | 978-1-4799-3097-5 |
doi | http://dx.doi.org/10.1109/HPCA.2014.6835921 |
license | Creative Commons Attribution-Noncommercial-Share Alike, http://creativecommons.org/licenses/by-nc-sa/4.0/ |
file_format | application/pdf |
date_available | 2015-11-23T16:39:27Z |
record_modified | 2022-10-01T17:26:31Z |
source | MIT web domain |
title | Locality-aware data replication in the Last-Level Cache |
url | http://hdl.handle.net/1721.1/100001 https://orcid.org/0000-0001-8253-7714 |
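The abstract describes the run-time locality classifier only at a high level: replicate a line in the requester's local LLC slice only after it shows enough reuse, and let pressure at the replica location feed back into that decision. The following is a minimal, hypothetical Python model of one way such a per-line classifier could behave. It assumes a per-core "replica" or "non-replica" mode for each cache line and a single replication threshold. The names `Mode`, `CoreLocality`, `LineClassifier`, and `REPLICATION_THRESHOLD`, as well as the threshold value, are illustrative assumptions, not the authors' hardware design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Mode(Enum):
    NON_REPLICA = "non-replica"  # requests from this core are served at the home LLC slice
    REPLICA = "replica"          # this core may hold a replica in its local LLC slice


# Hypothetical value: the abstract does not give a replication threshold.
REPLICATION_THRESHOLD = 3


@dataclass
class CoreLocality:
    mode: Mode = Mode.NON_REPLICA
    home_reuse: int = 0     # accesses counted at the home slice while in non-replica mode
    replica_reuse: int = 0  # hits on the local replica while in replica mode


@dataclass
class LineClassifier:
    """Hypothetical per-cache-line locality state, kept apart from any sharer list."""
    threshold: int = REPLICATION_THRESHOLD
    cores: Dict[int, CoreLocality] = field(default_factory=dict)

    def _entry(self, core: int) -> CoreLocality:
        return self.cores.setdefault(core, CoreLocality())

    def on_home_access(self, core: int) -> bool:
        """Request from `core` reaches the home slice; True means send a replica."""
        e = self._entry(core)
        if e.mode is Mode.REPLICA:
            return True  # core already classified as high-locality: refill its replica
        e.home_reuse += 1
        if e.home_reuse >= self.threshold:
            # Enough reuse observed at the home slice: promote the core to replica mode.
            e.mode = Mode.REPLICA
            e.replica_reuse = 0
            return True
        return False

    def on_replica_hit(self, core: int) -> None:
        """Hit on the core's local replica."""
        self._entry(core).replica_reuse += 1

    def on_replica_removed(self, core: int) -> None:
        """Replica evicted (LLC pressure) or invalidated; low reuse demotes the core."""
        e = self._entry(core)
        if e.mode is Mode.REPLICA and e.replica_reuse < self.threshold:
            e.mode = Mode.NON_REPLICA
        e.home_reuse = 0
        e.replica_reuse = 0
```

For example, with `threshold=3`, the third home access from a core returns `True`. If that replica then sees only one hit before it is evicted, `on_replica_removed` returns the core to non-replica mode. In this sketch, a replica that is pushed out by capacity pressure before it accumulates enough hits demotes its core, which is one plausible reading of how the classifier "captures the LLC pressure at the existing replica locations." Keeping the state in its own `cores` map, rather than in a sharer list, loosely mirrors the abstract's point that locality tracking is decoupled from sharer tracking.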