Ubik: efficient cache sharing with strict QoS for latency-critical workloads
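Main Authors: | Kasture, Harshad; Sanchez, Daniel |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |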
Format: | Article |
Language: | en_US |
Published: | Association for Computing Machinery (ACM), 2014 |
Online Access: | http://hdl.handle.net/1721.1/90846 https://orcid.org/0000-0002-2453-2904 https://orcid.org/0000-0002-3964-9064 |
_version_ | 1826199947399135232 |
---|---|
author | Kasture, Harshad Sanchez, Daniel |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_facet | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Kasture, Harshad Sanchez, Daniel |
author_sort | Kasture, Harshad |
collection | MIT |
description | Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency.
In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3x, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications. |
first_indexed | 2024-09-23T11:28:20Z |
format | Article |
id | mit-1721.1/90846 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T11:28:20Z |
publishDate | 2014 |
publisher | Association for Computing Machinery (ACM) |
record_format | dspace |
spelling | mit-1721.1/90846 2022-10-01T03:53:12Z Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science United States. Defense Advanced Research Projects Agency (Power Efficiency Revolution For Embedded Computing Technologies Contract HR0011-13-2-0005) National Science Foundation (U.S.) (Grant CCF-1318384) 2014-10-09T18:20:42Z 2014-10-09T18:20:42Z 2014-03 Article http://purl.org/eprint/type/ConferencePaper 9781450323055 http://hdl.handle.net/1721.1/90846 Harshad Kasture and Daniel Sanchez. 2014. Ubik: efficient cache sharing with strict qos for latency-critical workloads. In Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14). ACM, New York, NY, USA, 729-742. https://orcid.org/0000-0002-2453-2904 https://orcid.org/0000-0002-3964-9064 en_US http://dx.doi.org/10.1145/2541940.2541944 Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Association for Computing Machinery (ACM) MIT web domain |
spellingShingle | Kasture, Harshad Sanchez, Daniel Ubik: efficient cache sharing with strict qos for latency-critical workloads |
title | Ubik: efficient cache sharing with strict qos for latency-critical workloads |
title_full | Ubik: efficient cache sharing with strict qos for latency-critical workloads |
title_fullStr | Ubik: efficient cache sharing with strict qos for latency-critical workloads |
title_full_unstemmed | Ubik: efficient cache sharing with strict qos for latency-critical workloads |
title_short | Ubik: efficient cache sharing with strict qos for latency-critical workloads |
title_sort | ubik efficient cache sharing with strict qos for latency critical workloads |
url | http://hdl.handle.net/1721.1/90846 https://orcid.org/0000-0002-2453-2904 https://orcid.org/0000-0002-3964-9064 |
work_keys_str_mv | AT kastureharshad ubikefficientcachesharingwithstrictqosforlatencycriticalworkloads AT sanchezdaniel ubikefficientcachesharingwithstrictqosforlatencycriticalworkloads |