Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads

Bibliographic Details
Main Authors: Kasture, Harshad; Sanchez, Daniel
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Association for Computing Machinery (ACM), 2014
Online Access: http://hdl.handle.net/1721.1/90846
ORCID iDs: https://orcid.org/0000-0002-2453-2904, https://orcid.org/0000-0002-3964-9064

Abstract
Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency. In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3x, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications.
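
The abstract's central claim is that cache "inertia" lets a policy look fine on average while wrecking tail latency. The Python sketch below is a toy model of that effect, not Ubik's algorithm and not the paper's simulator. Everything in it is an assumption made for illustration: a fully associative LRU partition, a latency-critical request that touches a hypothetical 64-line working set once, made-up hit and miss latencies, and a naive "reclaim-on-idle" policy that lets batch work take the latency-critical partition during long idle gaps (modeled as a flush). It compares the mean and the nearest-rank 99th-percentile request latency against a static partition.

```python
import math
from collections import OrderedDict

# Hypothetical latencies in cycles; not taken from the paper.
HIT_CYCLES = 10
MISS_CYCLES = 200


class LRUPartition:
    """Fully associative LRU cache partition holding `capacity` lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, insert the line (evicting the LRU line if full)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)
        return False

    def flush(self):
        """Model batch work evicting every latency-critical line while it was idle."""
        self.lines.clear()


def request_latency(cache, working_set):
    """Cycles for one request that touches each line of its working set once."""
    return sum(HIT_CYCLES if cache.access(a) else MISS_CYCLES for a in working_set)


def percentile(values, p):
    """Nearest-rank percentile, 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def simulate(after_long_idle, reclaim_on_idle, capacity=64, working_set_size=64):
    """Per-request latencies; after_long_idle[i] marks requests that follow a long idle gap."""
    cache = LRUPartition(capacity)
    working_set = list(range(working_set_size))
    latencies = []
    for long_idle in after_long_idle:
        if reclaim_on_idle and long_idle:
            cache.flush()  # naive policy: the space went to batch work while idle
        latencies.append(request_latency(cache, working_set))
    return latencies


if __name__ == "__main__":
    # Hypothetical arrival pattern: every 20th request follows a long idle gap.
    pattern = [i % 20 == 0 for i in range(100)]
    for name, reclaim in (("static", False), ("reclaim-on-idle", True)):
        lat = simulate(pattern, reclaim)
        print(f"{name:16s} mean={sum(lat) / len(lat):7.1f}  p99={percentile(lat, 99)}")
```

With these made-up parameters, the mean rises from 761.6 to 1248.0 cycles (about 1.6x), while the 99th-percentile latency rises from 640 to 12800 cycles (20x). The handful of requests that start on a cold partition barely move the average but set the tail, which is the gap between average-performance QoS and tail-latency QoS that the abstract describes.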

Publication Details
Departments: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Type: Conference paper
Date Issued: 2014-03
Date Available: 2014-10-09
Citation: Harshad Kasture and Daniel Sanchez. 2014. Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14). ACM, New York, NY, USA, 729-742.
ISBN: 9781450323055
DOI: http://dx.doi.org/10.1145/2541940.2541944
Funding: United States. Defense Advanced Research Projects Agency (Power Efficiency Revolution For Embedded Computing Technologies, Contract HR0011-13-2-0005); National Science Foundation (U.S.) (Grant CCF-1318384)
License: Creative Commons Attribution-Noncommercial-Share Alike, http://creativecommons.org/licenses/by-nc-sa/4.0/
File Format: application/pdf