Ubik: efficient cache sharing with strict QoS for latency-critical workloads

Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the ot...

Bibliographic Details
Main Authors:	Kasture, Harshad; Sanchez, Daniel
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	en_US
Published:	Association for Computing Machinery (ACM) 2014
Online Access:	http://hdl.handle.net/1721.1/90846 https://orcid.org/0000-0002-2453-2904 https://orcid.org/0000-0002-3964-9064

_version_	1826199947399135232
author	Kasture, Harshad; Sanchez, Daniel
author2	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Kasture, Harshad Sanchez, Daniel
author_sort	Kasture, Harshad
collection	MIT
description	Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency. In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3x, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications.
first_indexed	2024-09-23T11:28:20Z
format	Article
id	mit-1721.1/90846
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T11:28:20Z
publishDate	2014
publisher	Association for Computing Machinery (ACM)
record_format	dspace
spelling	mit-1721.1/90846 2022-10-01T03:53:12Z Ubik: efficient cache sharing with strict qos for latency-critical workloads Kasture, Harshad Sanchez, Daniel Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science United States. Defense Advanced Research Projects Agency (Power Efficiency Revolution For Embedded Computing Technologies Contract HR0011-13-2-0005) National Science Foundation (U.S.) (Grant CCF-1318384) 2014-10-09T18:20:42Z 2014-10-09T18:20:42Z 2014-03 Article http://purl.org/eprint/type/ConferencePaper 9781450323055 http://hdl.handle.net/1721.1/90846 Harshad Kasture and Daniel Sanchez. 2014. Ubik: efficient cache sharing with strict qos for latency-critical workloads. In Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14). ACM, New York, NY, USA, 729-742. https://orcid.org/0000-0002-2453-2904 https://orcid.org/0000-0002-3964-9064 en_US http://dx.doi.org/10.1145/2541940.2541944 Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Association for Computing Machinery (ACM) MIT web domain
spellingShingle	Kasture, Harshad Sanchez, Daniel Ubik: efficient cache sharing with strict qos for latency-critical workloads
title	Ubik: efficient cache sharing with strict qos for latency-critical workloads
title_full	Ubik: efficient cache sharing with strict qos for latency-critical workloads
title_fullStr	Ubik: efficient cache sharing with strict qos for latency-critical workloads
title_full_unstemmed	Ubik: efficient cache sharing with strict qos for latency-critical workloads
title_short	Ubik: efficient cache sharing with strict qos for latency-critical workloads
title_sort	ubik efficient cache sharing with strict qos for latency critical workloads
url	http://hdl.handle.net/1721.1/90846 https://orcid.org/0000-0002-2453-2904 https://orcid.org/0000-0002-3964-9064
work_keys_str_mv	AT kastureharshad ubikefficientcachesharingwithstrictqosforlatencycriticalworkloads AT sanchezdaniel ubikefficientcachesharingwithstrictqosforlatencycriticalworkloads

Similar Items