Shenango: Achieving high CPU efficiency for latency-sensitive datacenter workloads

Datacenter applications demand microsecond-scale tail latencies and high request rates from operating systems, and most applications handle loads that have high variance over multiple timescales. Achieving these goals in a CPU-efficient way is an open problem. Because of the high overheads of today's...

Bibliographic Details
Main Authors:	Ousterhout, Amy Elizabeth; Fried, Joshua; Behrens, Jonathan (Jonathan Kyle); Belay, Adam M; Balakrishnan, Hari
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	English
Published:	Association for Computing Machinery (ACM)/ USENIX Association 2021
Online Access:	https://hdl.handle.net/1721.1/131018

collection	MIT
description	Datacenter applications demand microsecond-scale tail latencies and high request rates from operating systems, and most applications handle loads that have high variance over multiple timescales. Achieving these goals in a CPU-efficient way is an open problem. Because of the high overheads of today's kernels, the best available solution to achieve microsecond-scale latencies is kernel-bypass networking, which dedicates CPU cores to applications for spin-polling the network card. But this approach wastes CPU: even at modest average loads, one must dedicate enough cores for the peak expected load. Shenango achieves comparable latencies but at far greater CPU efficiency. It reallocates cores across applications at very fine granularity (every 5 µs), enabling cycles unused by latency-sensitive applications to be used productively by batch processing applications. It achieves such fast reallocation rates with (1) an efficient algorithm that detects when applications would benefit from more cores, and (2) a privileged component called the IOKernel that runs on a dedicated core, steering packets from the NIC and orchestrating core reallocations. When handling latency-sensitive applications, such as memcached, we found that Shenango achieves tail latency and throughput comparable to ZygOS, a state-of-the-art kernel-bypass network stack, but can linearly trade latency-sensitive application throughput for batch processing application throughput, vastly increasing CPU efficiency.
first_indexed	2024-09-23T08:35:17Z
id	mit-1721.1/131018
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T08:35:17Z
record_format	dspace
spelling	mit-1721.1/131018 2022-09-23T13:07:04Z
	NSF (Grants CNS-1407470, CNS-1526791, CNS-1563826)
	2021-06-17T19:10:35Z 2021-06-17T19:10:35Z 2019-02 2021-06-17T17:02:26Z
	Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/131018
	Ousterhout, Amy et al. "Shenango: Achieving high CPU efficiency for latency-sensitive datacenter workloads." Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation, February 2019, Boston, MA, Association for Computing Machinery / USENIX Association, February 2019.
	© 2019 The USENIX Association
	en https://www.usenix.org/system/files/nsdi19-ousterhout.pdf
	Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation
	Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
	application/pdf
	Prof. Belay via Phoebe Ayers
