A hardware and software architecture for efficient datacenters
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
Main Author: | Kasture, Harshad |
---|---|
Other Authors: | Daniel Sanchez |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology 2017 |
Subjects: | Electrical Engineering and Computer Science |
Online Access: | http://hdl.handle.net/1721.1/109005 |
_version_ | 1826193525479309312 |
---|---|
author | Kasture, Harshad |
author2 | Daniel Sanchez. |
author_facet | Daniel Sanchez. Kasture, Harshad |
author_sort | Kasture, Harshad |
collection | MIT |
description | Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. |
first_indexed | 2024-09-23T09:40:30Z |
format | Thesis |
id | mit-1721.1/109005 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T09:40:30Z |
publishDate | 2017 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/109005 2019-04-10T13:19:22Z A hardware and software architecture for efficient datacenters Kasture, Harshad Daniel Sanchez. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages 121-131). Datacenters host an increasing amount of the world's compute, powering a diverse set of applications that range from scientific computing and business analytics to massive online services such as social media and online maps. Despite their growing importance, however, datacenters suffer from low resource and energy efficiency, using only 10-30% of their compute capacity on average. This overprovisioning adds billions of dollars annually to datacenter equipment costs, and wastes significant energy. This low efficiency stems from two sources. First, latency-critical applications, which form the backbone of user-facing, interactive services, need guaranteed low response times, often a few tens of milliseconds or less. By contrast, current systems are architected to maximize long-term, average performance (e.g., throughput over a period of seconds), and cannot provide the short-term performance guarantees needed by these applications. The stringent performance requirements of latency-critical applications make power management challenging, and make it hard to colocate them with other applications, as interference in shared resources hurts their responsiveness. Second, throughput-oriented batch applications, while easier to colocate, experience performance degradation as multiple colocated applications compete for shared resources on servers. 
This thesis presents novel hardware and software techniques that improve resource and energy efficiency for both classes of applications. First, Ubik is a dynamic cache partitioning technique that allows latency-critical and batch applications to safely share the last-level cache, maximizing batch throughput while providing latency guarantees for latency-critical applications. Ubik accurately predicts the transients that result when caches are reconfigured, and can thus mitigate latency degradation due to performance inertia, i.e., the loss of performance as an application transitions between steady states. Second, Rubik is a fine-grain voltage and frequency scaling scheme that quickly and accurately adapts to short-term load variations in latency-critical applications to minimize dynamic power consumption without hurting latency. Rubik uses a novel, lightweight statistical model that accurately predicts queued work, and accounts for variations in per-request compute requirements as well as queuing delays. Further, Rubik improves system utilization by allowing latency-critical and batch applications to safely share cores, using frequency scaling to mitigate performance degradation due to interference in per-core resources such as private caches. Third, Shepherd is a cluster scheduler that uses per-node cache-partitioning decisions to drive application placement across machines. Shepherd uses detailed application profiling data to partition the last-level cache on each machine and to predict the performance of colocated applications, and uses randomized search to find a schedule that maximizes throughput. A common theme across these techniques is the use of lightweight, general-purpose architectural support to provide performance isolation and fast state transitions, coupled with intelligent software runtimes that configure the hardware to meet application performance requirements. 
Unlike prior work, which often relies on heuristics, these techniques use accurate analytical modeling to guide resource allocation, boosting efficiency while satisfying applications' disparate performance goals. Ubik allows latency-critical and batch applications to be safely and efficiently colocated, improving batch throughput by an average of 17% over a static partitioning scheme while guaranteeing tail latency. Rubik further allows these two classes of applications to share cores, reducing datacenter power consumption by up to 31% while using 41% fewer machines over a scheme that segregates these applications. Shepherd improves batch throughput by 39% over a randomly scheduled, unpartitioned baseline, and significantly outperforms scheduling-only and partitioning-only approaches. by Harshad Kasture. Ph. D. 2017-05-11T20:00:13Z 2017-05-11T20:00:13Z 2017 2017 Thesis http://hdl.handle.net/1721.1/109005 986529222 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 131 pages application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Kasture, Harshad A hardware and software architecture for efficient datacenters |
title | A hardware and software architecture for efficient datacenters |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/109005 |