𝜎OS: Elastic Realms for Multi-Tenant Cloud Computing

Bibliographic Details
Main Author: Szekely, Ariel
Other Authors: Kaashoek, M. Frans
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/147373
Description
Summary: Despite the enormous success of cloud computing, programming and deploying cloud applications remains challenging. Application developers are forced to either explicitly provision resources or limit the types of applications they write to fit a serverless framework such as AWS Lambda. 𝜎OS is a new multi-tenant cloud operating system that allows providers to manage resources for tenants while simplifying application development. A key contribution of 𝜎OS is its novel abstraction: realms. Realms present tenants with the illusion of a single-system image and abstract away the boundaries between physical machines. Developers structure their applications as processes, called procs in 𝜎OS. Much like a time-sharing OS multiplexes users’ processes across a machine’s cores, 𝜎OS multiplexes tenants’ procs across the cloud provider’s physical machines. Since each tenant tends to plan for peak load, realms can improve data center utilization by enabling providers to transparently reallocate partial machines to another tenant’s realm when load dips. An evaluation of 𝜎OS demonstrates that a 𝜎OS-based MapReduce (𝜎OS-MR) implementation grows quickly from 1 core to 32 and scales near-perfectly, achieving a 15.26× speedup over the same implementation running on 2 cores. Similarly, an elastic key-value service built on 𝜎OS (𝜎OS-KV) cooperates with 𝜎OS to scale the number of kvd servers and balance shards across them according to client load. 𝜎OS also achieves high resource utilization when multiple tenants’ realms compete for a shared group of machines. For example, when 𝜎OS multiplexes a long-running 𝜎OS-MR job in one realm and a 𝜎OS-KV service with varying numbers of clients in another realm, 𝜎OS keeps utilization above 90% and transparently moves partial machines between the realms as the 𝜎OS-KV client load changes.