HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving
To accelerate inference in machine-learning (ML) model serving, machine clusters require expensive hardware accelerators (e.g., GPUs) to reduce execution time. Advanced inference serving systems are needed to satisfy latency service-level objectives (SLOs) in a cost-effective mann...
Main Authors: | Hao Mo, Ligu Zhu, Lei Shi, Songfu Tan, Suping Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-01-01 |
Series: | Electronics |
Online Access: | https://www.mdpi.com/2079-9292/12/1/240 |
Similar Items
- Multilayered Autoscaling Performance Evaluation: Can Virtual Machines and Containers Co–Scale?
  by: Podolskiy Vladimir, et al.
  Published: (2019-06-01)
- Predictive Hybrid Autoscaling for Containerized Applications
  by: Dinh-Dai Vu, et al.
  Published: (2022-01-01)
- Toward Optimal Load Prediction and Customizable Autoscaling Scheme for Kubernetes
  by: Subrota Kumar Mondal, et al.
  Published: (2023-06-01)
- Online Workload Burst Detection for Efficient Predictive Autoscaling of Applications
  by: Fatima Tahir, et al.
  Published: (2020-01-01)
- An Autoscaling System Based on Predicting the Demand for Resources and Responding to Failure in Forecasting
  by: Jieun Park, et al.
  Published: (2023-11-01)