HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving

To accelerate inference in machine-learning (ML) model serving, clusters of machines rely on expensive hardware accelerators (e.g., GPUs) to reduce execution time. Advanced inference serving systems are needed to satisfy latency service-level objectives (SLOs) in a cost-effective manner...


Bibliographic Details
Main Authors: Hao Mo, Ligu Zhu, Lei Shi, Songfu Tan, Suping Wang
Format: Article
Language: English
Published: MDPI AG, 2023-01-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/1/240