HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving

To accelerate inference in machine-learning (ML) model serving, clusters of machines rely on expensive hardware accelerators (e.g., GPUs) to reduce execution time. Advanced inference serving systems are needed to satisfy latency service-level objectives (SLOs) in a cost-effective manner...


Bibliographic Details
Main Authors: Hao Mo, Ligu Zhu, Lei Shi, Songfu Tan, Suping Wang
Format: Article
Language: English
Published: MDPI AG, 2023-01-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/1/240