Hardware-Centric AutoML for Mixed-Precision Quantization

Abstract: Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators have begun to support flexible bitwidths (1–8 bits) to further improve computational efficiency, which raises a great challenge: finding the optimal bitwidth...
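
To make the quantization setting in the abstract concrete, below is a minimal, illustrative sketch of uniform symmetric quantization of a weight tensor to a chosen bitwidth k. This is not the paper's method (the paper searches per-layer bitwidths with hardware feedback); the function name quantize_uniform and its interface are hypothetical, chosen only for intuition.

```python
# Illustrative sketch only: uniform symmetric quantization to k bits,
# followed by dequantization so the error can be inspected directly.
import numpy as np

def quantize_uniform(w: np.ndarray, k: int) -> np.ndarray:
    """Quantize weights to k bits on a symmetric uniform grid, then dequantize."""
    qmax = 2 ** (k - 1) - 1             # largest integer level, e.g. k=8 -> 127
    scale = np.abs(w).max() / qmax      # map the largest magnitude onto qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                    # dequantized values for simulated inference

# Lower bitwidths shrink the model but increase quantization error:
w = np.random.randn(1000).astype(np.float32)
for k in (8, 4, 2):
    err = np.mean((w - quantize_uniform(w, k)) ** 2)
    print(f"{k}-bit MSE: {err:.5f}")
```

Running the loop shows how error grows as the bitwidth drops, which is the trade-off that motivates searching for a per-layer bitwidth assignment rather than using one uniform setting.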


Bibliographic Details
Main Authors: Wang, Kuan; Liu, Zhijian; Lin, Yujun; Lin, Ji; Han, Song
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English
Published: Springer US 2021
Online Access: https://hdl.handle.net/1721.1/131513

Similar Items