Hardware-Centric AutoML for Mixed-Precision Quantization

Abstract: Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support flexible bitwidth (1–8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal b...


Bibliographic details

Main Authors: Wang, Kuan; Liu, Zhijian; Lin, Yujun; Lin, Ji; Han, Song
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English
Published: Springer US, 2021
Online Access: https://hdl.handle.net/1721.1/131513