HAQ: Hardware-Aware Automated Quantization With Mixed Precision
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for...
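The abstract's core idea, trading bitwidth for accuracy, can be illustrated with a generic symmetric linear quantizer. This is a minimal sketch for intuition only, not the HAQ method from the paper; the function name and quantization scheme are assumptions:

```python
import numpy as np

def linear_quantize(w, bits):
    """Uniformly quantize a tensor to the given bitwidth and dequantize it.

    Illustrative only: a generic symmetric linear quantizer, not the
    hardware-aware mixed-precision policy proposed in the paper.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 1 for 2 bits
    scale = np.abs(w).max() / qmax        # map the largest magnitude to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)   # integer codes
    return q * scale                      # simulated (dequantized) weights

w = np.random.randn(4, 4).astype(np.float32)
w8 = linear_quantize(w, 8)   # fine-grained grid: small rounding error
w2 = linear_quantize(w, 2)   # coarse grid: large rounding error
```

Lower bitwidths shrink storage and arithmetic cost but enlarge the rounding error, which is why choosing a per-layer bitwidth (the search problem the abstract raises) matters.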
Main Authors:
Other Authors:
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2021
Available Online: https://hdl.handle.net/1721.1/129522