HAQ: Hardware-Aware Automated Quantization With Mixed Precision
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for...
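The abstract's core idea, trading bitwidth for accuracy, can be illustrated with a generic symmetric linear quantizer. This is a minimal sketch for intuition only, not the HAQ method from the paper; the function name and quantization scheme are assumptions:

```python
import numpy as np

def linear_quantize(w, bits):
    """Uniformly quantize a tensor to the given bitwidth and dequantize it.

    Illustrative only: a generic symmetric linear quantizer, not the
    hardware-aware mixed-precision policy proposed in the paper.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 1 for 2 bits
    scale = np.abs(w).max() / qmax        # map the largest magnitude to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)   # integer codes
    return q * scale                      # simulated (dequantized) weights

w = np.random.randn(4, 4).astype(np.float32)
w8 = linear_quantize(w, 8)   # fine-grained grid: small rounding error
w2 = linear_quantize(w, 2)   # coarse grid: large rounding error
```

Lower bitwidths shrink storage and arithmetic cost but enlarge the rounding error, which is why choosing a per-layer bitwidth (the search problem the abstract raises) matters.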
Main Authors:
Other Authors:
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2021
Available Online: https://hdl.handle.net/1721.1/129522