Hardware-Centric AutoML for Mixed-Precision Quantization
Abstract: Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support flexible bitwidth (1–8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal b...
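The abstract refers to quantizing DNN weights to flexible bitwidths (1–8 bits). As a rough illustration of what mapping a layer's weights onto a k-bit grid involves, here is a minimal sketch of symmetric uniform quantization in Python/NumPy. The function name and the choice of symmetric linear quantization are illustrative assumptions, not the method proposed in the paper.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int) -> np.ndarray:
    """Illustrative symmetric uniform quantization of a weight tensor to `bits` bits.

    A sketch only, not the paper's method: floats are rounded to a signed
    integer grid and then rescaled back ("fake quantization").
    """
    assert 2 <= bits <= 8  # 1-bit (binary) quantization is a degenerate special case
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(weights).max() / qmax       # map the largest weight to qmax
    if scale == 0:
        return np.zeros_like(weights)
    q = np.clip(np.round(weights / scale), -qmax, qmax)  # integer grid values
    return q * scale                           # dequantized weights

# Example: error from quantizing a random layer's weights at two bitwidths
w = np.random.randn(64, 64).astype(np.float32)
for b in (8, 4):
    err = np.abs(w - quantize_symmetric(w, b)).mean()
    print(f"{b}-bit mean abs error: {err:.4f}")
```

Lower bitwidths shrink the integer grid, so the rounding error grows; choosing a bitwidth per layer trades off this accuracy loss against hardware efficiency.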
| Main Authors: | , , , , |
|---|---|
| Other Authors: | |
| Format: | Article |
| Language: | English |
| Published: | Springer US, 2021 |
| Online Access: | https://hdl.handle.net/1721.1/131513 |