DoubleQExt: Hardware and Memory Efficient CNN Through Two Levels of Quantization

To meet the tight area and memory constraints of IoT applications, the design of efficient Convolutional Neural Network (CNN) hardware becomes crucial. Quantization is one of the most promising approaches for compressing a large CNN into a much smaller one, which makes it well suited to IoT applications. Among the various quantization schemes that have been proposed, power-of-two (PoT) quantization enables efficient hardware implementation and low memory consumption for CNN accelerators, but it requires retraining of the CNN to retain its accuracy. This paper proposes a two-level post-training static quantization technique (DoubleQ) that combines 8-bit and PoT weight quantization. The CNN weights are first quantized to 8 bits (level one) and then further quantized to PoT (level two). Expressing the weights in their PoT exponent form allows multiplication to be carried out with shifters. DoubleQ also reduces the memory required to store the CNN, since only the exponents of the weights need to be stored. However, DoubleQ trades network accuracy for reduced memory storage. To recover the accuracy, a selection process (DoubleQExt) is proposed that strategically selects some of the less informative layers in the network to be quantized with PoT at the second level. On ResNet-20, the proposed DoubleQ reduces memory consumption by 37.50% with 7.28% accuracy degradation compared to 8-bit quantization. With DoubleQExt, accuracy is degraded by only 1.19% compared to the 8-bit version while achieving a memory reduction of 23.05%. This result is also 1% more accurate than the state-of-the-art work (SegLog). The proposed DoubleQExt also allows flexible configuration to trade off memory consumption against accuracy, a capability not found in other state-of-the-art works. With the proposed two-level weight quantization, one can achieve a more efficient hardware architecture for CNN with minimal impact on accuracy, which is crucial for IoT applications.
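The abstract outlines the core mechanism: weights are first quantized to 8-bit integers, then snapped to the nearest power of two so that only a sign and a small exponent are stored and multiplications reduce to shifts. The Python/NumPy sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation; the function names (quantize_int8, quantize_pot, shift_multiply), the symmetric scaling scheme, and the 3-bit exponent range are assumptions made for the example.

import numpy as np

def quantize_int8(w, scale=None):
    # Level one: uniform symmetric 8-bit quantization of a weight tensor
    # (illustrative scheme; the paper's exact calibration is not shown here).
    if scale is None:
        scale = np.max(np.abs(w)) / 127.0          # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_pot(q_int8):
    # Level two: snap each non-zero 8-bit weight to the nearest power of two.
    # Only the sign and a small exponent need to be stored, so each weight
    # shrinks from 8 bits to roughly 4 bits (1 sign bit + 3-bit exponent here).
    sign = np.sign(q_int8).astype(np.int8)
    mag = np.abs(q_int8).astype(np.float64)
    exp = np.zeros_like(q_int8, dtype=np.int8)
    nz = mag > 0
    exp[nz] = np.clip(np.round(np.log2(mag[nz])), 0, 7).astype(np.int8)
    return sign, exp

def shift_multiply(activation_int, sign, exp):
    # Multiply an integer activation by a PoT weight using only a left shift.
    return sign * (activation_int << exp)

# Toy example: quantize a small weight vector and multiply via shifts.
w = np.array([0.42, -0.07, 0.91, 0.0])
q, scale = quantize_int8(w)
sign, exp = quantize_pot(q)
acts = np.array([3, 5, 2, 7])                       # already-quantized activations
print(shift_multiply(acts, sign, exp))              # products computed with shifters

In a hardware accelerator, each such shift replaces a full multiplier and each stored weight shrinks to its sign and exponent, which is where the area and memory savings come from; the layer-wise selection in DoubleQExt then decides which layers stay at 8 bits to limit the accuracy loss.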

Bibliographic Details
Main Authors: Jin-Chuan See, Hui-Fuang Ng, Hung-Khoon Tan, Jing-Jing Chang, Wai-Kong Lee, Seong Oun Hwang
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
Subjects: Convolutional neural network; quantization; Internet of Things; deep learning; field programmable gate array
Online Access: https://ieeexplore.ieee.org/document/9663269/
Citation: Jin-Chuan See, Hui-Fuang Ng, Hung-Khoon Tan, Jing-Jing Chang, Wai-Kong Lee, Seong Oun Hwang, "DoubleQExt: Hardware and Memory Efficient CNN Through Two Levels of Quantization," IEEE Access, vol. 9, pp. 169082-169091, 2021, doi: 10.1109/ACCESS.2021.3138756 (article number 9663269). ISSN: 2169-3536.

Author affiliations:
Jin-Chuan See, Hui-Fuang Ng (ORCID 0000-0003-4394-2770), Hung-Khoon Tan (ORCID 0000-0001-9964-7186), Jing-Jing Chang (ORCID 0000-0003-3447-4856): Faculty of Information and Communication Technology (FICT), Universiti Tunku Abdul Rahman, Kampar, Petaling Jaya, Malaysia.
Wai-Kong Lee (ORCID 0000-0003-4659-8979), Seong Oun Hwang (ORCID 0000-0003-4240-6255): Department of Computer Engineering, Gachon University, Seongnam, South Korea.