Computation-efficient knowledge distillation via uncertainty-aware mixup


Bibliographic Details
Main Authors: Xu, Guodong; Liu, Ziwei; Loy, Chen Change
Other Authors: School of Computer Science and Engineering, Nanyang Technological University
Format: Journal Article
Language: English
Published: Pattern Recognition, 138, 109338 (2023)
Subjects: Engineering::Computer science and engineering; Knowledge Distillation; Training Cost
Online Access: https://hdl.handle.net/10356/172038
Description: Knowledge distillation (KD) has emerged as an essential technique not only for model compression, but also for other learning tasks such as continual learning. Given the richer application spectrum and potential online usage of KD, knowledge distillation efficiency becomes a pivotal concern. In this work, we study this little-explored but important topic. Unlike previous works that focus solely on the accuracy of the student network, we attempt to achieve a harder goal: to obtain performance comparable to conventional KD at a lower computation cost during the transfer. To this end, we present UNcertainty-aware mIXup (UNIX), an effective approach that reduces the transfer cost by 20% to 30% while maintaining comparable, or even achieving better, student performance than conventional KD. This is made possible via effective uncertainty sampling and a novel adaptive mixup approach that selects informative samples dynamically over ample data and compacts the knowledge in these samples. We show that our approach inherently performs hard sample mining. We demonstrate the applicability of our approach to improving various existing KD methods by reducing their queries to the teacher network. Extensive experiments are performed on CIFAR100 and ImageNet. Code and models are available at https://github.com/xuguodong03/UNIXKD.
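The core idea in the description (score each sample's uncertainty, keep the most uncertain ones, and mix the remaining samples into them so the teacher is queried on a smaller batch) can be sketched roughly as follows. This is an illustrative sketch only: the entropy-based uncertainty score, the `keep_ratio` parameter, the mixing range, and all function names here are assumptions, not the paper's exact formulation; see the linked repository for the authors' implementation.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Predictive entropy as an uncertainty score (higher = more uncertain)."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def uncertainty_aware_mixup_batch(x, student_probs, keep_ratio=0.75, rng=None):
    """Keep the most uncertain samples and mix the rest into them.

    The teacher is then queried only on the k = keep_ratio * n mixed samples
    instead of all n, which is where the transfer-cost saving comes from.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    k = max(1, int(round(keep_ratio * n)))
    order = np.argsort(-entropy(student_probs))  # most uncertain first
    keep, rest = order[:k], order[k:]
    x_keep = x[keep].copy()
    # Mix each leftover (low-uncertainty) sample into one of the kept
    # (hard) samples, so its information is compacted rather than discarded.
    lam = rng.uniform(0.5, 1.0, size=len(rest))  # assumed mixing range
    for i, j in enumerate(rest):
        t = i % k
        x_keep[t] = lam[i] * x_keep[t] + (1 - lam[i]) * x[j]
    return x_keep, keep
```

Under this sketch, a batch of n samples costs only k teacher forward passes, while every sample still contributes some signal to the distillation loss through the mixed inputs.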
Citation: Xu, G., Liu, Z. & Loy, C. C. (2023). Computation-efficient knowledge distillation via uncertainty-aware mixup. Pattern Recognition, 138, 109338.
DOI: https://dx.doi.org/10.1016/j.patcog.2023.109338
ISSN: 0031-3203
Scopus ID: 2-s2.0-85147248505
Funding: This study is supported by Collaborative Research Grant from SenseTime Group (CUHK Agreement No. TS1610626 & No. TS1712093) and NTU NAP.
Rights: © 2023 Elsevier Ltd. All rights reserved.