Computation-efficient knowledge distillation via uncertainty-aware mixup


Bibliographic Details
Main Authors: Xu, Guodong; Liu, Ziwei; Loy, Chen Change
Other Authors: School of Computer Science and Engineering, Nanyang Technological University
Format: Journal Article
Language: English
Published: Pattern Recognition, 138, 109338 (2023)
Subjects: Engineering::Computer science and engineering; Knowledge Distillation; Training Cost
Online Access: https://hdl.handle.net/10356/172038
Description: Knowledge distillation (KD) has emerged as an essential technique not only for model compression, but also for other learning tasks such as continual learning. Given the richer application spectrum and potential online usage of KD, knowledge distillation efficiency becomes a pivotal concern. In this work, we study this little-explored but important topic. Unlike previous works that focus solely on the accuracy of the student network, we attempt to achieve a harder goal: to obtain performance comparable to conventional KD at a lower computation cost during the transfer. To this end, we present UNcertainty-aware mIXup (UNIX), an effective approach that reduces the transfer cost by 20% to 30% while maintaining comparable, or even achieving better, student performance than conventional KD. This is made possible via effective uncertainty sampling and a novel adaptive mixup approach that selects informative samples dynamically over ample data and compacts the knowledge in these samples. We show that our approach inherently performs hard sample mining. We demonstrate the applicability of our approach to improving various existing KD methods by reducing their queries to the teacher network. Extensive experiments are performed on CIFAR100 and ImageNet. Code and models are available at https://github.com/xuguodong03/UNIXKD.
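The core idea in the description (score each sample's uncertainty, keep the most uncertain ones, and mix the remaining samples into them so the teacher is queried on a smaller batch) can be sketched roughly as follows. This is an illustrative sketch only: the entropy-based uncertainty score, the `keep_ratio` parameter, the mixing range, and all function names here are assumptions, not the paper's exact formulation; see the linked repository for the authors' implementation.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Predictive entropy as an uncertainty score (higher = more uncertain)."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def uncertainty_aware_mixup_batch(x, student_probs, keep_ratio=0.75, rng=None):
    """Keep the most uncertain samples and mix the rest into them.

    The teacher is then queried only on the k = keep_ratio * n mixed samples
    instead of all n, which is where the transfer-cost saving comes from.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    k = max(1, int(round(keep_ratio * n)))
    order = np.argsort(-entropy(student_probs))  # most uncertain first
    keep, rest = order[:k], order[k:]
    x_keep = x[keep].copy()
    # Mix each leftover (low-uncertainty) sample into one of the kept
    # (hard) samples, so its information is compacted rather than discarded.
    lam = rng.uniform(0.5, 1.0, size=len(rest))  # assumed mixing range
    for i, j in enumerate(rest):
        t = i % k
        x_keep[t] = lam[i] * x_keep[t] + (1 - lam[i]) * x[j]
    return x_keep, keep
```

Under this sketch, a batch of n samples costs only k teacher forward passes, while every sample still contributes some signal to the distillation loss through the mixed inputs.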
Citation: Xu, G., Liu, Z. & Loy, C. C. (2023). Computation-efficient knowledge distillation via uncertainty-aware mixup. Pattern Recognition, 138, 109338.
DOI: https://dx.doi.org/10.1016/j.patcog.2023.109338
ISSN: 0031-3203
Scopus ID: 2-s2.0-85147248505
Funding: This study is supported by Collaborative Research Grant from SenseTime Group (CUHK Agreement No. TS1610626 & No. TS1712093) and NTU NAP.
Rights: © 2023 Elsevier Ltd. All rights reserved.