Progressive Network Grafting With Local Features Embedding for Few-Shot Knowledge Distillation

Compared with traditional knowledge distillation, which relies on large amounts of data, few-shot knowledge distillation can produce student networks with good performance from only a small number of samples. Some recent studies treat a network as a sequence of network blocks, adopt a progressive grafting strategy, and use the output of the teacher network to distill the student network. However, this strategy ignores the local feature information generated by each teacher block, which indicates what features the corresponding student block should learn. In this paper, we argue that the features output by a teacher block can guide the student block to learn more useful information from it. We therefore propose a joint learning framework for few-shot knowledge distillation that exploits both the output of the teacher network and the local features generated by the teacher blocks to optimize the student network. The local features guide each student block to reproduce the output of its teacher block, while the output of the teacher network lets the student network exploit these learned local features for classification. In addition, we carry out further model compression by reducing the number of network channels, yielding a series of student networks with fewer parameters. Extensive experiments on the CIFAR10 and CIFAR100 datasets show that our method outperforms state-of-the-art methods and retains considerable advantages even with very few parameters in the further model-compression experiments.

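To make the joint objective described in the abstract concrete, here is a minimal sketch, assuming a PyTorch setup: a block-wise feature-matching term (the teacher block's local features guide the corresponding student block) combined with standard logit distillation on the teacher network's output. The function names, the weighting factor alpha, the temperature, and the toy blocks are illustrative assumptions, not the paper's exact formulation or grafting schedule.

# Hypothetical sketch of the joint few-shot KD objective: block-wise feature
# matching plus logit distillation. The paper grafts student blocks in
# progressively; here all blocks are trained jointly for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

def block_feature_loss(student_feat, teacher_feat):
    # Guide the student block to reproduce its teacher block's local features.
    return F.mse_loss(student_feat, teacher_feat)

def logit_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Standard soft-label distillation on the network outputs.
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

def joint_few_shot_kd_loss(teacher_blocks, student_blocks, classifier, x, alpha=0.5):
    # Run both block stacks on the same few-shot batch and combine both losses.
    feat_loss = 0.0
    t, s = x, x
    for t_block, s_block in zip(teacher_blocks, student_blocks):
        with torch.no_grad():
            t = t_block(t)          # teacher block's local features (frozen)
        s = s_block(s)              # corresponding student block
        feat_loss = feat_loss + block_feature_loss(s, t)

    # The teacher's classifier head is reused on both paths for simplicity.
    with torch.no_grad():
        teacher_logits = classifier(t.flatten(1))
    student_logits = classifier(s.flatten(1))
    kd_loss = logit_distillation_loss(student_logits, teacher_logits)
    return alpha * feat_loss + (1.0 - alpha) * kd_loss

if __name__ == "__main__":
    # Toy blocks with matching feature shapes so the MSE term is well defined.
    teacher_blocks = nn.ModuleList([nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 16, 3, padding=1)])
    student_blocks = nn.ModuleList([nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 16, 3, padding=1)])
    classifier = nn.Linear(16 * 32 * 32, 10)
    x = torch.randn(4, 3, 32, 32)   # a tiny "few-shot" batch of CIFAR-sized images
    loss = joint_few_shot_kd_loss(teacher_blocks, student_blocks, classifier, x)
    loss.backward()
    print(loss.item())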

Bibliographic Details
Main Author: Weidong Du
Format: Article
Language: English
Published: IEEE, 2022-01-01
Series: IEEE Access
Subjects: Knowledge distillation; few-shot learning; model compression; features embedding
Online Access:https://ieeexplore.ieee.org/document/9934906/
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3218890 (IEEE Access, vol. 10, pp. 116196-116204, 2022)
Author ORCID: https://orcid.org/0000-0001-5215-9142
Affiliation: School of Mechanical Engineering, Southeast University, Nanjing, China