The effect of softmax temperature on recent knowledge distillation algorithms


Bibliographic Details
Main Author: Poh, Dominique
Other Authors: Weichen Liu
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University, 2023
Online Access: https://hdl.handle.net/10356/172431
Description
Summary: Knowledge distillation is a technique for transferring knowledge from a large, complex teacher model to a smaller, faster student model, and is an important category of model compression methods. In this study, I survey various knowledge distillation algorithms proposed in recent years, consider their merits in principle, and attempt to empirically verify some of the published results. The study compares their performance on two image classification datasets, CIFAR-10 and CIFAR-100, using ResNet architectures for both the teacher and the student. I investigate the effect of softmax temperature, a key hyperparameter in knowledge distillation, on the classification accuracy of the student models. The results show that higher temperatures tend to work better for datasets with fewer classes, while lower temperatures work better for datasets with more classes. Some algorithms outperform others depending on the dataset and the temperature.
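The softmax temperature studied here follows the standard distillation formulation (Hinton et al., 2015): teacher and student logits are softened by a temperature T before comparing their distributions. A minimal NumPy sketch of that objective — all function names and the default T value are illustrative, not taken from this thesis:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T flattens the distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student predictions.

    The T^2 factor keeps gradient magnitudes roughly comparable
    across different temperature settings, as in the standard recipe.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # soft student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return (temperature ** 2) * kl.mean()
```

A higher temperature spreads probability mass onto the non-target classes, exposing the teacher's "dark knowledge" about class similarities; with many classes (e.g. CIFAR-100) this flattening dilutes the signal faster, which is consistent with the summary's finding that lower temperatures suit datasets with more classes.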