Effective Online Knowledge Distillation via Attention-Based Model Ensembling

Bibliographic Details
Main Authors: Diana-Laura Borza, Adrian Sergiu Darabant, Tudor Alexandru Ileni, Alexandru-Ion Marinescu
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Mathematics
Subjects: online knowledge distillation
ensemble learning
attention aggregation
deep learning
Online Access: https://www.mdpi.com/2227-7390/10/22/4285
_version_ 1797464693475901440
author Diana-Laura Borza
Adrian Sergiu Darabant
Tudor Alexandru Ileni
Alexandru-Ion Marinescu
author_facet Diana-Laura Borza
Adrian Sergiu Darabant
Tudor Alexandru Ileni
Alexandru-Ion Marinescu
author_sort Diana-Laura Borza
collection DOAJ
description Large-scale deep learning models have achieved impressive results on a variety of tasks; however, their deployment on edge or mobile devices is still a challenge due to the limited available memory and computational capability. Knowledge distillation is an effective model compression technique, which can boost the performance of a lightweight student network by transferring the knowledge from a more complex model or an ensemble of models. Due to its reduced size, this lightweight model is more suitable for deployment on edge devices. In this paper, we introduce an online knowledge distillation framework, which relies on an original attention mechanism to effectively combine the predictions of a cohort of lightweight (student) networks into a powerful ensemble, which is then used as the distillation signal. The proposed aggregation strategy uses the predictions of the individual students, as well as the ground truth data, to determine the set of weights needed for ensembling these predictions. This mechanism is used only during training. At test or inference time, a single lightweight student is extracted and used. The extensive experiments we performed on several image classification benchmarks, both by training models from scratch (on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets) and using transfer learning (on the Oxford Pets and Oxford Flowers datasets), showed that the proposed framework consistently improves the accuracy of the knowledge-distilled students, demonstrating the effectiveness of the proposed solution. Moreover, in the case of the ResNet architecture, we observed that the knowledge-distilled model achieves a higher accuracy than a deeper, individually trained ResNet model.
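The description above gives only a high-level picture of the aggregation step. For illustration, the following is a minimal PyTorch-style sketch of how a cohort of students could be trained against an attention-weighted ensemble of their own predictions; the AttentionAggregator design, the online_kd_step helper, the temperature T, and the loss weight alpha are assumptions made for this sketch and are not taken from the article itself.

```python
# Sketch only: the paper's exact attention mechanism and losses are not
# given in this record, so the module and hyperparameters below are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionAggregator(nn.Module):
    """Scores each student's prediction against the ground truth and
    turns the scores into per-sample ensemble weights."""

    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        # Each student's logits are concatenated with the one-hot label so the
        # scorer can judge how trustworthy that student is for this sample.
        self.scorer = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, student_logits, labels):
        # student_logits: list of (B, C) tensors; labels: (B,) int64
        one_hot = F.one_hot(labels, student_logits[0].size(1)).float()
        scores = torch.cat(
            [self.scorer(torch.cat([s, one_hot], dim=1)) for s in student_logits],
            dim=1,
        )                                              # (B, num_students)
        weights = scores.softmax(dim=1)                # attention weights
        stacked = torch.stack(student_logits, dim=1)   # (B, S, C)
        ensemble = (weights.unsqueeze(-1) * stacked).sum(dim=1)  # (B, C)
        return ensemble, weights


def online_kd_step(students, aggregator, x, y, T=4.0, alpha=0.5):
    """One training step: every student learns from the labels and from the
    attention-weighted ensemble, which plays the role of the teacher."""
    logits = [m(x) for m in students]
    ensemble, _ = aggregator(logits, y)
    # The ensemble's own cross-entropy loss trains the aggregator.
    loss = F.cross_entropy(ensemble, y)
    for s in logits:
        ce = F.cross_entropy(s, y)
        kd = F.kl_div(
            F.log_softmax(s / T, dim=1),
            F.softmax(ensemble.detach() / T, dim=1),  # ensemble as fixed target
            reduction="batchmean",
        ) * (T * T)
        loss = loss + (1.0 - alpha) * ce + alpha * kd
    return loss
```

In a full training loop the students and the aggregator would be optimized jointly, and after training a single student would be kept for deployment, matching the single-student inference described in the abstract.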
first_indexed 2024-03-09T18:10:48Z
format Article
id doaj.art-5a308cbcac93469790df7e2d5f125d17
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-09T18:10:48Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-5a308cbcac93469790df7e2d5f125d17
2023-11-24T09:09:03Z
eng
MDPI AG
Mathematics
2227-7390
2022-11-01
Volume 10, Issue 22, Article 4285
10.3390/math10224285
Effective Online Knowledge Distillation via Attention-Based Model Ensembling
Diana-Laura Borza (Computer Science Department, Babes Bolyai University, 400084 Cluj-Napoca, Romania)
Adrian Sergiu Darabant (Computer Science Department, Babes Bolyai University, 400084 Cluj-Napoca, Romania)
Tudor Alexandru Ileni (Computer Science Department, Babes Bolyai University, 400084 Cluj-Napoca, Romania)
Alexandru-Ion Marinescu (Computer Science Department, Babes Bolyai University, 400084 Cluj-Napoca, Romania)
(abstract identical to the description field above)
https://www.mdpi.com/2227-7390/10/22/4285
online knowledge distillation
ensemble learning
attention aggregation
deep learning
spellingShingle Diana-Laura Borza
Adrian Sergiu Darabant
Tudor Alexandru Ileni
Alexandru-Ion Marinescu
Effective Online Knowledge Distillation via Attention-Based Model Ensembling
Mathematics
online knowledge distillation
ensemble learning
attention aggregation
deep learning
title Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_full Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_fullStr Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_full_unstemmed Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_short Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_sort effective online knowledge distillation via attention based model ensembling
topic online knowledge distillation
ensemble learning
attention aggregation
deep learning
url https://www.mdpi.com/2227-7390/10/22/4285
work_keys_str_mv AT dianalauraborza effectiveonlineknowledgedistillationviaattentionbasedmodelensembling
AT adriansergiudarabant effectiveonlineknowledgedistillationviaattentionbasedmodelensembling
AT tudoralexandruileni effectiveonlineknowledgedistillationviaattentionbasedmodelensembling
AT alexandruionmarinescu effectiveonlineknowledgedistillationviaattentionbasedmodelensembling