Effective Online Knowledge Distillation via Attention-Based Model Ensembling

Bibliographic Details
Main Authors: Diana-Laura Borza, Adrian Sergiu Darabant, Tudor Alexandru Ileni, Alexandru-Ion Marinescu
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Mathematics
Subjects: online knowledge distillation
ensemble learning
attention aggregation
deep learning
Online Access: https://www.mdpi.com/2227-7390/10/22/4285
_version_ 1797464693475901440
author Diana-Laura Borza
Adrian Sergiu Darabant
Tudor Alexandru Ileni
Alexandru-Ion Marinescu
author_facet Diana-Laura Borza
Adrian Sergiu Darabant
Tudor Alexandru Ileni
Alexandru-Ion Marinescu
author_sort Diana-Laura Borza
collection DOAJ
description Large-scale deep learning models have achieved impressive results on a variety of tasks; however, their deployment on edge or mobile devices is still a challenge due to the limited available memory and computational capability. Knowledge distillation is an effective model compression technique, which can boost the performance of a lightweight student network by transferring the knowledge from a more complex model or an ensemble of models. Due to its reduced size, this lightweight model is more suitable for deployment on edge devices. In this paper, we introduce an online knowledge distillation framework, which relies on an original attention mechanism to effectively combine the predictions of a cohort of lightweight (student) networks into a powerful ensemble, which is then used as the distillation signal. The proposed aggregation strategy uses the predictions of the individual students, as well as the ground truth data, to determine the set of weights needed for ensembling these predictions. This mechanism is used only during training. At test or inference time, a single lightweight student is extracted and used. The extensive experiments we performed on several image classification benchmarks, both by training models from scratch (on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets) and using transfer learning (on the Oxford Pets and Oxford Flowers datasets), showed that the proposed framework consistently improves the accuracy of the knowledge-distilled students, demonstrating the effectiveness of the proposed solution. Moreover, in the case of the ResNet architecture, we observed that the knowledge-distilled model achieves a higher accuracy than a deeper, individually trained ResNet model.
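The description above gives only a high-level picture of the aggregation step. For illustration, the following is a minimal PyTorch-style sketch of how a cohort of students could be trained against an attention-weighted ensemble of their own predictions; the AttentionAggregator design, the online_kd_step helper, the temperature T, and the loss weight alpha are assumptions made for this sketch and are not taken from the article itself.

```python
# Sketch only: the paper's exact attention mechanism and losses are not
# given in this record, so the module and hyperparameters below are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionAggregator(nn.Module):
    """Scores each student's prediction against the ground truth and
    turns the scores into per-sample ensemble weights."""

    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        # Each student's logits are concatenated with the one-hot label so the
        # scorer can judge how trustworthy that student is for this sample.
        self.scorer = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, student_logits, labels):
        # student_logits: list of (B, C) tensors; labels: (B,) int64
        one_hot = F.one_hot(labels, student_logits[0].size(1)).float()
        scores = torch.cat(
            [self.scorer(torch.cat([s, one_hot], dim=1)) for s in student_logits],
            dim=1,
        )                                              # (B, num_students)
        weights = scores.softmax(dim=1)                # attention weights
        stacked = torch.stack(student_logits, dim=1)   # (B, S, C)
        ensemble = (weights.unsqueeze(-1) * stacked).sum(dim=1)  # (B, C)
        return ensemble, weights


def online_kd_step(students, aggregator, x, y, T=4.0, alpha=0.5):
    """One training step: every student learns from the labels and from the
    attention-weighted ensemble, which plays the role of the teacher."""
    logits = [m(x) for m in students]
    ensemble, _ = aggregator(logits, y)
    # The ensemble's own cross-entropy loss trains the aggregator.
    loss = F.cross_entropy(ensemble, y)
    for s in logits:
        ce = F.cross_entropy(s, y)
        kd = F.kl_div(
            F.log_softmax(s / T, dim=1),
            F.softmax(ensemble.detach() / T, dim=1),  # ensemble as fixed target
            reduction="batchmean",
        ) * (T * T)
        loss = loss + (1.0 - alpha) * ce + alpha * kd
    return loss
```

In a full training loop the students and the aggregator would be optimized jointly, and after training a single student would be kept for deployment, matching the single-student inference described in the abstract.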
first_indexed 2024-03-09T18:10:48Z
format Article
id doaj.art-5a308cbcac93469790df7e2d5f125d17
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-09T18:10:48Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-5a308cbcac93469790df7e2d5f125d17
2023-11-24T09:09:03Z
eng
MDPI AG
Mathematics
2227-7390
2022-11-01
Volume 10, Issue 22, Article 4285
10.3390/math10224285
Effective Online Knowledge Distillation via Attention-Based Model Ensembling
Diana-Laura Borza (Computer Science Department, Babes Bolyai University, 400084 Cluj-Napoca, Romania)
Adrian Sergiu Darabant (Computer Science Department, Babes Bolyai University, 400084 Cluj-Napoca, Romania)
Tudor Alexandru Ileni (Computer Science Department, Babes Bolyai University, 400084 Cluj-Napoca, Romania)
Alexandru-Ion Marinescu (Computer Science Department, Babes Bolyai University, 400084 Cluj-Napoca, Romania)
(abstract identical to the description field above)
https://www.mdpi.com/2227-7390/10/22/4285
online knowledge distillation
ensemble learning
attention aggregation
deep learning
spellingShingle Diana-Laura Borza
Adrian Sergiu Darabant
Tudor Alexandru Ileni
Alexandru-Ion Marinescu
Effective Online Knowledge Distillation via Attention-Based Model Ensembling
Mathematics
online knowledge distillation
ensemble learning
attention aggregation
deep learning
title Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_full Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_fullStr Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_full_unstemmed Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_short Effective Online Knowledge Distillation via Attention-Based Model Ensembling
title_sort effective online knowledge distillation via attention based model ensembling
topic online knowledge distillation
ensemble learning
attention aggregation
deep learning
url https://www.mdpi.com/2227-7390/10/22/4285
work_keys_str_mv AT dianalauraborza effectiveonlineknowledgedistillationviaattentionbasedmodelensembling
AT adriansergiudarabant effectiveonlineknowledgedistillationviaattentionbasedmodelensembling
AT tudoralexandruileni effectiveonlineknowledgedistillationviaattentionbasedmodelensembling
AT alexandruionmarinescu effectiveonlineknowledgedistillationviaattentionbasedmodelensembling