Multi-Fidelity Neural Architecture Search With Knowledge Distillation

Neural architecture search (NAS) aims to find the optimal architecture of a neural network for a problem or a family of problems. Evaluating neural architectures is very time-consuming. One possible way to mitigate this issue is to use low-fidelity evaluations: training on part of the dataset, for fewer epochs, with fewer channels, etc. In this paper, we propose a Bayesian multi-fidelity (MF) method for neural architecture search, MF-KD. The method relies on a new approach to low-fidelity evaluations of neural architectures: training for a few epochs using knowledge distillation (KD). Knowledge distillation adds to the loss function a term that forces the network to mimic a teacher network. We carry out experiments on CIFAR-10, CIFAR-100, and ImageNet-16-120 and show that training for a few epochs with this modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss. The proposed method outperforms several state-of-the-art baselines.
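As a rough illustration of the loss modification the abstract describes (a minimal sketch, not the authors' implementation), a standard knowledge-distillation objective combines the ordinary cross-entropy with a KL-divergence term that pulls the student's temperature-softened predictions toward the teacher's. The temperature and mixing weight below are assumed placeholder values, not hyperparameters from the paper.

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.9):
    # Hinton-style distillation objective, shown only to illustrate "mimicking a teacher".
    ce = F.cross_entropy(student_logits, targets)                # ordinary logistic / cross-entropy loss
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),      # student log-probabilities (softened)
        F.softmax(teacher_logits / temperature, dim=1),          # teacher probabilities (softened)
        reduction="batchmean",
    ) * (temperature ** 2)                                       # rescale so gradient magnitude stays comparable
    return alpha * kl + (1.0 - alpha) * ce                       # mix of teacher-mimicry and true-label terms

In the multi-fidelity setting the paper studies, such cheap few-epoch KD-trained evaluations would serve as the low-fidelity signal guiding the Bayesian search over architectures.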

Bibliographic Details
Main Authors: Ilya Trofimov, Nikita Klyuchnikov, Mikhail Salnikov, Alexander Filippov, Evgeny Burnaev
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Subjects: Bayesian optimization, knowledge distillation, multi-fidelity optimization, neural architecture search
Online Access: https://ieeexplore.ieee.org/document/10007805/
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3234810
Published in: IEEE Access, vol. 11, pp. 59217-59225, 2023
Author Affiliations:
Ilya Trofimov (Skolkovo Institute of Science and Technology, Moscow, Russia), ORCID: https://orcid.org/0000-0002-2961-7368
Nikita Klyuchnikov (Skolkovo Institute of Science and Technology, Moscow, Russia), ORCID: https://orcid.org/0000-0001-5065-4000
Mikhail Salnikov (Skolkovo Institute of Science and Technology, Moscow, Russia)
Alexander Filippov (Huawei, Moscow, Russia), ORCID: https://orcid.org/0000-0002-9826-2425
Evgeny Burnaev (Skolkovo Institute of Science and Technology, Moscow, Russia), ORCID: https://orcid.org/0000-0001-8424-0690