Distillation Sparsity Training Algorithm for Accelerating Convolutional Neural Networks in Embedded Systems

The rapid development of neural networks has come at the cost of increased computational complexity. Neural networks are both computationally intensive and memory intensive; as such, the minimal energy and computing power of satellites pose a challenge for automatic target recognition (ATR). Knowledge distillation (KD) can distill knowledge from a cumbersome teacher network to a lightweight student network, transferring the essential information learned by the teacher network; thus, KD can be used to improve the accuracy of student networks. However, even when learning from a teacher network, the student network still contains redundancy, and because traditional networks fix their structure before training, training alone cannot remove it. This paper proposes a distillation sparsity training (DST) algorithm based on KD and network pruning to address these limitations. We first improve the accuracy of the student network through KD and then apply network pruning, allowing the student network to learn which connections are essential. DST allows the teacher network to teach the pruned student network directly. The proposed algorithm was tested on the CIFAR-100, MSTAR, and FUSAR-Ship data sets with a 50% sparsity setting. First, a new loss function between the teacher and the pruned student was proposed, and the pruned student network achieved performance close to that of the teacher network. Second, a new sparsity model (uniformity half-pruning, UHP) was designed to address the fact that unstructured pruning does not lend itself to general-purpose hardware acceleration and storage. Compared with traditional unstructured pruning, UHP can double the speed of neural networks.
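
As a rough illustration of the two ideas summarized in the abstract, the hedged PyTorch sketch below shows (a) a uniform 50% ("half-pruning") magnitude mask built over fixed-size weight groups and (b) a teacher-to-pruned-student distillation loss combining soft targets with cross-entropy. The group size, temperature, and loss weighting are illustrative assumptions, not values from the paper, and the code is not the authors' implementation.

```python
# Illustrative sketch only; not the authors' released code.
import torch
import torch.nn.functional as F

def uhp_mask(weight: torch.Tensor, group_size: int = 4) -> torch.Tensor:
    """Uniform 50% mask: within each consecutive group of `group_size` weights,
    keep the half with the largest magnitude (assumes numel is divisible by group_size)."""
    groups = weight.reshape(-1, group_size)
    keep = group_size // 2                              # 50% sparsity per group
    idx = groups.abs().topk(keep, dim=1).indices        # indices of the largest-magnitude half
    mask = torch.zeros_like(groups).scatter_(1, idx, 1.0)
    return mask.reshape(weight.shape)

def dst_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Distillation loss for the pruned student: KL divergence on temperature-softened
    teacher/student outputs plus cross-entropy on the ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# One training step of the teacher -> pruned-student scheme (illustrative):
#   mask = uhp_mask(layer.weight.data)    # fix a uniform 50% mask per layer
#   layer.weight.data.mul_(mask)          # keep the student at 50% sparsity
#   with torch.no_grad():
#       t_logits = teacher(x)
#   loss = dst_loss(pruned_student(x), t_logits, y)
```

Keeping the same number of surviving weights in every small group (an assumed N:M-style pattern here) is what makes the sparsity regular enough for general-purpose hardware to exploit, which is the motivation the abstract gives for UHP over unstructured pruning.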

Bibliographic Details
Main Authors: Penghao Xiao, Teng Xu, Xiayang Xiao, Weisong Li, Haipeng Wang
Format: Article
Language: English
Published: MDPI AG, 2023-05-01
Series: Remote Sensing
Subjects: neural networks; distillation sparsity training; uniformity half-pruning; general-purpose hardware acceleration
Online Access: https://www.mdpi.com/2072-4292/15/10/2609
DOI: 10.3390/rs15102609
ISSN: 2072-4292
Citation: Remote Sensing 2023, 15(10), 2609
Author Affiliation (all authors): Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai 200433, China