Distillation Sparsity Training Algorithm for Accelerating Convolutional Neural Networks in Embedded Systems

The rapid development of neural networks has come at the cost of increased computational complexity. Neural networks are both computationally intensive and memory intensive; as such, the minimal energy and computing power of satellites pose a challenge for automatic target recognition (ATR). Knowledge distillation (KD) can distill knowledge from a cumbersome teacher network to a lightweight student network, transferring the essential information learned by the teacher network; thus, KD can be used to improve the accuracy of student networks. However, even when learning from a teacher network, the student network still contains redundancy, and because traditional networks fix their structure before training, training alone cannot remove it. This paper proposes a distillation sparsity training (DST) algorithm based on KD and network pruning to address these limitations. We first improve the accuracy of the student network through KD and then apply network pruning, allowing the student network to learn which connections are essential. DST allows the teacher network to teach the pruned student network directly. The proposed algorithm was tested on the CIFAR-100, MSTAR, and FUSAR-Ship data sets with a 50% sparsity setting. First, a new loss function between the teacher and the pruned student was proposed, and the pruned student network achieved performance close to that of the teacher network. Second, a new sparsity model (uniformity half-pruning, UHP) was designed to address the fact that unstructured pruning does not lend itself to general-purpose hardware acceleration and storage. Compared with traditional unstructured pruning, UHP can double the speed of neural networks.
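
As a rough illustration of the two ideas summarized in the abstract, the hedged PyTorch sketch below shows (a) a uniform 50% ("half-pruning") magnitude mask built over fixed-size weight groups and (b) a teacher-to-pruned-student distillation loss combining soft targets with cross-entropy. The group size, temperature, and loss weighting are illustrative assumptions, not values from the paper, and the code is not the authors' implementation.

```python
# Illustrative sketch only; not the authors' released code.
import torch
import torch.nn.functional as F

def uhp_mask(weight: torch.Tensor, group_size: int = 4) -> torch.Tensor:
    """Uniform 50% mask: within each consecutive group of `group_size` weights,
    keep the half with the largest magnitude (assumes numel is divisible by group_size)."""
    groups = weight.reshape(-1, group_size)
    keep = group_size // 2                              # 50% sparsity per group
    idx = groups.abs().topk(keep, dim=1).indices        # indices of the largest-magnitude half
    mask = torch.zeros_like(groups).scatter_(1, idx, 1.0)
    return mask.reshape(weight.shape)

def dst_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Distillation loss for the pruned student: KL divergence on temperature-softened
    teacher/student outputs plus cross-entropy on the ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# One training step of the teacher -> pruned-student scheme (illustrative):
#   mask = uhp_mask(layer.weight.data)    # fix a uniform 50% mask per layer
#   layer.weight.data.mul_(mask)          # keep the student at 50% sparsity
#   with torch.no_grad():
#       t_logits = teacher(x)
#   loss = dst_loss(pruned_student(x), t_logits, y)
```

Keeping the same number of surviving weights in every small group (an assumed N:M-style pattern here) is what makes the sparsity regular enough for general-purpose hardware to exploit, which is the motivation the abstract gives for UHP over unstructured pruning.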

Bibliographic Details
Main Authors: Penghao Xiao, Teng Xu, Xiayang Xiao, Weisong Li, Haipeng Wang
Format: Article
Language: English
Published: MDPI AG, 2023-05-01
Series: Remote Sensing
Subjects: neural networks; distillation sparsity training; uniformity half-pruning; general-purpose hardware acceleration
Online Access: https://www.mdpi.com/2072-4292/15/10/2609
DOI: 10.3390/rs15102609
ISSN: 2072-4292
Citation: Remote Sensing 2023, 15(10), 2609
Author Affiliation (all authors): Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai 200433, China