Distillation Sparsity Training Algorithm for Accelerating Convolutional Neural Networks in Embedded Systems
Main Authors: | Penghao Xiao, Teng Xu, Xiayang Xiao, Weisong Li, Haipeng Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-05-01 |
Series: | Remote Sensing |
Subjects: | neural networks; distillation sparsity training; uniformity half-pruning; general-purpose hardware acceleration |
Online Access: | https://www.mdpi.com/2072-4292/15/10/2609 |
author | Penghao Xiao; Teng Xu; Xiayang Xiao; Weisong Li; Haipeng Wang |
collection | DOAJ |
description | The rapid development of neural networks has come at the cost of increased computational complexity. Neural networks are both computationally and memory intensive; as such, the limited energy and computing power of satellites pose a challenge for automatic target recognition (ATR). Knowledge distillation (KD) can transfer knowledge from a cumbersome teacher network to a lightweight student network, passing on the essential information the teacher has learned; KD can therefore be used to improve the accuracy of student networks. Even when learning from a teacher network, there is still redundancy in the student network, and traditional networks fix their structure before training, so training alone cannot remove this redundancy. This paper proposes a distillation sparsity training (DST) algorithm based on KD and network pruning to address these limitations. We first improve the accuracy of the student network through KD and then apply network pruning, allowing the student network to learn which connections are essential. DST allows the teacher network to teach the pruned student network directly. The proposed algorithm was tested on the CIFAR-100, MSTAR, and FUSAR-Ship data sets with a 50% sparsity setting. First, a new loss function between the teacher and the pruned student was proposed, and the pruned student network achieved performance close to that of the teacher network. Second, a new sparsity model (uniformity half-pruning, UHP) was designed to address the fact that unstructured pruning does not lend itself to general-purpose hardware acceleration and storage. Compared with traditional unstructured pruning, UHP can double the speed of neural networks. |
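The description above covers the two ingredients of DST: a distillation loss that lets the teacher supervise the pruned student directly, and a uniform 50% sparsity pattern (UHP) that keeps the pruned network hardware-friendly. The PyTorch sketch below is only a minimal illustration of those two ideas, not the authors' implementation; the temperature `T`, the weighting `alpha`, the group size of 2, and all function names are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Hypothetical KD objective for the pruned student: KL divergence against the
    frozen teacher's softened outputs plus cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def uniform_half_prune_mask(weight, group_size=2):
    """Illustrative UHP-style mask: within every group of `group_size` consecutive
    weights, keep the largest-magnitude half and zero the rest, yielding exactly
    50% sparsity in a regular, hardware-friendly pattern.
    Assumes weight.numel() is divisible by group_size."""
    groups = weight.reshape(-1, group_size)
    keep = group_size // 2
    top_idx = groups.abs().topk(keep, dim=1).indices  # positions to keep per group
    mask = torch.zeros_like(groups)
    mask.scatter_(1, top_idx, 1.0)
    return mask.reshape(weight.shape)
```

During sparse fine-tuning, each pruned layer would use `weight * mask` in its forward pass while the distillation loss backpropagates through the surviving weights; with a group size of 2, the resulting pattern is analogous to the structured half-sparsity that some accelerators exploit, which is consistent with the reported speedup of UHP over unstructured pruning.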
format | Article |
id | doaj.art-feb740627037479c86113363c2beb22a |
institution | Directory Open Access Journal |
issn | 2072-4292 |
language | English |
publishDate | 2023-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
doi | 10.3390/rs15102609 |
author affiliation | Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai 200433, China (all five authors) |
title | Distillation Sparsity Training Algorithm for Accelerating Convolutional Neural Networks in Embedded Systems |
topic | neural networks; distillation sparsity training; uniformity half-pruning; general-purpose hardware acceleration |
url | https://www.mdpi.com/2072-4292/15/10/2609 |