Learning Low-Precision Structured Subnetworks Using Joint Layerwise Channel Pruning and Uniform Quantization
Pruning and quantization are core techniques used to reduce the inference costs of deep neural networks. Among the state-of-the-art pruning techniques, magnitude-based pruning algorithms have demonstrated consistent success in the reduction of both weight and feature map complexity. However, we find...
Main Authors: | Xinyu Zhang, Ian Colbert, Srinjoy Das |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-08-01 |
Series: | Applied Sciences |
Subjects: | channel pruning, layerwise pruning, quantization, joint pruning quantization |
Online Access: | https://www.mdpi.com/2076-3417/12/15/7829 |
_version_ | 1797442774879961088 |
---|---|
author | Xinyu Zhang Ian Colbert Srinjoy Das |
author_facet | Xinyu Zhang Ian Colbert Srinjoy Das |
author_sort | Xinyu Zhang |
collection | DOAJ |
description | Pruning and quantization are core techniques used to reduce the inference costs of deep neural networks. Among the state-of-the-art pruning techniques, magnitude-based pruning algorithms have demonstrated consistent success in the reduction of both weight and feature map complexity. However, we find that existing measures of neuron (or channel) importance estimation used for such pruning procedures have at least one of two limitations: (1) failure to consider the interdependence between successive layers; and/or (2) performing the estimation in a parametric setting or by using distributional assumptions on the feature maps. In this work, we demonstrate that the importance rankings of the output neurons of a given layer strongly depend on the sparsity level of the preceding layer, and therefore, naïvely estimating neuron importance to drive magnitude-based pruning will lead to suboptimal performance. Informed by this observation, we propose a purely data-driven nonparametric, magnitude-based channel pruning strategy that works in a greedy manner based on the activations of the previous sparsified layer. We demonstrate that our proposed method works effectively in combination with statistics-based quantization techniques to generate low precision structured subnetworks that can be efficiently accelerated by hardware platforms such as GPUs and FPGAs. Using our proposed algorithms, we demonstrate increased performance per memory footprint over existing solutions across a range of discriminative and generative networks. |
first_indexed | 2024-03-09T12:46:58Z |
format | Article |
id | doaj.art-7749a2fc55bd4415a03d5ee7d2d8e281 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-09T12:46:58Z |
publishDate | 2022-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-7749a2fc55bd4415a03d5ee7d2d8e281; 2023-11-30T22:11:37Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-08-01; 12; 15; 7829; 10.3390/app12157829; Learning Low-Precision Structured Subnetworks Using Joint Layerwise Channel Pruning and Uniform Quantization; Xinyu Zhang (Department of Computer Science, Rutgers University, New Brunswick, NJ 08901, USA); Ian Colbert (Department of Electrical and Computer Engineering, University of California, San Diego, CA 92093, USA); Srinjoy Das (School of Mathematical and Data Sciences, West Virginia University, Morgantown, WV 26506, USA); https://www.mdpi.com/2076-3417/12/15/7829; channel pruning; layerwise pruning; quantization; joint pruning quantization |
title | Learning Low-Precision Structured Subnetworks Using Joint Layerwise Channel Pruning and Uniform Quantization |
title_sort | learning low precision structured subnetworks using joint layerwise channel pruning and uniform quantization |
topic | channel pruning layerwise pruning quantization joint pruning quantization |
url | https://www.mdpi.com/2076-3417/12/15/7829 |
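
The description in this record outlines a greedy, layerwise, magnitude-based channel pruning strategy driven by the activations of the previously sparsified layer, combined with uniform quantization. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a small bias-free ReLU multilayer perceptron, mean absolute activation as the importance score, a fixed keep ratio per layer, and a symmetric max-abs quantizer standing in for the statistics-based quantizer the abstract mentions. The names `prune_layerwise`, `quantize_uniform`, `keep_ratio`, and `calib` are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def prune_layerwise(weights, calib, keep_ratio):
    """Greedy channel pruning for a bias-free ReLU MLP (illustrative only).

    weights: list of arrays shaped (in_dim, out_dim).
    calib: calibration inputs shaped (n_samples, in_dim of first layer).
    The last layer is left unpruned so the output size is preserved.
    """
    pruned, masks = [], []
    x = calib
    for i, w in enumerate(weights):
        w = w.copy()
        if i < len(weights) - 1:
            # x already reflects the pruning of every earlier layer, so the
            # scores below depend on the sparsity of the preceding layer.
            act = relu(x @ w)
            score = np.abs(act).mean(axis=0)  # assumed importance measure
            k = max(1, int(round(keep_ratio * w.shape[1])))
            mask = np.zeros(w.shape[1], dtype=bool)
            mask[np.argsort(score)[-k:]] = True  # keep the k largest scores
            w[:, ~mask] = 0.0  # structured: whole output channels are zeroed
            x = relu(x @ w)
        else:
            mask = np.ones(w.shape[1], dtype=bool)
            x = x @ w
        pruned.append(w)
        masks.append(mask)
    return pruned, masks


def quantize_uniform(w, bits=8):
    """Symmetric uniform quantization with a max-abs scale (an assumption)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale
```

Multiplying the integer tensor by its scale (`q * scale`) gives the dequantized weights. Because each layer is scored on inputs produced by the already-pruned layers before it, changing `keep_ratio` for an early layer can change which channels are kept later, which is the interdependence the description emphasizes.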