Learning Low-Precision Structured Subnetworks Using Joint Layerwise Channel Pruning and Uniform Quantization

Bibliographic Details
Main Authors: Xinyu Zhang (Department of Computer Science, Rutgers University, New Brunswick, NJ 08901, USA); Ian Colbert (Department of Electrical and Computer Engineering, University of California, San Diego, CA 92093, USA); Srinjoy Das (School of Mathematical and Data Sciences, West Virginia University, Morgantown, WV 26506, USA)
Format: Article
Language: English
Published: MDPI AG, 2022-08-01
Series: Applied Sciences, Vol. 12, Issue 15, Article No. 7829
ISSN: 2076-3417
DOI: 10.3390/app12157829
Collection: Directory of Open Access Journals (DOAJ)
Subjects: channel pruning; layerwise pruning; quantization; joint pruning
Online Access: https://www.mdpi.com/2076-3417/12/15/7829
Description:
Pruning and quantization are core techniques used to reduce the inference costs of deep neural networks. Among the state-of-the-art pruning techniques, magnitude-based pruning algorithms have demonstrated consistent success in the reduction of both weight and feature map complexity. However, we find that existing measures of neuron (or channel) importance estimation used for such pruning procedures have at least one of two limitations: (1) failure to consider the interdependence between successive layers; and/or (2) performing the estimation in a parametric setting or by using distributional assumptions on the feature maps. In this work, we demonstrate that the importance rankings of the output neurons of a given layer strongly depend on the sparsity level of the preceding layer, and therefore, naïvely estimating neuron importance to drive magnitude-based pruning will lead to suboptimal performance. Informed by this observation, we propose a purely data-driven, nonparametric, magnitude-based channel pruning strategy that works in a greedy manner based on the activations of the previous sparsified layer. We demonstrate that our proposed method works effectively in combination with statistics-based quantization techniques to generate low-precision structured subnetworks that can be efficiently accelerated by hardware platforms such as GPUs and FPGAs. Using our proposed algorithms, we demonstrate increased performance per memory footprint over existing solutions across a range of discriminative and generative networks.
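The abstract above describes the recipe only at a high level. The PyTorch sketch below is an illustrative rendering of that recipe, not the authors' released code: the function names (`channel_scores`, `prune_layer`, `uniform_quantize`, `prune_and_quantize`), the mean-absolute-activation importance score, the fixed keep ratio, and the min/max quantization scaling are all assumptions chosen for illustration. The key idea it mirrors is that each layer's channels are scored on activations computed after the preceding layers have already been sparsified, so the greedy pass respects the interlayer dependence the abstract highlights.

```python
# Hypothetical sketch (assumed names and heuristics), not the paper's exact algorithm:
# prune output channels layer by layer, scoring each layer on activations produced by
# the already-sparsified prefix, then uniformly quantize the surviving weights using
# simple statistics (min/max) of the tensor.
import torch
import torch.nn as nn

@torch.no_grad()
def channel_scores(layer: nn.Conv2d, x: torch.Tensor) -> torch.Tensor:
    """Data-driven, nonparametric importance: mean absolute activation per output
    channel, measured on calibration inputs x that already passed through the
    sparsified preceding layers."""
    y = layer(x)                               # (N, C_out, H, W)
    return y.abs().mean(dim=(0, 2, 3))         # one score per output channel

@torch.no_grad()
def prune_layer(layer: nn.Conv2d, x: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Zero out the least important output channels; return the kept indices."""
    scores = channel_scores(layer, x)
    k = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, k).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[keep] = True
    layer.weight[~mask] = 0.0                  # structured (per-channel) sparsity
    if layer.bias is not None:
        layer.bias[~mask] = 0.0
    return keep

@torch.no_grad()
def uniform_quantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Statistics-based uniform quantization: scale derived from observed min/max."""
    qmin, qmax = 0, 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo).clamp(min=1e-8) / (qmax - qmin)
    q = ((w - lo) / scale).round().clamp(qmin, qmax)
    return q * scale + lo                      # dequantized ("fake-quant") weights

@torch.no_grad()
def prune_and_quantize(layers, calib_x, keep_ratio=0.5, bits=8):
    """Greedy layerwise pass: each layer is scored on activations of the
    already-pruned prefix, then pruned and uniformly quantized in place."""
    x = calib_x
    for layer in layers:
        prune_layer(layer, x, keep_ratio)
        layer.weight.copy_(uniform_quantize(layer.weight, bits))
        x = torch.relu(layer(x))               # propagate the sparsified activations
```

Given a list of `nn.Conv2d` layers whose channel counts chain together and a small calibration batch, `prune_and_quantize(layers, calib_x)` applies the greedy pass in place; scoring each layer only after its predecessor has been sparsified is what distinguishes this from scoring all layers on the dense network at once.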