Learning Low-Precision Structured Subnetworks Using Joint Layerwise Channel Pruning and Uniform Quantization

Bibliographic Details
Main Authors: Xinyu Zhang (Department of Computer Science, Rutgers University, New Brunswick, NJ 08901, USA); Ian Colbert (Department of Electrical and Computer Engineering, University of California, San Diego, CA 92093, USA); Srinjoy Das (School of Mathematical and Data Sciences, West Virginia University, Morgantown, WV 26506, USA)
Format: Article
Language: English
Published: MDPI AG, 2022-08-01
Series: Applied Sciences, Vol. 12, Issue 15, Article No. 7829
ISSN: 2076-3417
DOI: 10.3390/app12157829
Collection: Directory of Open Access Journals (DOAJ)
Subjects: channel pruning; layerwise pruning; quantization; joint pruning
Online Access: https://www.mdpi.com/2076-3417/12/15/7829
Description:
Pruning and quantization are core techniques used to reduce the inference costs of deep neural networks. Among the state-of-the-art pruning techniques, magnitude-based pruning algorithms have demonstrated consistent success in the reduction of both weight and feature map complexity. However, we find that existing measures of neuron (or channel) importance estimation used for such pruning procedures have at least one of two limitations: (1) failure to consider the interdependence between successive layers; and/or (2) performing the estimation in a parametric setting or by using distributional assumptions on the feature maps. In this work, we demonstrate that the importance rankings of the output neurons of a given layer strongly depend on the sparsity level of the preceding layer, and therefore, naïvely estimating neuron importance to drive magnitude-based pruning will lead to suboptimal performance. Informed by this observation, we propose a purely data-driven, nonparametric, magnitude-based channel pruning strategy that works in a greedy manner based on the activations of the previous sparsified layer. We demonstrate that our proposed method works effectively in combination with statistics-based quantization techniques to generate low-precision structured subnetworks that can be efficiently accelerated by hardware platforms such as GPUs and FPGAs. Using our proposed algorithms, we demonstrate increased performance per memory footprint over existing solutions across a range of discriminative and generative networks.
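The abstract above describes the recipe only at a high level. The PyTorch sketch below is an illustrative rendering of that recipe, not the authors' released code: the function names (`channel_scores`, `prune_layer`, `uniform_quantize`, `prune_and_quantize`), the mean-absolute-activation importance score, the fixed keep ratio, and the min/max quantization scaling are all assumptions chosen for illustration. The key idea it mirrors is that each layer's channels are scored on activations computed after the preceding layers have already been sparsified, so the greedy pass respects the interlayer dependence the abstract highlights.

```python
# Hypothetical sketch (assumed names and heuristics), not the paper's exact algorithm:
# prune output channels layer by layer, scoring each layer on activations produced by
# the already-sparsified prefix, then uniformly quantize the surviving weights using
# simple statistics (min/max) of the tensor.
import torch
import torch.nn as nn

@torch.no_grad()
def channel_scores(layer: nn.Conv2d, x: torch.Tensor) -> torch.Tensor:
    """Data-driven, nonparametric importance: mean absolute activation per output
    channel, measured on calibration inputs x that already passed through the
    sparsified preceding layers."""
    y = layer(x)                               # (N, C_out, H, W)
    return y.abs().mean(dim=(0, 2, 3))         # one score per output channel

@torch.no_grad()
def prune_layer(layer: nn.Conv2d, x: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Zero out the least important output channels; return the kept indices."""
    scores = channel_scores(layer, x)
    k = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, k).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[keep] = True
    layer.weight[~mask] = 0.0                  # structured (per-channel) sparsity
    if layer.bias is not None:
        layer.bias[~mask] = 0.0
    return keep

@torch.no_grad()
def uniform_quantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Statistics-based uniform quantization: scale derived from observed min/max."""
    qmin, qmax = 0, 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo).clamp(min=1e-8) / (qmax - qmin)
    q = ((w - lo) / scale).round().clamp(qmin, qmax)
    return q * scale + lo                      # dequantized ("fake-quant") weights

@torch.no_grad()
def prune_and_quantize(layers, calib_x, keep_ratio=0.5, bits=8):
    """Greedy layerwise pass: each layer is scored on activations of the
    already-pruned prefix, then pruned and uniformly quantized in place."""
    x = calib_x
    for layer in layers:
        prune_layer(layer, x, keep_ratio)
        layer.weight.copy_(uniform_quantize(layer.weight, bits))
        x = torch.relu(layer(x))               # propagate the sparsified activations
```

Given a list of `nn.Conv2d` layers whose channel counts chain together and a small calibration batch, `prune_and_quantize(layers, calib_x)` applies the greedy pass in place; scoring each layer only after its predecessor has been sparsified is what distinguishes this from scoring all layers on the dense network at once.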