PermLSTM: A High Energy-Efficiency LSTM Accelerator Architecture

Pruning and quantization are two commonly used approaches to accelerating LSTM (Long Short-Term Memory) models. However, traditional linear quantization usually suffers from the vanishing-gradient problem, and existing pruning methods produce either undesired irregular sparsity or a large indexing overhead. To alleviate the vanishing-gradient problem, this work proposes a normalized linear quantization approach, which first normalizes operands regionally and then quantizes them within a local min-max range. To avoid irregular sparsity and large indexing overhead, this work adopts permuted block diagonal mask matrices to generate the sparse model. Because the resulting sparse model is highly regular, the positions of the non-zero weights can be obtained by a simple calculation, eliminating the large indexing overhead. Based on the sparse LSTM model generated from the permuted block diagonal mask matrices, this paper also proposes PermLSTM, a high energy-efficiency accelerator that comprehensively exploits the sparsity of weights, activations, and products in the matrix–vector multiplications, resulting in a 55.1% reduction in power consumption. The accelerator has been realized on an Arria-10 FPGA running at 150 MHz and achieves 2.19×–24.4× higher energy efficiency than previously reported FPGA-based LSTM accelerators.
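To make the two techniques in the abstract concrete, here is a minimal NumPy sketch. It is an illustration only, not the authors' implementation: the block size, bit width, and the cyclic shift used to place the non-zero blocks are assumptions chosen for this example. It shows (a) quantizing a weight block within its own local min-max range after normalization, and (b) building a permuted block diagonal mask whose non-zero block positions follow a closed-form rule, so no explicit index storage is needed.

```python
import numpy as np

def normalized_linear_quantization(w_block, num_bits=8):
    """Quantize one weight block within its local min-max range.

    Sketch of the 'normalized linear quantization' idea in the abstract:
    the block is first normalized by its own min/max, then mapped onto
    uniform integer levels. Bit width and block granularity are assumptions.
    """
    w_min, w_max = float(w_block.min()), float(w_block.max())
    scale = (w_max - w_min) / (2 ** num_bits - 1) if w_max > w_min else 1.0
    q = np.round((w_block - w_min) / scale).astype(np.int32)  # integer codes
    w_hat = q.astype(np.float32) * scale + w_min              # dequantized values
    return q, w_hat

def permuted_block_diagonal_mask(rows, cols, block, perm_offset=1):
    """Build a 0/1 permuted block diagonal mask for a (rows x cols) matrix.

    The matrix is tiled into (block x block) sub-blocks; each block-row keeps
    exactly one block-column, chosen here by a cyclic shift. The kept
    block-column for block-row i is (i * perm_offset) % num_block_cols, so the
    non-zero positions follow from a simple calculation and need not be stored.
    """
    assert rows % block == 0 and cols % block == 0
    num_block_rows, num_block_cols = rows // block, cols // block
    mask = np.zeros((rows, cols), dtype=np.uint8)
    for i in range(num_block_rows):
        j = (i * perm_offset) % num_block_cols  # closed-form non-zero position
        mask[i * block:(i + 1) * block, j * block:(j + 1) * block] = 1
    return mask

# Tiny end-to-end example: prune a weight matrix, then quantize one kept block.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)
mask = permuted_block_diagonal_mask(8, 8, block=2, perm_offset=1)
W_sparse = W * mask                      # regular sparsity, no index storage
q, w_hat = normalized_linear_quantization(W_sparse[0:2, 0:2], num_bits=8)
print(mask)
print(q)
```

Because the kept block-column is computed from the block-row index, a decoder can regenerate the sparsity pattern on the fly instead of reading stored indices, which is the property the abstract attributes to the permuted block diagonal masks.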

Bibliographic Details
Main Authors: Yong Zheng, Haigang Yang, Yiping Jia, Zhihong Huang
Author Affiliation: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
Format: Article
Language: English
Published: MDPI AG, 2021-04-01
Series: Electronics, Vol. 10, No. 8, Article 882
ISSN: 2079-9292
DOI: 10.3390/electronics10080882
Collection: DOAJ (Directory of Open Access Journals)
Subjects: LSTM; pruning; quantization; sparse matrix–vector multiplication
Online Access: https://www.mdpi.com/2079-9292/10/8/882