ORSAS: An Output Row-Stationary Accelerator for Sparse Neural Networks

Various pruning techniques and network compression methods make modern neural networks sparse in both weights and activations. However, GPUs (graphics processing units) and most customized CNN (convolutional neural network) accelerators do not take advantage of this sparsity, and recent accelerators for sparse neural networks suffer from low utilization of computational resources. This paper first proposes an output row-stationary dataflow that exploits the sparsity of both weights and activations. It allows the accelerator to process weights and activations in their compressed form, leading to high utilization of computational resources, i.e., multipliers. In addition, a low-cost compression algorithm is adopted for both weights and input activations to reduce the power consumption of data access. Second, a Y-buffer is proposed to eliminate the repeated reading of input activations caused by halo effects, which arise when tiling large input feature maps (ifmaps). Third, an interleaved broadcasting mechanism is introduced to alleviate the load-imbalance problems caused by the irregularity of sparse data. Finally, a prototype design called ORSAS is synthesized, placed, and routed in a SMIC 55 nm process. The evaluation results show that ORSAS occupies the smallest logic cell area and achieves the highest multiplier utilization among peer works when sparsity is high. ORSAS keeps multiplier utilization between 60% and 90% across the convolutional layers of popular sparse CNNs and achieves ultra-low power consumption and the highest efficiency.

Bibliographic Details
Main Authors: Chenxiao Lin (ORCID: 0000-0001-8109-5328), Yuezhong Liu (ORCID: 0009-0007-1875-7262), Delong Shang
Author Affiliation: Institute of Microelectronics of the Chinese Academy of Sciences, Beijing, Chaoyang, China (all authors)
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access, vol. 11, pp. 44123-44135
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3272564
Subjects: Convolution dataflow; data compression; imbalance; halo effects; high utilization; sparse neural network accelerator
Online Access: https://ieeexplore.ieee.org/document/10114404/
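
Illustrative note: the abstract describes computing directly on compressed weights and activations so that multipliers only work on nonzero operands. The sketch below is a minimal, hypothetical Python illustration of that general idea for one row of a stride-1, unpadded convolution accumulated into a stationary output-row buffer; the (index, value) representation and the names compress and sparse_row_conv are our own assumptions, not the ORSAS dataflow itself.

# Hypothetical illustration (not the ORSAS design): a 1-D row convolution
# computed directly on compressed (index, value) pairs, so multiplications
# happen only for nonzero weight/activation pairs, and partial sums
# accumulate into a stationary output-row buffer.

def compress(row):
    """Compress a dense row into (index, value) pairs, dropping zeros."""
    return [(i, v) for i, v in enumerate(row) if v != 0]

def sparse_row_conv(ifmap_row, weight_row, out_width):
    """Convolve one compressed ifmap row with one compressed filter row
    (stride 1, no padding), accumulating into one output row."""
    out = [0] * out_width                      # stationary output row (psum buffer)
    for w_idx, w_val in weight_row:            # only nonzero weights
        for a_idx, a_val in ifmap_row:         # only nonzero activations
            o_idx = a_idx - w_idx              # output column this product feeds
            if 0 <= o_idx < out_width:
                out[o_idx] += w_val * a_val    # multiply-accumulate
    return out

# Example: sparse activation and weight rows
acts = compress([0, 3, 0, 0, 5, 0, 2, 0])
wts  = compress([1, 0, -2])
print(sparse_row_conv(acts, wts, out_width=6))

Running this prints [0, 3, -10, 0, 1, 0], the same result a dense convolution of the two rows would give, but only the 2 x 3 = 6 nonzero weight-activation pairs are considered, versus 18 multiply-accumulates for the dense rows.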