iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product

Ternary Neural Networks (TNNs) achieve an excellent trade-off between model size, speed, and accuracy by quantizing weights and activations into the ternary values {+1, 0, -1}. Ternary multiplications in TNNs are equivalent to lightweight bitwise operations, which map well onto In-Memory Computing (IMC) platforms. Therefore, many IMC-based TNN accelerators have been proposed; they build dedicated ternary multiplication cells or exploit efficient bitwise operations on IMC architectures. However, existing ternary-value accumulation schemes on IMC architectures are inefficient: they either extend the sign bit of integer operands or conduct two-round accumulation with specially designed encodings, incurring long latency and extra memory-write overhead. Moreover, existing IMC-based TNN accelerators overlook TNNs' sparsity and perform operations on zero weights, causing unnecessary power consumption and latency. In this paper, we propose iMAT to accelerate TNNs with operator-, architecture-, and layer-level optimizations. First, we propose a single-round Ternary Variable-Bitwidth Accumulation scheme that efficiently extends the sign bit of the addition result without extra memory-write overhead. Second, we propose an in-memory accelerator with enhanced sensing circuits for the accumulation scheme and a Sparse Dot Product Unit that exploits TNNs' weight sparsity by skipping unnecessary operations on zero weights. Further, we propose Fused Scaling Functions that combine the scaling, activation, normalization, and quantization layers to reduce hardware complexity without affecting model accuracy. Simulation results show that, compared with dense in-memory TNN accelerators, our iMAT achieves up to 2.7X speedup and 3.7X higher energy efficiency on ternary ResNet-18.

Bibliographic Details
Main Authors: Zhu, Shien, Huai, Shuo, Xiong, Guochu, Liu, Weichen
Other Authors: School of Computer Science and Engineering
Format: Conference Paper
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; Engineering::Electrical and electronic engineering::Computer hardware, software and systems; In-Memory Computing; Ternary Neural Network; Software-Hardware Co-Design
Online Access: https://hdl.handle.net/10356/170218
_version_ 1824453504654114816
author Zhu, Shien
Huai, Shuo
Xiong, Guochu
Liu, Weichen
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Zhu, Shien
Huai, Shuo
Xiong, Guochu
Liu, Weichen
author_sort Zhu, Shien
collection NTU
description Ternary Neural Networks (TNNs) achieve an excellent trade-off between model size, speed, and accuracy by quantizing weights and activations into the ternary values {+1, 0, -1}. Ternary multiplications in TNNs are equivalent to lightweight bitwise operations, which map well onto In-Memory Computing (IMC) platforms. Therefore, many IMC-based TNN accelerators have been proposed; they build dedicated ternary multiplication cells or exploit efficient bitwise operations on IMC architectures. However, existing ternary-value accumulation schemes on IMC architectures are inefficient: they either extend the sign bit of integer operands or conduct two-round accumulation with specially designed encodings, incurring long latency and extra memory-write overhead. Moreover, existing IMC-based TNN accelerators overlook TNNs' sparsity and perform operations on zero weights, causing unnecessary power consumption and latency. In this paper, we propose iMAT to accelerate TNNs with operator-, architecture-, and layer-level optimizations. First, we propose a single-round Ternary Variable-Bitwidth Accumulation scheme that efficiently extends the sign bit of the addition result without extra memory-write overhead. Second, we propose an in-memory accelerator with enhanced sensing circuits for the accumulation scheme and a Sparse Dot Product Unit that exploits TNNs' weight sparsity by skipping unnecessary operations on zero weights. Further, we propose Fused Scaling Functions that combine the scaling, activation, normalization, and quantization layers to reduce hardware complexity without affecting model accuracy. Simulation results show that, compared with dense in-memory TNN accelerators, our iMAT achieves up to 2.7X speedup and 3.7X higher energy efficiency on ternary ResNet-18.
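A minimal, software-level sketch of the sparse ternary dot-product idea from the abstract is given below, assuming a bitmask encoding: ternary values in {+1, 0, -1} are packed into "+1" and "-1" bitmasks so that each multiplication becomes a bitwise AND, accumulation becomes a popcount, and an all-zero weight word can be skipped outright. The function names (to_bitmasks, sparse_ternary_dot), the packing scheme, and the word-level skip are illustrative assumptions, not the paper's Sparse Dot Product Unit or sensing-circuit design.

    # Illustrative Python sketch (not the authors' hardware design): a sparse
    # ternary dot product where {-1, 0, +1} values are packed into "+1"/"-1"
    # bitmasks, each multiply becomes a bitwise AND, accumulation is a
    # popcount, and all-zero weight words are skipped.

    def to_bitmasks(values):
        """Pack a sequence of ternary values into (plus_mask, minus_mask) integers."""
        plus = minus = 0
        for i, v in enumerate(values):
            if v == 1:
                plus |= 1 << i
            elif v == -1:
                minus |= 1 << i
        return plus, minus

    def sparse_ternary_dot(weights, activations):
        """Dot product of two ternary vectors, skipping all-zero weight words."""
        w_plus, w_minus = to_bitmasks(weights)
        if w_plus == 0 and w_minus == 0:
            return 0  # zero weights contribute nothing: skip the whole word
        a_plus, a_minus = to_bitmasks(activations)
        # (+1,+1) and (-1,-1) pairs contribute +1; (+1,-1) and (-1,+1) pairs contribute -1.
        pos = (w_plus & a_plus) | (w_minus & a_minus)
        neg = (w_plus & a_minus) | (w_minus & a_plus)
        return bin(pos).count("1") - bin(neg).count("1")

    # Example: [+1, 0, -1, 0] . [+1, -1, +1, 0] = 1 - 1 = 0
    assert sparse_ternary_dot([1, 0, -1, 0], [1, -1, 1, 0]) == 0

Splitting positive and negative contributions this way mirrors the abstract's point that ternary multiplications reduce to lightweight bitwise operations, while the early return stands in, at the software level, for skipping computation on zero weights.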
first_indexed 2025-02-19T03:07:28Z
format Conference Paper
id ntu-10356/170218
institution Nanyang Technological University
language English
last_indexed 2025-02-19T03:07:28Z
publishDate 2023
record_format dspace
spelling ntu-10356/170218 2023-12-19T02:48:00Z
iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product
Zhu, Shien; Huai, Shuo; Xiong, Guochu; Liu, Weichen
School of Computer Science and Engineering
2023 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)
Parallel and Distributed Computing Centre
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Engineering::Electrical and electronic engineering::Computer hardware, software and systems
In-Memory Computing
Ternary Neural Network
Software-Hardware Co-Design
Ternary Neural Networks (TNNs) achieve an excellent trade-off between model size, speed, and accuracy by quantizing weights and activations into the ternary values {+1, 0, -1}. Ternary multiplications in TNNs are equivalent to lightweight bitwise operations, which map well onto In-Memory Computing (IMC) platforms. Therefore, many IMC-based TNN accelerators have been proposed; they build dedicated ternary multiplication cells or exploit efficient bitwise operations on IMC architectures. However, existing ternary-value accumulation schemes on IMC architectures are inefficient: they either extend the sign bit of integer operands or conduct two-round accumulation with specially designed encodings, incurring long latency and extra memory-write overhead. Moreover, existing IMC-based TNN accelerators overlook TNNs' sparsity and perform operations on zero weights, causing unnecessary power consumption and latency. In this paper, we propose iMAT to accelerate TNNs with operator-, architecture-, and layer-level optimizations. First, we propose a single-round Ternary Variable-Bitwidth Accumulation scheme that efficiently extends the sign bit of the addition result without extra memory-write overhead. Second, we propose an in-memory accelerator with enhanced sensing circuits for the accumulation scheme and a Sparse Dot Product Unit that exploits TNNs' weight sparsity by skipping unnecessary operations on zero weights. Further, we propose Fused Scaling Functions that combine the scaling, activation, normalization, and quantization layers to reduce hardware complexity without affecting model accuracy. Simulation results show that, compared with dense in-memory TNN accelerators, our iMAT achieves up to 2.7X speedup and 3.7X higher energy efficiency on ternary ResNet-18.
Ministry of Education (MOE)
Nanyang Technological University
Submitted/Accepted version
This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071), and Nanyang Technological University, Singapore, under its NAP (M4082282/04INS000515C130).
2023-09-25T07:12:01Z 2023-09-25T07:12:01Z 2023
Conference Paper
Zhu, S., Huai, S., Xiong, G. & Liu, W. (2023). iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product. 2023 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED). https://dx.doi.org/10.1109/ISLPED58423.2023.10244333
https://hdl.handle.net/10356/170218
10.1109/ISLPED58423.2023.10244333
en
MOE2019-T2-1-071
NAP (M4082282/04INS000515C130)
10.21979/N9/B4SIIN
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/ISLPED58423.2023.10244333.
application/pdf
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Engineering::Electrical and electronic engineering::Computer hardware, software and systems
In-Memory Computing
Ternary Neural Network
Software-Hardware Co-Design
Zhu, Shien
Huai, Shuo
Xiong, Guochu
Liu, Weichen
iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product
title iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product
title_full iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product
title_fullStr iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product
title_full_unstemmed iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product
title_short iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product
title_sort imat energy efficient in memory acceleration for ternary neural networks with sparse dot product
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Engineering::Electrical and electronic engineering::Computer hardware, software and systems
In-Memory Computing
Ternary Neural Network
Software-Hardware Co-Design
url https://hdl.handle.net/10356/170218
work_keys_str_mv AT zhushien imatenergyefficientinmemoryaccelerationforternaryneuralnetworkswithsparsedotproduct
AT huaishuo imatenergyefficientinmemoryaccelerationforternaryneuralnetworkswithsparsedotproduct
AT xiongguochu imatenergyefficientinmemoryaccelerationforternaryneuralnetworkswithsparsedotproduct
AT liuweichen imatenergyefficientinmemoryaccelerationforternaryneuralnetworkswithsparsedotproduct