Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic

Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM (ReRAM) for efficient analog compute. Unfortunately, overall PIM accelerator efficiency and throughput are limited by area/energy-intensive analog-to-digital converters (ADCs). Furthermore, existing accelerators that reduce ADC area/energy do so by changing DNN weights or by using low-resolution ADCs that reduce output fidelity. These approaches harm DNN accuracy and/or require costly DNN retraining to compensate. To address these issues, this thesis explores tradeoffs around ADC area/energy and develops optimizations that can reduce ADC area/energy without retraining DNNs. We use these optimizations to develop a new PIM accelerator, RAELLA, which can adapt the architecture to each DNN. RAELLA lowers the resolution of computed analog values by encoding weights to produce near-zero analog values, adaptively slicing weights for each DNN layer, and dynamically slicing inputs through speculation and recovery. Low-resolution analog values allow RAELLA to both use efficient low-resolution ADCs and maintain accuracy without retraining, all while computing with fewer ADC converts. Compared to other low-accuracy-loss PIM accelerators, RAELLA increases energy efficiency by up to 4.9x and throughput by up to 3.3x. Compared to PIM accelerators that cause accuracy loss and retrain DNNs to recover, RAELLA achieves similar efficiency and throughput without expensive DNN retraining.
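The abstract describes slicing DNN weights into low-bit slices so that each slice's analog column sum stays small enough for a low-resolution ADC to convert, with the digitized slices then shifted and recombined. The sketch below is a purely digital model of that general bit-slicing idea; all function names, bit widths, and the simple clip-and-round ADC model are illustrative assumptions for this sketch, not RAELLA's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def slice_weights(weights, bits_per_slice, total_bits=8):
    """Split unsigned integer weights into low-bit slices (LSB slice first)."""
    slices = []
    w = weights.copy()
    for _ in range(total_bits // bits_per_slice):
        slices.append(w % (1 << bits_per_slice))
        w >>= bits_per_slice
    return slices

def adc_quantize(analog_sum, adc_bits, full_scale):
    """Toy ADC model: clip to full scale and round to 2**adc_bits - 1 levels."""
    levels = (1 << adc_bits) - 1
    step = full_scale / levels
    return float(np.clip(np.round(analog_sum / step), 0, levels)) * step

def pim_dot(inputs, weights, bits_per_slice, adc_bits):
    """Dot product computed slice by slice, one ADC convert per slice."""
    # Worst-case column sum for one slice sets the ADC's full-scale range.
    full_scale = inputs.sum() * ((1 << bits_per_slice) - 1)
    result = 0.0
    for i, s in enumerate(slice_weights(weights, bits_per_slice)):
        partial = float(inputs @ s)                      # "analog" column sum
        partial = adc_quantize(partial, adc_bits, full_scale)
        result += partial * (1 << (i * bits_per_slice))  # shift and accumulate
    return result

inputs = rng.integers(0, 2, size=128)    # 1-bit inputs
weights = rng.integers(0, 256, size=128) # 8-bit weights
exact = float(inputs @ weights)

for bps in (8, 4, 2):
    approx = pim_dot(inputs, weights, bps, adc_bits=6)
    print(f"{bps} bits/slice: relative error = {abs(approx - exact) / exact:.4f}")
```

The sketch illustrates the tradeoff the thesis targets: each extra slice costs an ADC convert, while each slice's smaller dynamic range limits the quantization error a given low-resolution ADC introduces.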


Bibliographic Details
Main Author: Andrulis, Tanner
Other Authors: Emer, Joel S.
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151461
Thesis Supervisor: Sze, Vivienne
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: S.M.
ORCID: 0000-0002-3168-9862
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)