Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic

Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM (ReRAM) for efficient analog compute. Unfortunately, overall PIM accelerator efficiency and throughput are limited by area/energy-intensive analog-to-digital converters (ADCs). Furthermore, existing accelerators that reduce ADC area/energy do so by changing DNN weights or by using low-resolution ADCs that reduce output fidelity. These approaches harm DNN accuracy and/or require costly DNN retraining to compensate. To address these issues, this thesis explores tradeoffs around ADC area/energy and develops optimizations that can reduce ADC area/energy without retraining DNNs. We use these optimizations to develop a new PIM accelerator, RAELLA, which can adapt the architecture to each DNN. RAELLA lowers the resolution of computed analog values by encoding weights to produce near-zero analog values, adaptively slicing weights for each DNN layer, and dynamically slicing inputs through speculation and recovery. Low-resolution analog values allow RAELLA to both use efficient low-resolution ADCs and maintain accuracy without retraining, all while computing with fewer ADC converts. Compared to other low-accuracy-loss PIM accelerators, RAELLA increases energy efficiency by up to 4.9x and throughput by up to 3.3x. Compared to PIM accelerators that cause accuracy loss and retrain DNNs to recover, RAELLA achieves similar efficiency and throughput without expensive DNN retraining.
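The abstract describes slicing DNN weights into low-bit slices so that each slice's analog column sum stays small enough for a low-resolution ADC to convert, with the digitized slices then shifted and recombined. The sketch below is a purely digital model of that general bit-slicing idea; all function names, bit widths, and the simple clip-and-round ADC model are illustrative assumptions for this sketch, not RAELLA's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def slice_weights(weights, bits_per_slice, total_bits=8):
    """Split unsigned integer weights into low-bit slices (LSB slice first)."""
    slices = []
    w = weights.copy()
    for _ in range(total_bits // bits_per_slice):
        slices.append(w % (1 << bits_per_slice))
        w >>= bits_per_slice
    return slices

def adc_quantize(analog_sum, adc_bits, full_scale):
    """Toy ADC model: clip to full scale and round to 2**adc_bits - 1 levels."""
    levels = (1 << adc_bits) - 1
    step = full_scale / levels
    return float(np.clip(np.round(analog_sum / step), 0, levels)) * step

def pim_dot(inputs, weights, bits_per_slice, adc_bits):
    """Dot product computed slice by slice, one ADC convert per slice."""
    # Worst-case column sum for one slice sets the ADC's full-scale range.
    full_scale = inputs.sum() * ((1 << bits_per_slice) - 1)
    result = 0.0
    for i, s in enumerate(slice_weights(weights, bits_per_slice)):
        partial = float(inputs @ s)                      # "analog" column sum
        partial = adc_quantize(partial, adc_bits, full_scale)
        result += partial * (1 << (i * bits_per_slice))  # shift and accumulate
    return result

inputs = rng.integers(0, 2, size=128)    # 1-bit inputs
weights = rng.integers(0, 256, size=128) # 8-bit weights
exact = float(inputs @ weights)

for bps in (8, 4, 2):
    approx = pim_dot(inputs, weights, bps, adc_bits=6)
    print(f"{bps} bits/slice: relative error = {abs(approx - exact) / exact:.4f}")
```

The sketch illustrates the tradeoff the thesis targets: each extra slice costs an ADC convert, while each slice's smaller dynamic range limits the quantization error a given low-resolution ADC introduces.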


Bibliographic Details
Main Author: Andrulis, Tanner
Other Authors: Emer, Joel S.
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151461
Thesis Supervisor: Sze, Vivienne
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: S.M.
ORCID: 0000-0002-3168-9862
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)