All Analog CNN Accelerator with RRAMs for Fast Inference


Bibliographic Details
Main Author: Chao, Minghan
Other Authors: Shulaker, Max
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/146297
_version_ 1826210639322808320
author Chao, Minghan
author2 Shulaker, Max
author_facet Shulaker, Max
Chao, Minghan
author_sort Chao, Minghan
collection MIT
description As AI applications become more prevalent and powerful, the performance demands on deep learning neural networks grow. The need for fast and energy-efficient circuits to compute deep neural networks is urgent. Most current research proposes dedicated hardware in which data is reused thousands of times. However, while re-using the same hardware to perform the same computation repeatedly saves area, it comes at the expense of execution time. This presents another critical obstacle, as the need for real-time, rapid AI requires a fundamentally faster approach to implementing neural networks. The focus of this thesis is to duplicate the key operation, the multiply-and-accumulate (MAC) computation unit, in hardware so that there is no hardware re-use, enabling the entire neural network to be physically fabricated on a single chip. As neural networks today often require hundreds of thousands to tens of millions of MAC computation units, this requires designing the smallest possible MAC computation units so that all of the operations fit on chip. Here, we present an initial analysis of a convolutional neural network (CNN) accelerator that implements such a system, optimized for inference speed. The accelerator duplicates all of the computation hardware, eliminating the need to fetch data back and forth while reusing the same hardware. We propose a novel design for memory cells using resistive random-access memory (RRAM) and computation units that exploit the analog behavior of transistors. This circuit classifies one CIFAR-10 image in 6 µs (160k frames/s) with 2.4 µJ of energy per classification at 85% accuracy. It contains 7.5 million MAC units and achieves a density of 5 million MAC/mm².
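The headline figures in the abstract imply a few derived metrics worth checking. A back-of-the-envelope sketch, using only the numbers stated above (6 µs latency, 2.4 µJ per classification, 7.5 million MAC units, 5 million MAC/mm²):

```python
# Derived metrics from the figures reported in the thesis abstract.
latency_s = 6e-6       # seconds per CIFAR-10 image
energy_j = 2.4e-6      # joules per classification
mac_units = 7.5e6      # total MAC units on chip
mac_density = 5e6      # MAC units per mm^2

throughput_fps = 1 / latency_s              # ~166,667 frames/s (abstract rounds to 160k)
avg_power_w = energy_j / latency_s          # average power while classifying: 0.4 W
implied_area_mm2 = mac_units / mac_density  # MAC-array area implied by the density: 1.5 mm^2

print(f"{throughput_fps:,.0f} frames/s, {avg_power_w:.1f} W, {implied_area_mm2:.1f} mm^2")
```

The ~0.4 W average power and ~1.5 mm² implied MAC area are not stated in the record; they follow arithmetically from the reported latency, energy, and density.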
first_indexed 2024-09-23T14:53:09Z
format Thesis
id mit-1721.1/146297
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T14:53:09Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1462972022-11-11T03:11:24Z All Analog CNN Accelerator with RRAMs for Fast Inference Chao, Minghan Shulaker, Max Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science As AI applications become more prevalent and powerful, the performance demands on deep learning neural networks grow. The need for fast and energy-efficient circuits to compute deep neural networks is urgent. Most current research proposes dedicated hardware in which data is reused thousands of times. However, while re-using the same hardware to perform the same computation repeatedly saves area, it comes at the expense of execution time. This presents another critical obstacle, as the need for real-time, rapid AI requires a fundamentally faster approach to implementing neural networks. The focus of this thesis is to duplicate the key operation, the multiply-and-accumulate (MAC) computation unit, in hardware so that there is no hardware re-use, enabling the entire neural network to be physically fabricated on a single chip. As neural networks today often require hundreds of thousands to tens of millions of MAC computation units, this requires designing the smallest possible MAC computation units so that all of the operations fit on chip. Here, we present an initial analysis of a convolutional neural network (CNN) accelerator that implements such a system, optimized for inference speed. The accelerator duplicates all of the computation hardware, eliminating the need to fetch data back and forth while reusing the same hardware. We propose a novel design for memory cells using resistive random-access memory (RRAM) and computation units that exploit the analog behavior of transistors. This circuit classifies one CIFAR-10 image in 6 µs (160k frames/s) with 2.4 µJ of energy per classification at 85% accuracy. It contains 7.5 million MAC units and achieves a density of 5 million MAC/mm². S.M.
2022-11-10T14:05:03Z 2022-11-10T14:05:03Z 2022-02 2022-03-04T20:59:41.259Z Thesis https://hdl.handle.net/1721.1/146297 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf application/pdf Massachusetts Institute of Technology
spellingShingle Chao, Minghan
All Analog CNN Accelerator with RRAMs for Fast Inference
title All Analog CNN Accelerator with RRAMs for Fast Inference
title_full All Analog CNN Accelerator with RRAMs for Fast Inference
title_fullStr All Analog CNN Accelerator with RRAMs for Fast Inference
title_full_unstemmed All Analog CNN Accelerator with RRAMs for Fast Inference
title_short All Analog CNN Accelerator with RRAMs for Fast Inference
title_sort all analog cnn accelerator with rrams for fast inference
url https://hdl.handle.net/1721.1/146297
work_keys_str_mv AT chaominghan allanalogcnnacceleratorwithrramsforfastinference