Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
Main Authors: | Junyun Zhao, Siyuan Huang, Osama Yousuf, Yutong Gao, Brian D. Hoskins, Gina C. Adam
---|---
Format: | Article
Language: | English
Published: | Frontiers Media S.A., 2021-11-01
Series: | Frontiers in Neuroscience
Subjects: | non-negative matrix factorization; gradient data decomposition; principal component analysis; memristor; non-idealities; ReRAM
Online Access: | https://www.frontiersin.org/articles/10.3389/fnins.2021.749811/full
author | Junyun Zhao; Siyuan Huang; Osama Yousuf; Yutong Gao; Brian D. Hoskins; Gina C. Adam
author_sort | Junyun Zhao |
collection | DOAJ |
description | While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
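The description combines three ingredients: batch-averaged gradients (MBGD), stochastic rounding, and low-rank gradient decomposition. A minimal NumPy sketch of how these pieces could fit together, assuming a single weight matrix and a non-negative stand-in gradient (signed gradients would need a positive/negative split); the function names, shapes, and step size are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, step):
    # Round x to multiples of `step`; round up with probability equal to
    # the fractional remainder, so small updates survive in expectation
    # instead of vanishing under deterministic rounding.
    scaled = x / step
    lo = np.floor(scaled)
    return (lo + (rng.random(x.shape) < (scaled - lo))) * step

def nmf(G, r, iters=200, eps=1e-9):
    # Rank-r non-negative matrix factorization G ~= W @ H via the standard
    # Lee-Seung multiplicative updates; G must be non-negative.
    m, n = G.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ G) / (W.T @ W @ H + eps)
        W *= (G @ H.T) / (W @ H @ H.T + eps)
    return W, H

# MBGD averaging, then low-rank compression before transfer to the array.
m, n, batch, r = 64, 32, 16, 5
per_sample = rng.random((batch, m, n))       # stand-in non-negative gradients
G_avg = per_sample.mean(axis=0)              # gradient averaging over the batch
W, H = nmf(G_avg, r)                         # store (m + n) * r values, not m * n
update = stochastic_round(W @ H, step=1e-3)  # quantize to the device step size
```

The compressed update `W @ H` approximates the averaged gradient at a fraction of the memory, since only `(m + n) * r` values are kept instead of the full `m * n` gradient matrix.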
format | Article |
id | doaj.art-e757422268bc4a378caf150d006973d5 |
institution | Directory Open Access Journal |
issn | 1662-453X |
language | English |
publishDate | 2021-11-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neuroscience |
spelling | doaj.art-e757422268bc4a378caf150d006973d5 (2022-12-21T20:37:36Z). Frontiers Media S.A., Frontiers in Neuroscience, ISSN 1662-453X, 2021-11-01, vol. 15, article 749811, doi:10.3389/fnins.2021.749811. "Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices" by Junyun Zhao (Department of Computer Science, George Washington University, Washington, DC, United States), Siyuan Huang (Department of Computer Science, George Washington University, Washington, DC, United States), Osama Yousuf (Department of Electrical and Computer Engineering, George Washington University, Washington, DC, United States), Yutong Gao (Department of Computer Science, George Washington University, Washington, DC, United States), Brian D. Hoskins (Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States), and Gina C. Adam (Department of Electrical and Computer Engineering, George Washington University, Washington, DC, United States). Abstract and keywords as in the description and topic fields. https://www.frontiersin.org/articles/10.3389/fnins.2021.749811/full
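The rank-seq/rank-sum distinction in the abstract can likewise be illustrated in a few lines: rank-sum reconstructs the full gradient externally and programs the array in one pass, while rank-seq applies each rank-1 component as a separate outer-product write inside the array. A hedged sketch, with `conductances` as a hypothetical NumPy stand-in for the device matrix; the hardware write routine itself is not modeled:

```python
import numpy as np

def rank_sum_transfer(conductances, W, H, lr):
    # rank-sum: rebuild the full gradient externally (in digital memory),
    # then program the array in a single pass; needs an m x n buffer.
    G = W @ H
    conductances -= lr * G

def rank_seq_transfer(conductances, W, H, lr):
    # rank-seq: apply each rank-1 component directly as an outer-product
    # update, the native parallel write of a crossbar; no full-rank buffer,
    # but r sequential programming passes.
    for k in range(W.shape[1]):
        conductances -= lr * np.outer(W[:, k], H[k, :])
```

Mathematically the two produce identical updates, since the sum of the rank-1 outer products equals `W @ H`; on real devices each programming pass adds write noise and nonlinearity, which is why the abstract compares the two transfer schemes.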
title | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
topic | non-negative matrix factorization; gradient data decomposition; principal component analysis; memristor; non-idealities; ReRAM
url | https://www.frontiersin.org/articles/10.3389/fnins.2021.749811/full |