Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices

Bibliographic Details
Main Authors: Junyun Zhao, Siyuan Huang, Osama Yousuf, Yutong Gao, Brian D. Hoskins, Gina C. Adam
Format: Article
Language:English
Published: Frontiers Media S.A. 2021-11-01
Series:Frontiers in Neuroscience
Subjects: non-negative matrix factorization, gradient data decomposition, principal component analysis, memristor, non-idealities, ReRAM
Online Access:https://www.frontiersin.org/articles/10.3389/fnins.2021.749811/full
collection DOAJ
description While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
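The gradient compression and stochastic rounding scheme summarized above can be illustrated with a short sketch. This is a minimal NumPy illustration written for this record, not the authors' code: a truncated SVD stands in for the streaming batch PCA / NMF factorizations, and the layer sizes, learning rate, and device step size are hypothetical. It contrasts an external reconstruction of the low-rank gradient (the rank-sum reading of the abstract) with a sequential, rank-by-rank transfer into the array (rank-seq), applying stochastic rounding so small updates are not systematically lost to a finite conductance step.

```python
# Hypothetical sketch (not the authors' released code): compress a mini-batch
# gradient of one linear layer to low rank and apply it with stochastic
# rounding, as a crude model of programming a memristor crossbar whose
# conductance changes in discrete steps.
import numpy as np

rng = np.random.default_rng(0)

def low_rank_gradient(activations, deltas, rank):
    """Compress the averaged mini-batch gradient to `rank` components.

    activations: (batch, n_in) layer inputs x_i
    deltas:      (batch, n_out) backpropagated errors delta_i
    Returns factors (U, V) with U: (n_in, rank), V: (rank, n_out), so that
    U @ V approximates the full mini-batch gradient.
    """
    grad = activations.T @ deltas / len(activations)   # MBGD-style average
    # Truncated SVD is a stand-in for the streaming batch PCA / NMF
    # decompositions discussed in the paper.
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]

def stochastic_round(update, step):
    """Round each entry to a multiple of the device step size `step`,
    rounding up with probability equal to the fractional remainder,
    so small updates are not systematically truncated to zero."""
    scaled = update / step
    floor = np.floor(scaled)
    prob_up = scaled - floor
    return (floor + (rng.random(update.shape) < prob_up)) * step

# Toy usage: one 784 -> 10 layer, mini-batch of 64, rank-3 compression.
x = rng.standard_normal((64, 784))
d = rng.standard_normal((64, 10))
U, V = low_rank_gradient(x, d, rank=3)

lr, device_step = 0.01, 1e-3

# External reconstruction (rank-sum style): rebuild the low-rank gradient
# outside the array, then program the crossbar once.
W = np.zeros((784, 10))
W -= stochastic_round(lr * (U @ V), device_step)

# Internal reconstruction (rank-seq style): transfer one rank-1 outer
# product at a time, accumulating directly in the array.
W_seq = np.zeros((784, 10))
for r in range(U.shape[1]):
    W_seq -= stochastic_round(lr * np.outer(U[:, r], V[r, :]), device_step)
```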
id doaj.art-e757422268bc4a378caf150d006973d5
institution Directory Open Access Journal
issn 1662-453X
doi 10.3389/fnins.2021.749811
volume 15
article_number 749811
affiliation Junyun Zhao, Siyuan Huang, Yutong Gao: Department of Computer Science, George Washington University, Washington, DC, United States
affiliation Osama Yousuf, Gina C. Adam: Department of Electrical and Computer Engineering, George Washington University, Washington, DC, United States
affiliation Brian D. Hoskins: Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States