Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
Main Authors: | Junyun Zhao, Siyuan Huang, Osama Yousuf, Yutong Gao, Brian D. Hoskins, Gina C. Adam
---|---
Format: | Article
Language: | English
Published: | Frontiers Media S.A., 2021-11-01
Series: | Frontiers in Neuroscience
Subjects: | non-negative matrix factorization; gradient data decomposition; principal component analysis; memristor; non-idealities; ReRAM
Online Access: | https://www.frontiersin.org/articles/10.3389/fnins.2021.749811/full
author | Junyun Zhao; Siyuan Huang; Osama Yousuf; Yutong Gao; Brian D. Hoskins; Gina C. Adam
author_sort | Junyun Zhao |
collection | DOAJ |
description | While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
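The description combines three ingredients: batch-averaged gradients (MBGD), stochastic rounding, and low-rank gradient decomposition. A minimal NumPy sketch of how these pieces could fit together, assuming a single weight matrix and a non-negative stand-in gradient (signed gradients would need a positive/negative split); the function names, shapes, and step size are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, step):
    # Round x to multiples of `step`; round up with probability equal to
    # the fractional remainder, so small updates survive in expectation
    # instead of vanishing under deterministic rounding.
    scaled = x / step
    lo = np.floor(scaled)
    return (lo + (rng.random(x.shape) < (scaled - lo))) * step

def nmf(G, r, iters=200, eps=1e-9):
    # Rank-r non-negative matrix factorization G ~= W @ H via the standard
    # Lee-Seung multiplicative updates; G must be non-negative.
    m, n = G.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ G) / (W.T @ W @ H + eps)
        W *= (G @ H.T) / (W @ H @ H.T + eps)
    return W, H

# MBGD averaging, then low-rank compression before transfer to the array.
m, n, batch, r = 64, 32, 16, 5
per_sample = rng.random((batch, m, n))       # stand-in non-negative gradients
G_avg = per_sample.mean(axis=0)              # gradient averaging over the batch
W, H = nmf(G_avg, r)                         # store (m + n) * r values, not m * n
update = stochastic_round(W @ H, step=1e-3)  # quantize to the device step size
```

The compressed update `W @ H` approximates the averaged gradient at a fraction of the memory, since only `(m + n) * r` values are kept instead of the full `m * n` gradient matrix.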
format | Article |
id | doaj.art-e757422268bc4a378caf150d006973d5 |
institution | Directory Open Access Journal |
issn | 1662-453X |
language | English |
publishDate | 2021-11-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neuroscience |
spelling | doaj.art-e757422268bc4a378caf150d006973d5 (2022-12-21T20:37:36Z). Frontiers Media S.A., Frontiers in Neuroscience, ISSN 1662-453X, 2021-11-01, vol. 15, article 749811, doi:10.3389/fnins.2021.749811. "Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices" by Junyun Zhao (Department of Computer Science, George Washington University, Washington, DC, United States), Siyuan Huang (Department of Computer Science, George Washington University, Washington, DC, United States), Osama Yousuf (Department of Electrical and Computer Engineering, George Washington University, Washington, DC, United States), Yutong Gao (Department of Computer Science, George Washington University, Washington, DC, United States), Brian D. Hoskins (Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States), and Gina C. Adam (Department of Electrical and Computer Engineering, George Washington University, Washington, DC, United States). Abstract and keywords as in the description and topic fields. https://www.frontiersin.org/articles/10.3389/fnins.2021.749811/full
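The rank-seq/rank-sum distinction in the abstract can likewise be illustrated in a few lines: rank-sum reconstructs the full gradient externally and programs the array in one pass, while rank-seq applies each rank-1 component as a separate outer-product write inside the array. A hedged sketch, with `conductances` as a hypothetical NumPy stand-in for the device matrix; the hardware write routine itself is not modeled:

```python
import numpy as np

def rank_sum_transfer(conductances, W, H, lr):
    # rank-sum: rebuild the full gradient externally (in digital memory),
    # then program the array in a single pass; needs an m x n buffer.
    G = W @ H
    conductances -= lr * G

def rank_seq_transfer(conductances, W, H, lr):
    # rank-seq: apply each rank-1 component directly as an outer-product
    # update, the native parallel write of a crossbar; no full-rank buffer,
    # but r sequential programming passes.
    for k in range(W.shape[1]):
        conductances -= lr * np.outer(W[:, k], H[k, :])
```

Mathematically the two produce identical updates, since the sum of the rank-1 outer products equals `W @ H`; on real devices each programming pass adds write noise and nonlinearity, which is why the abstract compares the two transfer schemes.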
title | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
topic | non-negative matrix factorization; gradient data decomposition; principal component analysis; memristor; non-idealities; ReRAM
url | https://www.frontiersin.org/articles/10.3389/fnins.2021.749811/full |