Accelerating DNN Training Through Selective Localized Learning

Training Deep Neural Networks (DNNs) places immense compute requirements on the underlying hardware platforms, expending large amounts of time and energy. We propose LoCal+SGD, a new algorithmic approach to accelerate DNN training by selectively combining localized or Hebbian learning within a Stochastic Gradient Descent (SGD) based training framework. Back-propagation is a computationally expensive process that requires two General Matrix Multiply (GEMM) operations per layer to compute the error and weight gradients. We alleviate this by selectively updating the weights of some layers using localized learning rules that require only one GEMM operation per layer. Further, since localized weight updates are performed during the forward pass itself, the layer activations for such layers do not need to be stored until the backward pass, resulting in a reduced memory footprint. Localized updates can substantially boost training speed, but must be used judiciously in order to preserve accuracy and convergence. We address this challenge through a Learning Mode Selection Algorithm, which gradually moves layers to localized learning as training progresses. Specifically, for each epoch, the algorithm identifies a Localized→SGD transition layer that delineates the network into two regions: layers before the transition layer use localized updates, while the transition layer and later layers use gradient-based updates. We propose both static and dynamic approaches to the design of the learning mode selection algorithm. The static algorithm uses a pre-defined scheduler function to identify the position of the transition layer, while the dynamic algorithm analyzes the dynamics of the weight updates made to the transition layer to determine how the boundary between SGD and localized updates shifts in future epochs. We also propose a low-cost weak supervision mechanism that controls the learning rate of the localized updates based on the overall training loss. We applied LoCal+SGD to 8 image-recognition CNNs (including ResNet50 and MobileNetV2) across 3 datasets (CIFAR-10, CIFAR-100, and ImageNet). Our measurements on an NVIDIA GTX 1080 Ti GPU demonstrate up to 1.5× improvement in end-to-end training time with ~0.5% loss in Top-1 classification accuracy.
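The following is a minimal NumPy sketch, not the authors' implementation and not taken from the article, of the hybrid update scheme the abstract describes: layers before a chosen transition layer receive a localized Hebbian-style update computed during the forward pass (one extra GEMM per layer, with no activations retained for the backward pass), while the transition layer and all later layers are trained with ordinary SGD back-propagation (two GEMMs per layer: one for the propagated error, one for the weight gradient). The tiny fully-connected network, the plain outer-product Hebbian rule, and names such as hybrid_step are illustrative assumptions.

# Minimal sketch (assumption, not the authors' code) of LoCal+SGD-style hybrid updates:
# layers with index < transition use a localized Hebbian-style rule applied during the
# forward pass (one GEMM: the outer product of pre- and post-activations); the transition
# layer and later layers use standard SGD back-propagation (two GEMMs per layer).
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [32, 64, 64, 10]                       # tiny illustrative MLP
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def relu(x):
    return np.maximum(x, 0.0)

def hybrid_step(x, y, weights, transition, lr_sgd=1e-2, lr_local=1e-3):
    """One training step: Hebbian updates for layers < transition, SGD for the rest."""
    acts = [x]
    for i, W in enumerate(weights):
        h = relu(acts[-1] @ W)                       # forward GEMM (always needed)
        if i < transition:
            # Localized update, applied immediately during the forward pass:
            # one extra GEMM per layer; these activations need not be stored.
            weights[i] = W + lr_local * (acts[-1].T @ h) / x.shape[0]
        acts.append(h)

    # Backward pass covers only the transition layer and the layers after it.
    err = acts[-1] - y                               # gradient of an MSE-style loss w.r.t. output
    for i in range(len(weights) - 1, transition - 1, -1):
        delta = err * (acts[i + 1] > 0)              # ReLU derivative
        grad = acts[i].T @ delta / x.shape[0]        # GEMM 1: weight gradient
        if i > transition:
            err = delta @ weights[i].T               # GEMM 2: error for the layer below
        weights[i] = weights[i] - lr_sgd * grad
    return weights

# Example: layer 0 is updated locally, layers 1 and 2 by SGD.
x = rng.normal(size=(8, 32))
y = rng.normal(size=(8, 10))
weights = hybrid_step(x, y, weights, transition=1)

In the paper's method the transition index is chosen per epoch by the static or dynamic learning mode selection algorithm, and the localized learning rate is further modulated by the weak supervision mechanism; here it is simply passed as a fixed argument.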

Bibliographic Details
Main Authors: Sarada Krithivasan (Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States), Sanchari Sen (IBM Research, Yorktown Heights, NY, United States), Swagath Venkataramani (IBM Research, Yorktown Heights, NY, United States), Anand Raghunathan (Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States)
Format: Article
Language: English
Published: Frontiers Media S.A., 2022-01-01
Series: Frontiers in Neuroscience
ISSN: 1662-453X
DOI: 10.3389/fnins.2021.759807
Subjects: Deep Neural Networks (DNNs); localized learning; runtime efficiency; graphics processing unit (GPU); stochastic gradient descent algorithm
Online Access: https://www.frontiersin.org/articles/10.3389/fnins.2021.759807/full