Understanding the effects of data parallelism and sparsity on neural network training

We study two factors in neural network training: data parallelism and sparsity. Here, data parallelism means processing training data in parallel using distributed systems (or, equivalently, increasing the batch size) so that training can be accelerated; by sparsity, we refer to pruning parameters from a neural network model so as to reduce computational and memory cost. Despite their promising benefits, however, their effects on neural network training remain poorly understood. In this work, we first measure these effects rigorously by conducting extensive experiments while tuning all metaparameters involved in the optimization. Across various workloads of data set, network model, and optimization algorithm, we find a general scaling trend between batch size and the number of training steps to convergence (the effect of data parallelism), as well as an increased difficulty of training under sparsity. We then develop a theoretical analysis based on the convergence properties of stochastic gradient methods and the smoothness of the optimization landscape, which explains the observed phenomena precisely and generally, establishing a better account of the effects of data parallelism and sparsity on neural network training.
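
The sketch below is a minimal, hypothetical illustration of the two factors described in the abstract, not the paper's code or experimental protocol. It trains a tiny logistic-regression model with mini-batch SGD and records how many steps are needed to reach a fixed training loss while varying (i) the batch size, standing in for the degree of data parallelism, and (ii) a magnitude-pruning mask fixed at initialization, standing in for sparsity. The synthetic data, target loss, and fixed learning rate are illustrative assumptions; the paper instead tunes all metaparameters for each workload.

```python
# Hypothetical sketch: steps-to-target-loss as batch size and sparsity vary.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data from a noisy linear teacher.
n, d = 4096, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def steps_to_target(batch_size, sparsity, target_loss=0.40, lr=0.1, max_steps=20000):
    """Mini-batch SGD; return the step at which the full-data loss first drops
    below target_loss, or None if the target is never reached within max_steps."""
    w = 0.1 * rng.normal(size=d)
    # "Sparsity": prune the smallest-magnitude weights once, at initialization,
    # and keep them at zero throughout training via a fixed binary mask.
    mask = np.ones(d)
    k = int(sparsity * d)
    if k > 0:
        mask[np.argsort(np.abs(w))[:k]] = 0.0
    w *= mask
    for step in range(1, max_steps + 1):
        idx = rng.integers(0, n, size=batch_size)        # sample a mini-batch
        p = sigmoid(X[idx] @ w)
        grad = X[idx].T @ (p - y[idx]) / batch_size      # logistic-loss gradient
        w -= lr * grad * mask                            # masked SGD update
        if step % 50 == 0:                               # periodic full-data check
            full_p = sigmoid(X @ w)
            loss = -np.mean(y * np.log(full_p + 1e-9)
                            + (1 - y) * np.log(1 - full_p + 1e-9))
            if loss < target_loss:
                return step
    return None

for sparsity in (0.0, 0.8):
    for bs in (8, 32, 128, 512):
        steps = steps_to_target(bs, sparsity)
        result = f"{steps} steps" if steps is not None else "target not reached"
        print(f"sparsity={sparsity:.1f}  batch={bs:4d}  {result}")
```

Under such a sweep one would typically see the number of steps fall as the batch size grows before diminishing returns set in, while the pruned runs converge more slowly or fail to reach the target at all; this is the qualitative pattern the paper measures across workloads and then explains theoretically.
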

Bibliographic Details
Main Authors: Lee, N; Ajanthan, T; Torr, PHS; Jaggi, M
Format: Conference item
Language: English
Published: OpenReview 2020
Identifier: oxford-uuid:eb718cbf-3b31-4f74-a69f-299503936a13
Institution: University of Oxford