Understanding the effects of data parallelism and sparsity on neural network training

We study two factors in neural network training: data parallelism and sparsity. Here, data parallelism means processing training data in parallel using distributed systems (or, equivalently, increasing the batch size) so that training can be accelerated; by sparsity, we refer to pruning parameters from a neural network model so as to reduce computational and memory cost. Despite their promising benefits, however, their effects on neural network training remain poorly understood. In this work, we first measure these effects rigorously by conducting extensive experiments while tuning all metaparameters involved in the optimization. Across various workloads of data set, network model, and optimization algorithm, we find a general scaling trend between batch size and the number of training steps to convergence (the effect of data parallelism), as well as an increased difficulty of training under sparsity. We then develop a theoretical analysis based on the convergence properties of stochastic gradient methods and the smoothness of the optimization landscape, which explains the observed phenomena precisely and generally, establishing a better account of the effects of data parallelism and sparsity on neural network training.
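
The sketch below is a minimal, hypothetical illustration of the two factors described in the abstract, not the paper's code or experimental protocol. It trains a tiny logistic-regression model with mini-batch SGD and records how many steps are needed to reach a fixed training loss while varying (i) the batch size, standing in for the degree of data parallelism, and (ii) a magnitude-pruning mask fixed at initialization, standing in for sparsity. The synthetic data, target loss, and fixed learning rate are illustrative assumptions; the paper instead tunes all metaparameters for each workload.

```python
# Hypothetical sketch: steps-to-target-loss as batch size and sparsity vary.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data from a noisy linear teacher.
n, d = 4096, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def steps_to_target(batch_size, sparsity, target_loss=0.40, lr=0.1, max_steps=20000):
    """Mini-batch SGD; return the step at which the full-data loss first drops
    below target_loss, or None if the target is never reached within max_steps."""
    w = 0.1 * rng.normal(size=d)
    # "Sparsity": prune the smallest-magnitude weights once, at initialization,
    # and keep them at zero throughout training via a fixed binary mask.
    mask = np.ones(d)
    k = int(sparsity * d)
    if k > 0:
        mask[np.argsort(np.abs(w))[:k]] = 0.0
    w *= mask
    for step in range(1, max_steps + 1):
        idx = rng.integers(0, n, size=batch_size)        # sample a mini-batch
        p = sigmoid(X[idx] @ w)
        grad = X[idx].T @ (p - y[idx]) / batch_size      # logistic-loss gradient
        w -= lr * grad * mask                            # masked SGD update
        if step % 50 == 0:                               # periodic full-data check
            full_p = sigmoid(X @ w)
            loss = -np.mean(y * np.log(full_p + 1e-9)
                            + (1 - y) * np.log(1 - full_p + 1e-9))
            if loss < target_loss:
                return step
    return None

for sparsity in (0.0, 0.8):
    for bs in (8, 32, 128, 512):
        steps = steps_to_target(bs, sparsity)
        result = f"{steps} steps" if steps is not None else "target not reached"
        print(f"sparsity={sparsity:.1f}  batch={bs:4d}  {result}")
```

Under such a sweep one would typically see the number of steps fall as the batch size grows before diminishing returns set in, while the pruned runs converge more slowly or fail to reach the target at all; this is the qualitative pattern the paper measures across workloads and then explains theoretically.
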

Bibliographic Details
Main Authors: Lee, N; Ajanthan, T; Torr, PHS; Jaggi, M
Format: Conference item
Language: English
Published: OpenReview 2020
Identifier: oxford-uuid:eb718cbf-3b31-4f74-a69f-299503936a13
Institution: University of Oxford