Towards interpretable deep local learning with successive gradient reconciliation
Relieving the reliance of neural network training on global back-propagation (BP) has emerged as a notable research topic, owing to the biological implausibility and heavy memory consumption of BP. Among existing solutions, local learning optimizes gradient-isolated modules of a neural network with local errors and has proven effective even on large-scale datasets. However, the reconciliation among local errors has never been investigated. In this paper, we first theoretically study non-greedy layer-wise training and show that convergence cannot be assured when the local gradient in a module w.r.t. its input is not reconciled with the local gradient in the previous module w.r.t. its output. Inspired by this theoretical result, we further propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules without breaking gradient isolation or introducing any learnable parameters. Our method can be integrated into both local-BP and BP-free settings. In experiments, it achieves significant performance improvements over previous methods. In particular, on ImageNet our method attains performance competitive with global BP for both CNN and Transformer architectures while saving more than 40% of the memory consumption.
Main Authors: | Yang, Y; Li, X; Alfarra, M; Hammoud, H; Bibi, A; Torr, P; Ghanem, B
---|---
Format: | Conference item
Language: | English
Published: | PMLR, 2024
Field | Value
---|---
author | Yang, Y; Li, X; Alfarra, M; Hammoud, H; Bibi, A; Torr, P; Ghanem, B
collection | OXFORD |
description | Relieving the reliance of neural network training on global back-propagation (BP) has emerged as a notable research topic, owing to the biological implausibility and heavy memory consumption of BP. Among existing solutions, local learning optimizes gradient-isolated modules of a neural network with local errors and has proven effective even on large-scale datasets. However, the reconciliation among local errors has never been investigated. In this paper, we first theoretically study non-greedy layer-wise training and show that convergence cannot be assured when the local gradient in a module w.r.t. its input is not reconciled with the local gradient in the previous module w.r.t. its output. Inspired by this theoretical result, we further propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules without breaking gradient isolation or introducing any learnable parameters. Our method can be integrated into both local-BP and BP-free settings. In experiments, it achieves significant performance improvements over previous methods. In particular, on ImageNet our method attains performance competitive with global BP for both CNN and Transformer architectures while saving more than 40% of the memory consumption.
format | Conference item |
id | oxford-uuid:a00caa99-e242-4a85-93b5-90890d1662c5 |
institution | University of Oxford |
language | English |
publishDate | 2024 |
publisher | PMLR |
title | Towards interpretable deep local learning with successive gradient reconciliation |
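The abstract above carries the core algorithmic idea: non-greedy layer-wise training converges only when the gradient of each module's local loss w.r.t. its input agrees with the previous module's local gradient w.r.t. its output. Below is a minimal, hypothetical PyTorch sketch of that idea, assuming per-module cross-entropy heads as the source of local errors; the names (`LocalModule`, `local_step`, `reconcile_weight`) and the squared-error penalty are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: train gradient-isolated modules with local errors,
# and penalize the mismatch between the gradient of module k's local loss
# w.r.t. its input and module k-1's local gradient w.r.t. its output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalModule(nn.Module):
    """One gradient-isolated block with an auxiliary classifier head."""
    def __init__(self, in_dim, out_dim, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.head = nn.Linear(out_dim, num_classes)  # source of the local error
    def forward(self, x):
        h = self.body(x)
        return h, self.head(h)

def local_step(modules, optimizers, x, y, reconcile_weight=0.1):
    """One training step; no gradient ever crosses a module boundary."""
    prev_grad = None  # d(local loss of module k-1) / d(its output)
    for module, opt in zip(modules, optimizers):
        x_in = x.detach().requires_grad_(True)  # gradient isolation
        h, logits = module(x_in)
        local_loss = F.cross_entropy(logits, y)
        # Gradient of this module's local loss w.r.t. its output; the next
        # module's input gradient will be reconciled against it.
        g_out = torch.autograd.grad(local_loss, h, retain_graph=True)[0]
        total = local_loss
        if prev_grad is not None:
            # Gradient w.r.t. this module's input, kept differentiable
            # (create_graph=True) so the penalty can shape the weights.
            g_in = torch.autograd.grad(local_loss, x_in,
                                       retain_graph=True, create_graph=True)[0]
            total = total + reconcile_weight * F.mse_loss(g_in, prev_grad)
        opt.zero_grad()
        total.backward()
        opt.step()
        prev_grad = g_out.detach()  # a target, not a learnable parameter
        x = h  # detached again at the top of the next iteration
    return total.item()

# Toy usage: a 3-block MLP trained purely with local errors.
modules = [LocalModule(32, 64, 10), LocalModule(64, 64, 10), LocalModule(64, 64, 10)]
optimizers = [torch.optim.SGD(m.parameters(), lr=0.1) for m in modules]
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
print(local_step(modules, optimizers, x, y))
```

Note how the `detach()` at each boundary keeps the modules gradient-isolated, and the reconciliation penalty introduces no learnable parameters, matching the two constraints stated in the abstract.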