Statistical physics of deep neural networks: Initialization toward optimal channels

In deep learning, neural networks serve as noisy channels between input data and its latent representation. This perspective naturally relates deep learning with the pursuit of constructing channels with optimal performance in information transmission and representation. While considerable efforts a...

Bibliographic Details
Main Authors:	Kangyu Weng, Aohua Cheng, Ziyang Zhang, Pei Sun, Yang Tian
Format:	Article
Language:	English
Published:	American Physical Society 2023-04-01
Series:	Physical Review Research
Online Access:	http://doi.org/10.1103/PhysRevResearch.5.023023

author	Kangyu Weng Aohua Cheng Ziyang Zhang Pei Sun Yang Tian
collection	DOAJ
description	In deep learning, neural networks serve as noisy channels between input data and its latent representation. This perspective naturally relates deep learning with the pursuit of constructing channels with optimal performance in information transmission and representation. While considerable efforts are concentrated on realizing optimal channel properties during network optimization, we study a frequently overlooked possibility that neural networks can be initialized toward optimal channels. Our theory, consistent with experimental validation, identifies primary mechanics underlying this unknown possibility and suggests intrinsic connections between statistical physics and deep learning. Unlike the conventional theories that characterize neural networks applying the classic mean-field approximation, we offer analytic proof that this extensively applied simplification scheme is not appropriate in studying neural networks as information channels. To fill this gap, we develop a restricted mean-field framework applicable for characterizing the limiting behaviors of information propagation in neural networks without strong assumptions on inputs. Based on it, we propose an analytic theory to prove that mutual information maximization is realized between inputs and propagated signals when neural networks are initialized at dynamic isometry, a case where information transmits via norm-preserving mappings. These theoretical predictions are validated by experiments on real neural networks, suggesting the robustness of our theory against finite-size effects. Finally, we analyze our findings with information bottleneck theory to confirm the precise relations among dynamic isometry, mutual information maximization, and optimal channel properties in deep learning. Our work may lay a cornerstone for promoting deep learning in terms of network initialization and suggest general statistical physics mechanisms underlying diverse deep learning techniques.
first_indexed	2024-04-24T10:11:26Z
format	Article
id	doaj.art-17c6a9478d084b8ca351309039ffd2f6
institution	Directory Open Access Journal
issn	2643-1564
language	English
last_indexed	2024-04-24T10:11:26Z
publishDate	2023-04-01
publisher	American Physical Society
record_format	Article
series	Physical Review Research
title	Statistical physics of deep neural networks: Initialization toward optimal channels
url	http://doi.org/10.1103/PhysRevResearch.5.023023

Similar Items