AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior

Overfitting of neural networks to training data is one of the most significant problems in machine learning. Bayesian neural networks (BNNs) are known to be robust against overfitting owing to their ability to model parameter uncertainty. Bayes by Backprop (BBB), a simple variational inference approach that optimizes variational parameters by backpropagation, has been proposed for training BNNs. However, many studies have reported difficulties in applying variational inference to large-scale models such as deep neural networks. This study therefore proposed Adam with decoupled Bayes by Backprop (AdamB), which stabilizes BNN training by applying the Adam estimator evaluation to the gradient of the neural network. The proposed approach stabilizes the noisy gradient of BBB and mitigates excessive parameter changes. In addition, AdamB combined with a Gaussian scale mixture prior can suppress the intrinsic growth of the variational parameters. AdamB exhibited superior stability compared with vanilla BBB trained with Adam. Furthermore, a covariate-shift benchmark on image classification tasks indicated that AdamB is more reliable than deep ensembles under noise-type covariate shifts. The considerations for stable BNN training with AdamB demonstrated on image classification tasks are expected to provide useful insights for applications in other domains.
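The abstract describes two ingredients: Bayes by Backprop (a mean-field Gaussian posterior trained with the reparameterization trick) with a Gaussian scale mixture prior, and an Adam-style update that is "decoupled" in the spirit of decoupled weight decay. The snippet below is a minimal, self-contained sketch of those ideas on a toy Bayesian linear regression; the reading of the decoupling (Adam moments computed from the likelihood gradient only, with the KL/prior gradient added as a separate term), the toy data, and all hyperparameters are assumptions made for illustration, not the authors' implementation (see the article at the DOI below for the actual algorithm).

# Illustrative sketch only (not the authors' AdamB implementation): Bayes by
# Backprop on a toy Bayesian linear regression with a Gaussian scale mixture
# prior, using an Adam update whose moment estimates come from the likelihood
# gradient while the KL/prior gradient is added as a separate, decoupled term.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumed for illustration only).
n, d = 256, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)
noise_var = 0.01                                 # likelihood noise variance (assumed known)

# Variational posterior q(w) = N(mu, sigma^2), with sigma = softplus(rho).
mu = np.zeros(d)
rho = np.full(d, -3.0)

# Gaussian scale mixture prior p(w) = pi*N(0, s1^2) + (1 - pi)*N(0, s2^2).
pi_mix, s1, s2 = 0.5, 1.0, 0.1

def log_gauss(w, s):
    return -0.5 * (w / s) ** 2 - np.log(s) - 0.5 * np.log(2.0 * np.pi)

def grad_log_prior(w):
    # d/dw log p(w) for the two-component scale mixture.
    p1 = pi_mix * np.exp(log_gauss(w, s1))
    p2 = (1.0 - pi_mix) * np.exp(log_gauss(w, s2))
    return (p1 * (-w / s1 ** 2) + p2 * (-w / s2 ** 2)) / (p1 + p2)

# Adam moments are kept only for the likelihood part of the gradient.
m_mu, v_mu = np.zeros(d), np.zeros(d)
m_rho, v_rho = np.zeros(d), np.zeros(d)
lr, b1, b2, eps = 1e-2, 0.9, 0.999, 1e-8

for t in range(1, 2001):
    # Reparameterization trick: w = mu + sigma * noise.
    sigma = np.log1p(np.exp(rho))
    noise = rng.normal(size=d)
    w = mu + sigma * noise
    dsig = 1.0 / (1.0 + np.exp(-rho))            # d sigma / d rho

    # Per-example average of the negative log-likelihood gradient w.r.t. w,
    # pushed back to (mu, rho) through the reparameterization.
    g_w_nll = X.T @ (X @ w - y) / noise_var / n
    g_mu_nll = g_w_nll
    g_rho_nll = g_w_nll * noise * dsig

    # Gradient of (1/n) * [log q(w) - log p(w)] (the KL part of the ELBO),
    # following the standard Bayes-by-Backprop pathwise estimator.
    g_w_kl = -grad_log_prior(w) - (w - mu) / sigma ** 2
    g_mu_kl = (g_w_kl + (w - mu) / sigma ** 2) / n
    g_rho_kl = (g_w_kl * noise - 1.0 / sigma + (w - mu) ** 2 / sigma ** 3) * dsig / n

    # Adam moment estimates from the likelihood gradient only.
    m_mu = b1 * m_mu + (1 - b1) * g_mu_nll
    v_mu = b2 * v_mu + (1 - b2) * g_mu_nll ** 2
    m_rho = b1 * m_rho + (1 - b1) * g_rho_nll
    v_rho = b2 * v_rho + (1 - b2) * g_rho_nll ** 2
    mhat_mu, vhat_mu = m_mu / (1 - b1 ** t), v_mu / (1 - b2 ** t)
    mhat_rho, vhat_rho = m_rho / (1 - b1 ** t), v_rho / (1 - b2 ** t)

    # Decoupled update: adaptive step for the likelihood term, plain gradient
    # step for the KL term (analogous to decoupled weight decay).
    mu -= lr * (mhat_mu / (np.sqrt(vhat_mu) + eps) + g_mu_kl)
    rho -= lr * (mhat_rho / (np.sqrt(vhat_rho) + eps) + g_rho_kl)

print("posterior mean:", np.round(mu, 2))
print("true weights:  ", np.round(true_w, 2))

The point the sketch tries to convey is that only the noisy likelihood gradient feeds Adam's first and second moments, so the adaptive step size is not distorted by the KL term; whether this matches the paper's exact update rule should be checked against the article itself.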

Bibliographic Details
Main Authors: Keigo Nishida, Makoto Taiji
Format: Article
Language:English
Published: IEEE 2022-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9874837/
author Keigo Nishida
Makoto Taiji
collection DOAJ
description Overfitting of neural networks to training data is one of the most significant problems in machine learning. Bayesian neural networks (BNNs) are known to be robust against overfitting owing to their ability to model parameter uncertainty. Bayes by Backprop (BBB), a simple variational inference approach that optimizes variational parameters by backpropagation, has been proposed for training BNNs. However, many studies have reported difficulties in applying variational inference to large-scale models such as deep neural networks. This study therefore proposed Adam with decoupled Bayes by Backprop (AdamB), which stabilizes BNN training by applying the Adam estimator evaluation to the gradient of the neural network. The proposed approach stabilizes the noisy gradient of BBB and mitigates excessive parameter changes. In addition, AdamB combined with a Gaussian scale mixture prior can suppress the intrinsic growth of the variational parameters. AdamB exhibited superior stability compared with vanilla BBB trained with Adam. Furthermore, a covariate-shift benchmark on image classification tasks indicated that AdamB is more reliable than deep ensembles under noise-type covariate shifts. The considerations for stable BNN training with AdamB demonstrated on image classification tasks are expected to provide useful insights for applications in other domains.
first_indexed 2024-04-12T05:12:52Z
format Article
id doaj.art-b5af2bf7c60747b2823443506a48088f
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-12T05:12:52Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-b5af2bf7c60747b2823443506a48088f (2022-12-22T03:46:42Z)
AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior
Published in: IEEE Access (IEEE, ISSN 2169-3536), vol. 10, pp. 92959-92970, 2022-01-01
DOI: 10.1109/ACCESS.2022.3203484 (IEEE document 9874837)
Keigo Nishida (https://orcid.org/0000-0001-9262-3392), Graduate School of Frontier Biosciences, Osaka University, Suita, Japan
Makoto Taiji, RIKEN Center for Biosystems Dynamics Research (BDR), Suita, Japan
Online access: https://ieeexplore.ieee.org/document/9874837/
Keywords: Bayesian neural networks; covariate shift; decoupled weight decay; deep neural networks; uncertainty; variational inference
title AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior
topic Bayesian neural networks
covariate shift
decoupled weight decay
deep neural networks
uncertainty
variational inference
url https://ieeexplore.ieee.org/document/9874837/