On Sequential Bayesian Inference for Continual Learning
Sequential Bayesian inference can be used for continual learning to prevent catastrophic forgetting of past tasks and provide an informative prior when learning new tasks. We revisit sequential Bayesian inference and assess whether using the previous task's posterior as a prior for a new task can prevent catastrophic forgetting in Bayesian neural networks.
Main Authors: Samuel Kessler, Adam Cobb, Tim G. J. Rudner, Stefan Zohren, Stephen J. Roberts
Format: Article
Language: English
Published: MDPI AG, 2023-05-01
Series: Entropy
Subjects: continual learning; lifelong learning; sequential Bayesian inference; Bayesian deep learning; Bayesian neural networks
Online Access: https://www.mdpi.com/1099-4300/25/6/884
_version_ | 1797594945492615168 |
author | Samuel Kessler; Adam Cobb; Tim G. J. Rudner; Stefan Zohren; Stephen J. Roberts
author_sort | Samuel Kessler |
collection | DOAJ |
description | Sequential Bayesian inference can be used for continual learning (CL) to prevent catastrophic forgetting of past tasks and provide an informative prior when learning new tasks. We revisit sequential Bayesian inference and assess whether using the previous task's posterior as a prior for a new task can prevent catastrophic forgetting in Bayesian neural networks. Our first contribution is to perform sequential Bayesian inference using Hamiltonian Monte Carlo. We propagate the posterior as a prior for new tasks by approximating the posterior via fitting a density estimator on Hamiltonian Monte Carlo samples. We find that this approach fails to prevent catastrophic forgetting, demonstrating the difficulty in performing sequential Bayesian inference in neural networks. From there, we study simple analytical examples of sequential Bayesian inference and CL and highlight the issue of model misspecification, which can lead to sub-optimal continual learning performance despite exact inference. Furthermore, we discuss how task data imbalances can cause forgetting. From these limitations, we argue that we need probabilistic models of the continual learning generative process rather than relying on sequential Bayesian inference over Bayesian neural network weights. Our final contribution is to propose a simple baseline called Prototypical Bayesian Continual Learning, which is competitive with the best-performing Bayesian continual learning methods on class-incremental continual learning computer vision benchmarks. |
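For reference, the sequential Bayesian update that the abstract alludes to, in which the posterior after tasks 1 to t-1 serves as the prior for task t, can be sketched as follows. This is the generic recursion over model parameters θ and per-task datasets D_t, stated under the assumption that the task datasets are conditionally independent given θ; the paper's specific approximation (fitting a density estimator to Hamiltonian Monte Carlo samples of the posterior) is not reproduced here.

\[
p(\theta \mid \mathcal{D}_{1:t}) \;\propto\; p(\mathcal{D}_t \mid \theta)\, p(\theta \mid \mathcal{D}_{1:t-1}), \qquad t = 1, \dots, T,
\]

which, unrolled over all tasks, gives \( p(\theta \mid \mathcal{D}_{1:T}) \propto p(\theta) \prod_{t=1}^{T} p(\mathcal{D}_t \mid \theta) \).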
first_indexed | 2024-03-11T02:29:49Z |
format | Article |
id | doaj.art-8aa5faa53f174985834dac771fb772be |
institution | Directory Open Access Journal |
issn | 1099-4300 |
language | English |
last_indexed | 2024-03-11T02:29:49Z |
publishDate | 2023-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | doaj.art-8aa5faa53f174985834dac771fb772be (record updated 2023-11-18T10:17:47Z). English. MDPI AG, Entropy, ISSN 1099-4300, published 2023-05-01, vol. 25, no. 6, article 884, DOI 10.3390/e25060884. On Sequential Bayesian Inference for Continual Learning. Samuel Kessler (Department of Engineering Science, University of Oxford, Oxford OX2 6ED, UK); Adam Cobb (SRI International, Arlington, VA 22209, USA); Tim G. J. Rudner (Department of Computer Science, University of Oxford, Oxford OX1 3QG, UK); Stefan Zohren (Department of Engineering Science, University of Oxford, Oxford OX2 6ED, UK); Stephen J. Roberts (Department of Engineering Science, University of Oxford, Oxford OX2 6ED, UK). Abstract and keywords as given in the description and topic fields. https://www.mdpi.com/1099-4300/25/6/884
title | On Sequential Bayesian Inference for Continual Learning |
topic | continual learning; lifelong learning; sequential Bayesian inference; Bayesian deep learning; Bayesian neural networks
url | https://www.mdpi.com/1099-4300/25/6/884 |