Learning dynamics and generalization in reinforcement learning
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal difference algorithms to gain novel insight into the tension between these two objectives. We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training, and at the same time induces the second-order effect of discouraging generalization. We corroborate these findings in deep RL agents trained on a range of environments, finding that it is the nature of the TD targets themselves that discourages generalization. Finally, we investigate how post-training policy distillation may avoid this pitfall, and show that this approach improves generalization performance to novel environments in the ProcGen suite and improves robustness to input perturbations.
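The abstract's central object is the temporal difference update, in which the value estimate is regressed toward a bootstrapped target rather than the full observed return. A minimal tabular TD(0) sketch (illustrative only, not code from the paper; the environment and parameter values are invented for the example) makes the contrast with a Monte Carlo target concrete:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99, terminal=False):
    """One tabular TD(0) step: move V[s] toward the bootstrapped
    target r + gamma * V[s_next] (just r if the transition is terminal)."""
    target = r if terminal else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V

def mc_update(V, s, G, alpha=0.1):
    """Monte Carlo step: move V[s] toward the full observed return G,
    with no bootstrapping from the agent's own estimates."""
    V[s] += alpha * (G - V[s])
    return V

# Tiny two-state chain: s0 -> s1 -> terminal, reward 1 on the final step.
V = [0.0, 0.0]
for _ in range(200):
    V = td0_update(V, 0, 0.0, 1)                     # s0 -> s1, reward 0
    V = td0_update(V, 1, 1.0, None, terminal=True)   # s1 -> terminal, reward 1
```

In the TD(0) case, errors in `V[s_next]` propagate into the target for `V[s]`, which is the mechanism the paper analyzes when it argues that the nature of the TD targets themselves discourages generalization.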
Main Authors: | Lyle, C; Rowland, M; Dabney, W; Kwiatkowska, M; Gal, Y |
---|---|
Format: | Conference item |
Language: | English |
Published: | Journal of Machine Learning Research, 2022 |
author | Lyle, C; Rowland, M; Dabney, W; Kwiatkowska, M; Gal, Y |
---|---|
id | oxford-uuid:2401c469-e66a-4e03-a8d4-bacc35bb4a2e |
institution | University of Oxford |