Learning dynamics and generalization in reinforcement learning

Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal difference algorithms to gain novel insight into the tension between these two objectives. We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training, and at the same time induces the second-order effect of discouraging generalization. We corroborate these findings in deep RL agents trained on a range of environments, finding that it is the nature of the TD targets themselves that discourages generalization. Finally, we investigate how post-training policy distillation may avoid this pitfall, and show that this approach improves generalization performance to novel environments in the ProcGen suite and improves robustness to input perturbations.
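As context for the techniques named in the abstract (a minimal illustrative sketch, not code from the paper), the following Python shows the tabular TD(0) update whose learning dynamics the paper analyzes, alongside the cross-entropy loss typically used for post-training policy distillation. The toy state space, step size alpha, and discount gamma are assumptions made for illustration.

import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    # One tabular TD(0) step: move V(s) toward the bootstrapped target
    # r + gamma * V(s_next). The target depends on the current estimate
    # V(s_next), which is the "nature of the TD targets" the abstract
    # refers to.
    target = r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V

def distill_loss(student_logits, teacher_logits):
    # Cross-entropy from a fixed teacher policy to a student policy: the
    # student fits the teacher's action distribution rather than TD targets,
    # which is the sense in which post-training distillation sidesteps
    # bootstrapped targets.
    t = np.exp(teacher_logits - teacher_logits.max())
    t /= t.sum()
    log_s = student_logits - student_logits.max()
    log_s -= np.log(np.exp(log_s).sum())
    return -(t * log_s).sum()

# Illustrative usage on a hypothetical 5-state chain.
V = np.zeros(5)
V = td0_update(V, s=2, r=1.0, s_next=3)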

Bibliographic Details
Main Authors: Lyle, C, Rowland, M, Dabney, W, Kwiatkowska, M, Gal, Y
Format: Conference item
Language: English
Published: Journal of Machine Learning Research, 2022