Measuring and characterizing generalization in deep reinforcement learning

Abstract Deep reinforcement learning (RL) methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re‐examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on‐policy, off‐policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on‐policy states, even though those states are not selected adversarially. We focus our analyses on the deep Q‐networks (DQNs) that kicked off the modern era of deep RL. Taken together, these results call into question the extent to which DQNs learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported.


Bibliographic Details
Main Authors: Sam Witty, Jun K. Lee, Emma Tosch, Akanksha Atrey, Kaleigh Clary, Michael L. Littman, David Jensen
Format: Article
Language: English
Published: Wiley, 2021-12-01
Series: Applied AI Letters (ISSN 2689-5595)
Subjects: deep reinforcement learning; empirical methods; generalization; Q‐networks
Online Access: https://doi.org/10.1002/ail2.45
Author Affiliations:
Sam Witty — College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, Massachusetts, USA
Jun K. Lee — Department of Computer Science, Brown University, Providence, Rhode Island, USA
Emma Tosch — College of Engineering and Mathematical Sciences, University of Vermont, Burlington, Vermont, USA
Akanksha Atrey — College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, Massachusetts, USA
Kaleigh Clary — College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, Massachusetts, USA
Michael L. Littman — Department of Computer Science, Brown University, Providence, Rhode Island, USA
David Jensen — College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, Massachusetts, USA