Measuring and characterizing generalization in deep reinforcement learning
Authors: Sam Witty, Jun K. Lee, Emma Tosch, Akanksha Atrey, Kaleigh Clary, Michael L. Littman, David Jensen
Format: Article
Language: English
Published: Wiley, 2021-12-01
Series: Applied AI Letters (Volume 2, Issue 4)
ISSN: 2689-5595
Subjects: deep reinforcement learning; empirical methods; generalization; Q-networks
Online Access: https://doi.org/10.1002/ail2.45
Abstract: Deep reinforcement learning (RL) methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents under these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL and show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. We focus our analyses on the deep Q-networks (DQNs) that kicked off the modern era of deep RL. Taken together, these results call into question the extent to which DQNs learn generalized representations, and they suggest that more experimentation and analysis are necessary before claims of representation learning can be supported.
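As a rough illustration of the evaluation methodology the abstract describes, the sketch below compares the return a trained agent earns when restarted from states it actually visits against the return from slightly perturbed, non-adversarial versions of those states. It is a minimal sketch, assuming a Gym-style environment that can restore a saved emulator state and a greedy policy derived from a trained Q-network; `env.set_state`, `policy`, and `perturb` are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch: returns from on-policy starts vs. slightly perturbed starts.
# env.set_state, policy, and perturb are assumed interfaces, not the paper's code.
import numpy as np

def rollout_return(env, policy, start_state, max_steps=10_000):
    """Act with the given policy from start_state and accumulate the episode return."""
    obs = env.set_state(start_state)  # hypothetical: restore emulator to a saved state
    total, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        steps += 1
    return total

def generalization_gap(env, policy, on_policy_states, perturb, n=100, seed=0):
    """Mean return from sampled visited states vs. non-adversarial perturbations of them."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(len(on_policy_states), size=n)
    on = [rollout_return(env, policy, on_policy_states[i]) for i in idx]
    off = [rollout_return(env, policy, perturb(on_policy_states[i], rng)) for i in idx]
    return float(np.mean(on)), float(np.mean(off))
```

A large drop in the second mean relative to the first would indicate the kind of brittleness on off-policy states that the paper reports for DQNs.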
Author affiliations:
Sam Witty, Akanksha Atrey, Kaleigh Clary, David Jensen: College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, Massachusetts, USA
Jun K. Lee, Michael L. Littman: Department of Computer Science, Brown University, Providence, Rhode Island, USA
Emma Tosch: College of Engineering and Mathematical Sciences, University of Vermont, Burlington, Vermont, USA