Representation Discovery for Kernel-Based Reinforcement Learning
Recent years have seen increased interest in non-parametric reinforcement learning. There are now practical kernel-based algorithms for approximating value functions; however, kernel regression requires that the underlying function being approximated be smooth on its domain. Few problems of interest satisfy this requirement in their natural representation. In this paper we define the Value-Consistent Pseudometric (VCPM), the distance function corresponding to a transformation of the domain into a space where the target function is maximally smooth and thus well-approximated by kernel regression. We then present DKBRL, an iterative batch RL algorithm interleaving steps of Kernel-Based Reinforcement Learning and distance metric adjustment. We evaluate its performance on Acrobot and PinBall, continuous-space reinforcement learning domains with discontinuous value functions.
Main Authors: | Zewdie, Dawit H. | Konidaris, George |
---|---|
Other Authors: | Leslie Kaelbling |
Published: | 2015 |
Subjects: | Metric learning |
Online Access: | http://hdl.handle.net/1721.1/100053 |
author | Zewdie, Dawit H.; Konidaris, George |
author2 | Leslie Kaelbling |
collection | MIT |
description | Recent years have seen increased interest in non-parametric reinforcement learning. There are now practical kernel-based algorithms for approximating value functions; however, kernel regression requires that the underlying function being approximated be smooth on its domain. Few problems of interest satisfy this requirement in their natural representation. In this paper we define Value-Consistent Pseudometric (VCPM), the distance function corresponding to a transformation of the domain into a space where the target function is maximally smooth and thus well-approximated by kernel regression. We then present DKBRL, an iterative batch RL algorithm interleaving steps of Kernel-Based Reinforcement Learning and distance metric adjustment. We evaluate its performance on Acrobot and PinBall, continuous-space reinforcement learning domains with discontinuous value functions. |
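The abstract's core observation — that kernel regression only approximates well where the target is smooth, and that a metric encoding the target's discontinuities restores accuracy — can be illustrated with a toy Nadaraya-Watson estimator. This is not the paper's DKBRL or VCPM construction; it is a generic sketch in which the "value-consistent" metric is simulated by adding the target's own jump to the distance:

```python
import numpy as np

def gaussian_kernel(d, bandwidth):
    # Smoothing weight that decays with distance under the chosen metric.
    return np.exp(-(d / bandwidth) ** 2)

def kernel_regress(x_query, x_train, y_train, dist, bandwidth=0.1):
    # Nadaraya-Watson estimator: a kernel-weighted average of the samples.
    w = gaussian_kernel(dist(x_query, x_train), bandwidth)
    return np.sum(w * y_train) / np.sum(w)

# A discontinuous target, like a value function with a cliff at x = 0.5.
x = np.linspace(0.0, 1.0, 200)
y = np.where(x < 0.5, 0.0, 1.0)

# Under the natural (Euclidean) metric, an estimate just left of the
# cliff averages across both sides and is badly biased.
euclid = lambda q, xs: np.abs(q - xs)
est_euclid = kernel_regress(0.49, x, y, euclid)

# A metric that stretches distances across the discontinuity (here, a
# hypothetical toy built from the target values themselves, in the
# spirit of a value-consistent transform) keeps the estimate on-side.
side = lambda v: np.where(v < 0.5, 0.0, 1.0)
stretched = lambda q, xs: np.abs(q - xs) + np.abs(side(q) - side(xs))
est_stretched = kernel_regress(0.49, x, y, stretched)
```

With the Euclidean metric the estimate at 0.49 lands well between 0 and 1 (roughly 0.4), while under the stretched metric it is essentially 0, the correct left-of-cliff value — the same effect the paper pursues by learning the metric from value estimates rather than from the (unknown) target.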
id | mit-1721.1/100053 |
institution | Massachusetts Institute of Technology |
publishDate | 2015 |
last modified | 2019-04-10T21:28:02Z |
department | Learning and Intelligent Systems |
report number | MIT-CSAIL-TR-2015-032 |
date issued | 2015-11-24 |
date available | 2015-11-30T19:30:04Z |
rights | Creative Commons Attribution-ShareAlike 4.0 International (http://creativecommons.org/licenses/by-sa/4.0/) |
extent | 16 p. |
format | application/pdf |
title | Representation Discovery for Kernel-Based Reinforcement Learning |
topic | Metric learning |
url | http://hdl.handle.net/1721.1/100053 |