Representation Discovery for Kernel-Based Reinforcement Learning

Recent years have seen increased interest in non-parametric reinforcement learning. There are now practical kernel-based algorithms for approximating value functions; however, kernel regression requires that the underlying function being approximated be smooth on its domain. Few problems of interest satisfy this requirement in their natural representation. In this paper we define the Value-Consistent Pseudometric (VCPM), the distance function corresponding to a transformation of the domain into a space where the target function is maximally smooth and thus well-approximated by kernel regression. We then present DKBRL, an iterative batch RL algorithm that interleaves steps of Kernel-Based Reinforcement Learning and distance metric adjustment. We evaluate its performance on Acrobot and PinBall, continuous-space reinforcement learning domains with discontinuous value functions.
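
The report itself is available at the link below; as a rough, purely illustrative sketch of the kind of loop the abstract describes (kernel-regression value fitting alternated with distance-metric adjustment), the following Python fragment may help. Everything in it is an assumption for illustration: the Gaussian kernel and its bandwidth, the function names (gaussian_kernel, kbrl_q_values, alternate_fit), and especially the value-difference reweighting heuristic, which merely stands in for, and is not, the VCPM/DKBRL procedure defined in the report.

# Illustrative sketch only, not the report's algorithm: kernel-regression
# value fitting on sampled transitions, alternated with a crude distance
# reweighting step.
import numpy as np

def gaussian_kernel(dists, bandwidth=0.5):
    # Turn pairwise distances into kernel regression weights (assumed kernel).
    return np.exp(-(dists / bandwidth) ** 2)

def kbrl_q_values(S, A, R, S_next, metric_weights, gamma=0.95, n_iters=200):
    # Fixed-point iteration for Q(s_i, a_i) on the sampled transitions
    # (S, A, R, S_next); successor values are estimated by kernel regression
    # over the samples under a per-dimension weighted Euclidean distance.
    actions = np.unique(A)
    diffs = (S_next[:, None, :] - S[None, :, :]) * metric_weights  # (n, n, d)
    K = gaussian_kernel(np.linalg.norm(diffs, axis=-1))            # (n, n)
    Q = np.zeros(len(S))
    for _ in range(n_iters):
        V_next = np.full(len(S), -np.inf)
        for a in actions:
            m = (A == a)
            w = K[:, m]
            V_next = np.maximum(V_next, (w @ Q[m]) / (w.sum(axis=1) + 1e-12))
        Q = R + gamma * V_next
    return Q

def alternate_fit(S, A, R, S_next, d, n_rounds=3, seed=0):
    # Outer loop: alternate value fitting with a heuristic metric adjustment
    # that upweights state dimensions along which fitted values change sharply.
    rng = np.random.default_rng(seed)
    w = np.ones(d)
    for _ in range(n_rounds):
        Q = kbrl_q_values(S, A, R, S_next, w)
        i, j = rng.integers(len(S), size=(2, 1000))
        sens = np.abs(Q[i] - Q[j])[:, None] / (np.abs(S[i] - S[j]) + 1e-12)
        w = 1.0 + sens.mean(axis=0) / sens.mean()
    return w, Q

For example, with arrays S, A, R, S_next collected from a few episodes of a continuous-state domain such as Acrobot or PinBall, alternate_fit(S, A, R, S_next, d=S.shape[1]) would return per-dimension distance weights together with the Q-values fitted under them; the inner function is the standard sample-based kernel value-iteration fixed point, while the outer loop only gestures at the metric-adjustment idea.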

Bibliographic Details
Main Authors: Zewdie, Dawit H.; Konidaris, George
Other Authors: Leslie Kaelbling; Learning and Intelligent Systems
Published: 2015-11-24
Report Number: MIT-CSAIL-TR-2015-032
Subjects: Metric learning
Extent: 16 p. (application/pdf)
Institution: Massachusetts Institute of Technology
License: Creative Commons Attribution-ShareAlike 4.0 International (http://creativecommons.org/licenses/by-sa/4.0/)
Online Access: http://hdl.handle.net/1721.1/100053