Automatic shaping and decomposition of reward functions

This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results.
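For context on the shaping idea the abstract refers to, below is a minimal sketch of potential-based reward shaping (Ng, Harada & Russell, 1999), the standard framework this line of work builds on. The potential function `phi` here is a hypothetical toy (negative distance to a goal on a 1-D chain), not the potential learned from abstractions in the paper itself.

```python
# Potential-based reward shaping: r'(s, s') = r + gamma * Phi(s') - Phi(s).
# Shaping rewards of this form are known to preserve optimal policies.

GAMMA = 0.9
GOAL = 5

def phi(state):
    # Toy potential: larger (less negative) when closer to the goal state.
    return -abs(GOAL - state)

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    # Augment the environment reward with the shaping term.
    return reward + gamma * phi(next_state) - phi(state)

# Moving toward the goal (3 -> 4) earns a positive shaping bonus,
# moving away (3 -> 2) a penalty, even when the base reward is 0.
toward = shaped_reward(0.0, 3, 4)  # 0 + 0.9*(-1) - (-2) = 1.1
away = shaped_reward(0.0, 3, 2)    # 0 + 0.9*(-3) - (-2) = -0.7
```

The paper's contribution is automating the choice of such a potential from state and temporal abstractions, rather than hand-designing it as in this sketch.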

Bibliographic Details
Main Author: Marthi, Bhaskara
Other Authors: Leslie Kaelbling
Published: 2007
Online Access: http://hdl.handle.net/1721.1/35890
Report Number: MIT-CSAIL-TR-2007-010
Institution: Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory
Research Group: Learning and Intelligent Systems
Date Deposited: 2007-02-13
Physical Description: 8 p.
Format: application/pdf, application/postscript