Automatic shaping and decomposition of reward functions
This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results.
Main Author: | Marthi, Bhaskara |
---|---|
Other Authors: | Leslie Kaelbling |
Group: | Learning and Intelligent Systems |
Published: | 2007 |
Report Number: | MIT-CSAIL-TR-2007-010 |
Institution: | Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory |
Format: | 8 p.; application/pdf, application/postscript |
Online Access: | http://hdl.handle.net/1721.1/35890 |
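The record does not spell out the shaping technique the abstract builds on, but reward shaping in the sense of Ng, Harada, and Russell (1999) adds a potential-based bonus γΦ(s′) − Φ(s) to the per-step reward, which provably leaves the optimal policy unchanged. A minimal sketch of that idea, together with an additive per-effector split of the per-timestep reward, is below; the potential function `phi`, the goal location, and the equal split are illustrative assumptions, not the paper's learned quantities.

```python
# --- Potential-based reward shaping (Ng, Harada & Russell, 1999) ---
# Shaped reward: r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s).
# The paper *learns* a shaped reward from state/temporal abstractions;
# here phi is a hand-written stand-in for illustration only.

GAMMA = 0.99

def phi(state):
    """Hypothetical potential: negative distance to a goal at x = 10."""
    return -abs(10 - state)

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Add the potential-based shaping term; optimal policies unchanged."""
    return reward + gamma * phi(next_state) - phi(state)

# --- Additive reward decomposition for a multieffector agent ---
# In a multieffector problem the global per-timestep reward is split
# across units, so each unit can learn from its own reward stream.
# An equal split is assumed here purely to show the interface.

def decompose_reward(global_reward, effector_names):
    """Naive equal split of the global per-timestep reward."""
    share = global_reward / len(effector_names)
    return {name: share for name in effector_names}

if __name__ == "__main__":
    s, s_next, r = 3, 4, 0.0  # one hypothetical transition
    print("shaped:", shaped_reward(r, s, s_next))
    print("split:", decompose_reward(1.0, ["arm", "gripper"]))
```

Potential-based shaping is a safe target for automatic learning because it cannot change which policies are optimal; the equal split above carries no such guarantee, which is consistent with the abstract's observation that rewards often need to be shaped before a good decomposition can be found.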