Automatic shaping and decomposition of reward functions
This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results.
Main Author: | Marthi, Bhaskara |
---|---|
Other Authors: | Leslie Kaelbling |
Group: | Learning and Intelligent Systems |
Published: | 2007 |
Report Number: | MIT-CSAIL-TR-2007-010 |
Institution: | Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory |
Format: | 8 p.; application/pdf, application/postscript |
Online Access: | http://hdl.handle.net/1721.1/35890 |
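The record does not spell out the shaping technique the abstract builds on, but reward shaping in the sense of Ng, Harada, and Russell (1999) adds a potential-based bonus γΦ(s′) − Φ(s) to the per-step reward, which provably leaves the optimal policy unchanged. A minimal sketch of that idea, together with an additive per-effector split of the per-timestep reward, is below; the potential function `phi`, the goal location, and the equal split are illustrative assumptions, not the paper's learned quantities.

```python
# --- Potential-based reward shaping (Ng, Harada & Russell, 1999) ---
# Shaped reward: r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s).
# The paper *learns* a shaped reward from state/temporal abstractions;
# here phi is a hand-written stand-in for illustration only.

GAMMA = 0.99

def phi(state):
    """Hypothetical potential: negative distance to a goal at x = 10."""
    return -abs(10 - state)

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Add the potential-based shaping term; optimal policies unchanged."""
    return reward + gamma * phi(next_state) - phi(state)

# --- Additive reward decomposition for a multieffector agent ---
# In a multieffector problem the global per-timestep reward is split
# across units, so each unit can learn from its own reward stream.
# An equal split is assumed here purely to show the interface.

def decompose_reward(global_reward, effector_names):
    """Naive equal split of the global per-timestep reward."""
    share = global_reward / len(effector_names)
    return {name: share for name in effector_names}

if __name__ == "__main__":
    s, s_next, r = 3, 4, 0.0  # one hypothetical transition
    print("shaped:", shaped_reward(r, s, s_next))
    print("split:", decompose_reward(1.0, ["arm", "gripper"]))
```

Potential-based shaping is a safe target for automatic learning because it cannot change which policies are optimal; the equal split above carries no such guarantee, which is consistent with the abstract's observation that rewards often need to be shaped before a good decomposition can be found.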