Kernel-based approximate dynamic programming using Bellman residual elimination

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2010.

Bibliografiske detaljer
Hovedforfatter: Bethke, Brett (Brett M.)
Andre forfattere: Jonathan P. How.
Format: Thesis
Sprog:eng
Udgivet: Massachusetts Institute of Technology 2010
Fag:
Online adgang:http://hdl.handle.net/1721.1/57544
_version_ 1826211046019301376
author Bethke, Brett (Brett M.)
author2 Jonathan P. How.
author_facet Jonathan P. How.
Bethke, Brett (Brett M.)
author_sort Bethke, Brett (Brett M.)
collection MIT
description Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2010.
first_indexed 2024-09-23T15:00:01Z
format Thesis
id mit-1721.1/57544
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T15:00:01Z
publishDate 2010
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/575442019-04-12T22:07:41Z Kernel-based approximate dynamic programming using Bellman residual elimination Bethke, Brett (Brett M.) Jonathan P. How. Massachusetts Institute of Technology. Dept. of Aeronautics and Astronautics. Massachusetts Institute of Technology. Dept. of Aeronautics and Astronautics. Aeronautics and Astronautics. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2010. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 207-221). Many sequential decision-making problems related to multi-agent robotic systems can be naturally posed as Markov Decision Processes (MDPs). An important advantage of the MDP framework is the ability to utilize stochastic system models, thereby allowing the system to make sound decisions even if there is randomness in the system evolution over time. Unfortunately, the curse of dimensionality prevents most MDPs of practical size from being solved exactly. One main focus of the thesis is on the development of a new family of algorithms for computing approximate solutions to large-scale MDPs. Our algorithms are similar in spirit to Bellman residual methods, which attempt to minimize the error incurred in solving Bellman's equation at a set of sample states. However, by exploiting kernel-based regression techniques (such as support vector regression and Gaussian process regression) with nondegenerate kernel functions as the underlying cost-to-go function approximation architecture, our algorithms are able to construct cost-to-go solutions for which the Bellman residuals are explicitly forced to zero at the sample states. For this reason, we have named our approach Bellman residual elimination (BRE). In addition to developing the basic ideas behind BRE, we present multi-stage and model-free extensions to the approach. The multistage extension allows for automatic selection of an appropriate kernel for the MDP at hand, while the model-free extension can use simulated or real state trajectory data to learn an approximate policy when a system model is unavailable. (cont.) We present theoretical analysis of all BRE algorithms proving convergence to the optimal policy in the limit of sampling the entire state space, and show computational results on several benchmark problems. Another challenge in implementing control policies based on MDPs is that there may be parameters of the system model that are poorly known and/or vary with time as the system operates. System performance can suer if the model used to compute the policy differs from the true model. To address this challenge, we develop an adaptive architecture that allows for online MDP model learning and simultaneous re-computation of the policy. As a result, the adaptive architecture allows the system to continuously re-tune its control policy to account for better model information 3 obtained through observations of the actual system in operation, and react to changes in the model as they occur. Planning in complex, large-scale multi-agent robotic systems is another focus of the thesis. In particular, we investigate the persistent surveillance problem, in which one or more unmanned aerial vehicles (UAVs) and/or unmanned ground vehicles (UGVs) must provide sensor coverage over a designated location on a continuous basis. This continuous coverage must be maintained even in the event that agents suer failures over the course of the mission. The persistent surveillance problem is pertinent to a number of applications, including search and rescue, natural disaster relief operations, urban traffic monitoring, etc. (cont.) Using both simulations and actual flight experiments conducted in the MIT RAVEN indoor flight facility, we demonstrate the successful application of the BRE algorithms and the adaptive MDP architecture in achieving high mission performance despite the random occurrence of failures. Furthermore, we demonstrate performance benefits of our approach over a deterministic planning approach that does not account for these failures. by Brett M. Bethke. Ph.D. 2010-08-26T15:21:54Z 2010-08-26T15:21:54Z 2010 2010 Thesis http://hdl.handle.net/1721.1/57544 639230464 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 221 p. application/pdf Massachusetts Institute of Technology
spellingShingle Aeronautics and Astronautics.
Bethke, Brett (Brett M.)
Kernel-based approximate dynamic programming using Bellman residual elimination
title Kernel-based approximate dynamic programming using Bellman residual elimination
title_full Kernel-based approximate dynamic programming using Bellman residual elimination
title_fullStr Kernel-based approximate dynamic programming using Bellman residual elimination
title_full_unstemmed Kernel-based approximate dynamic programming using Bellman residual elimination
title_short Kernel-based approximate dynamic programming using Bellman residual elimination
title_sort kernel based approximate dynamic programming using bellman residual elimination
topic Aeronautics and Astronautics.
url http://hdl.handle.net/1721.1/57544
work_keys_str_mv AT bethkebrettbrettm kernelbasedapproximatedynamicprogrammingusingbellmanresidualelimination