Policy Improvement for POMDPs Using Normalized Importance Sampling
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return is estimated for any new policy of the POMDP. We motivate the estimator...
| Main Author: | Shelton, Christian R. |
| --- | --- |
| Language: | en_US |
| Published: | 2004 |
| Online Access: | http://hdl.handle.net/1721.1/7218 |
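
The abstract describes a normalized (weighted) importance sampling estimator of the expected return under a new policy, computed from trajectories gathered by other policies. The sketch below is only a rough illustration of that general idea, not the paper's exact estimator: it assumes memoryless reactive policies, a hypothetical trajectory layout with `obs`, `acts`, `beh_prob`, and `return` fields, and a `target_policy(obs, act)` callable giving the new policy's action probabilities.

```python
import numpy as np

def normalized_is_return(trajectories, target_policy):
    """Estimate the expected return of `target_policy` from off-policy data.

    Each trajectory is assumed to be a dict with:
      'obs'      : list of observations
      'acts'     : list of actions taken
      'beh_prob' : probability the behavior policy gave each chosen action
      'return'   : total (discounted) return of the trajectory
    `target_policy(obs, act)` returns the probability the new policy
    assigns to `act` given observation `obs`.
    """
    weights, returns = [], []
    for traj in trajectories:
        # Importance weight: product over time steps of
        # pi_new(a_t | o_t) / pi_behavior(a_t | o_t).
        w = 1.0
        for obs, act, b in zip(traj['obs'], traj['acts'], traj['beh_prob']):
            w *= target_policy(obs, act) / b
        weights.append(w)
        returns.append(traj['return'])
    weights = np.asarray(weights)
    returns = np.asarray(returns)
    # Dividing by the sum of the weights (rather than the number of
    # trajectories) gives the normalized, lower-variance (but biased)
    # importance sampling estimate of the expected return.
    return float(np.dot(weights, returns) / weights.sum())
```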
Similar Items
- Sampling-based algorithms for continuous-time POMDPs
  by: Chaudhari, Pratik Anil, et al.
  Published: (2013)
- Stick-breaking policy learning in Dec-POMDPs
  by: Amato, Christopher, et al.
  Published: (2016)
- Policy Evaluation in Decentralized POMDPs With Belief Sharing
  by: Mert Kayaalp, et al.
  Published: (2023-01-01)
- An online algorithm for constrained POMDPs
  by: Undurti, Aditya, et al.
  Published: (2011)
- Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models
  by: Kurniawati, Hanna, et al.
  Published: (2019)