Average-reward off-policy policy evaluation with function approximation

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this problem, bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly tri...

Bibliographic Details
Main Authors: Zhang, S; Wan, Y; Sutton, RS; Whiteson, S
Format: Conference item
Language: English
Published: PMLR 2021