Average-reward off-policy policy evaluation with function approximation

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this problem, bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly tri...
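The abstract names three ingredients: bootstrapping, off-policy learning, and function approximation. To make them concrete, here is a minimal sketch of a generic baseline, off-policy differential semi-gradient TD(0) with linear features. It estimates the reward rate (`r_bar`) and the differential value weights (`w`). This is not the method proposed in the paper. The toy two-state MDP, the one-hot features `phi`, the policies `pi` and `b`, and the step sizes `alpha` and `eta` are all hypothetical choices made for illustration.

```python
import random


def phi(state):
    """Hypothetical one-hot features (tabular is a special case of linear FA)."""
    x = [0.0, 0.0]
    x[state] = 1.0
    return x


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def off_policy_differential_td(steps=2000, alpha=0.1, eta=1.0, seed=0):
    """Off-policy differential semi-gradient TD(0) on a hypothetical 2-state MDP.

    Action 0 keeps the current state and action 1 switches to the other state.
    The reward is 1 for any action taken in state 0 and 0 in state 1.
    """
    rng = random.Random(seed)
    pi = {0: 0.0, 1: 1.0}  # hypothetical target policy: always switch
    b = {0: 0.5, 1: 0.5}   # hypothetical behaviour policy: uniform
    w = [0.0, 0.0]         # differential value function weights
    r_bar = 0.0            # reward-rate estimate
    s = 0
    for _ in range(steps):
        a = 0 if rng.random() < b[0] else 1
        s_next = s if a == 0 else 1 - s
        r = 1.0 if s == 0 else 0.0
        rho = pi[a] / b[a]  # importance-sampling ratio pi(a|s) / b(a|s)
        x, x_next = phi(s), phi(s_next)
        # Differential TD error: subtract the reward-rate estimate and
        # bootstrap on the current estimate of v(S').
        delta = r - r_bar + dot(w, x_next) - dot(w, x)
        # Semi-gradient update of the value weights, reweighted by rho.
        w = [wi + alpha * rho * delta * xi for wi, xi in zip(w, x)]
        # The reward-rate estimate is driven by the same TD error.
        r_bar += eta * alpha * rho * delta
        s = s_next
    return w, r_bar


w, r_bar = off_policy_differential_td()
print(round(r_bar, 3))         # reward rate of the always-switch policy
print(round(w[0] - w[1], 3))   # differential value gap between the two states
```

Under the target policy the agent alternates between the two states, so the true reward rate is 0.5. With these one-hot features the sketch settles there. With general linear features, this combination of bootstrapping, importance-sampling ratios, and semi-gradient updates can diverge. That instability is the difficulty the abstract refers to.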


Bibliographic Details
Main Authors:	Zhang, S, Wan, Y, Sutton, RS, Whiteson, S
Format:	Conference item
Language:	English
Published:	PMLR 2021

