Average-reward off-policy policy evaluation with function approximation
We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this problem, bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly tri...
Main Authors: | , , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | PMLR, 2021 |