Counterfactual off-policy evaluation with Gumbel-max structural causal models

We introduce an off-policy evaluation procedure for highlighting episodes where applying a reinforcement learned (RL) policy is likely to have produced a substantially different outcome than the observed policy. In particular, we introduce a class of structural causal models (SCMs) for generating co...
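The abstract above is truncated, so the details of the proposed SCM class are not reproduced here. As a hedged illustration of the underlying idea only, the sketch below shows the Gumbel-max trick for a single categorical variable. It draws the outcome as `argmax_k (log p_k + g_k)` with i.i.d. Gumbel noise `g`, infers the noise consistent with an observed outcome using the standard top-down truncated-Gumbel construction, and re-runs the argmax under a different (counterfactual) probability vector. This is not the authors' code. The function names `sample_gumbel_posterior` and `counterfactual_sample` are hypothetical, and the sketch assumes strictly positive probabilities.

```python
import numpy as np


def sample_gumbel_posterior(log_p, observed, rng):
    """Sample Gumbel noise g such that argmax(log_p + g) == observed.

    Hypothetical helper: uses the top-down truncated-Gumbel construction.
    Assumes all probabilities are strictly positive (no -inf in log_p).
    """
    k = len(log_p)
    # The maximum of (log_p + g) is Gumbel with location logsumexp(log_p).
    top = rng.gumbel(loc=np.logaddexp.reduce(log_p))
    perturbed = np.empty(k)
    for j in range(k):
        if j == observed:
            # The observed category attains the maximum.
            perturbed[j] = top
        else:
            # Gumbel(log_p[j]) truncated to lie strictly below `top`.
            g = rng.gumbel(loc=log_p[j])
            perturbed[j] = -np.log(np.exp(-top) + np.exp(-g))
    # Recover the noise terms themselves.
    return perturbed - log_p


def counterfactual_sample(p_obs, p_cf, observed, rng):
    """Counterfactual outcome under p_cf, given `observed` was drawn from p_obs."""
    g = sample_gumbel_posterior(np.log(p_obs), observed, rng)
    # Reuse the inferred noise with the counterfactual probabilities.
    return int(np.argmax(np.log(p_cf) + g))
```

If `p_cf` equals `p_obs`, the counterfactual always reproduces the observed outcome. Averaged over observed outcomes drawn from `p_obs`, the counterfactual outcomes follow `p_cf`, because the inferred noise is then marginally Gumbel.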


Bibliographic Details
Main Authors: Oberst, Michael; Sontag, David Alexander
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: MLResearch Press 2021
Online Access: https://hdl.handle.net/1721.1/130437