Counterfactual off-policy evaluation with Gumbel-max structural causal models
We introduce an off-policy evaluation procedure for highlighting episodes where applying a reinforcement-learned (RL) policy is likely to have produced a substantially different outcome from the one produced by the observed policy. In particular, we introduce a class of structural causal models (SCMs) for generating co...
Format: | Article |
---|---|
Language: | English |
Published: | MLResearch Press, 2021 |
Online Access: | https://hdl.handle.net/1721.1/130437 |