Conditionally elicitable dynamic risk measures for deep reinforcement learning
We propose a novel framework to solve risk-sensitive reinforcement learning problems where the agent optimizes time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penaliz...
Main Authors: | , , |
---|---|
Format: | Journal article |
Language: | English |
Published: | Society for Industrial and Applied Mathematics, 2023 |