Conditionally elicitable dynamic risk measures for deep reinforcement learning
We propose a novel framework to solve risk-sensitive reinforcement learning problems where the agent optimizes time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penaliz...
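The abstract rests on elicitability: a statistic is elicitable when some scoring function is strictly consistent for it, so that minimising the expected score recovers the statistic. The sketch below is a static, single-period illustration of that idea only, not the authors' dynamic construction. It uses the standard pinball (quantile) score, which is strictly consistent for the α-quantile (VaR), and then the Rockafellar–Uryasev representation to recover CVaR, a spectral risk measure. All function names (`pinball_score`, `elicit_quantile`, `cvar_rockafellar_uryasev`) are hypothetical, and losses are assumed to be positive-is-bad.

```python
from statistics import mean


def pinball_score(q: float, y: float, alpha: float) -> float:
    """Pinball (quantile) score; strictly consistent for the alpha-quantile."""
    indicator = 1.0 if y <= q else 0.0
    return (indicator - alpha) * (q - y)


def empirical_score(q: float, samples: list[float], alpha: float) -> float:
    """Average score of the report q over an empirical loss sample."""
    return mean(pinball_score(q, y, alpha) for y in samples)


def elicit_quantile(samples: list[float], alpha: float) -> float:
    # The empirical score is piecewise linear in q with kinks at the samples,
    # so a minimiser (an empirical alpha-quantile, i.e. VaR) lies among them.
    return min(sorted(samples), key=lambda q: empirical_score(q, samples, alpha))


def cvar_rockafellar_uryasev(samples: list[float], alpha: float) -> float:
    # CVaR_alpha = min_v { v + E[(Y - v)^+] / (1 - alpha) }, attained at VaR_alpha.
    v = elicit_quantile(samples, alpha)
    excess = mean(max(y - v, 0.0) for y in samples)
    return v + excess / (1.0 - alpha)


if __name__ == "__main__":
    losses = [float(k) for k in range(1, 11)]
    print(elicit_quantile(losses, 0.75))           # 8.0
    print(cvar_rockafellar_uryasev(losses, 0.75))  # 9.2 (up to float rounding)
```

CVaR on its own is not elicitable, which is why the quantile is elicited first; CVaR is jointly elicitable with VaR, and the joint scoring functions used in practice build on the same pinball term.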
Main authors: | , , |
---|---|
Material type: | Journal article |
Language: | English |
Published: | Society for Industrial and Applied Mathematics, 2023 |