Conditionally elicitable dynamic risk measures for deep reinforcement learning
We propose a novel framework to solve risk-sensitive reinforcement learning problems where the agent optimizes time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penaliz...
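The abstract turns on conditional elicitability. CVaR (Expected Shortfall) is a spectral risk measure that is not elicitable on its own. It becomes elicitable once VaR is fixed first. The sketch below illustrates that idea under stated assumptions and is not the authors' construction:

- Losses are treated as "higher is worse".
- VaR is elicited with the standard pinball score.
- CVaR is then elicited, given that VaR estimate, with a squared-error score on the target `q + (Y - q)_+ / (1 - alpha)`.
- The function names, grids and the Gaussian loss sample are hypothetical illustration choices.

```python
import numpy as np


def pinball_score(q, y, alpha):
    """Average quantile (pinball) score; strictly consistent for the alpha-quantile (VaR)."""
    return np.mean(((y <= q).astype(float) - alpha) * (q - y))


def cvar_score_given_var(c, y, q, alpha):
    """Squared-error score for CVaR, conditional on a VaR estimate q.

    For a loss Y with alpha-quantile q, CVaR_alpha(Y) = q + E[(Y - q)_+] / (1 - alpha),
    so for fixed q the expected score is minimized by the mean of this target.
    """
    target = q + np.maximum(y - q, 0.0) / (1.0 - alpha)
    return np.mean((c - target) ** 2)


def elicit_var_cvar(y, alpha, q_grid, c_grid):
    """Two-step (conditional) elicitation by grid search: VaR first, then CVaR given VaR."""
    q_hat = q_grid[np.argmin([pinball_score(q, y, alpha) for q in q_grid])]
    c_hat = c_grid[np.argmin([cvar_score_given_var(c, y, q_hat, alpha) for c in c_grid])]
    return q_hat, c_hat


rng = np.random.default_rng(0)
losses = rng.standard_normal(20_000)  # hypothetical loss sample
alpha = 0.9
q_hat, c_hat = elicit_var_cvar(
    losses, alpha, np.linspace(0.0, 2.5, 501), np.linspace(0.0, 4.0, 801)
)
print(f"VaR ~ {q_hat:.3f}, CVaR ~ {c_hat:.3f}")
```

For standard normal losses at alpha = 0.9, both estimates should land near the analytic values, VaR ≈ 1.28 and CVaR ≈ 1.75. In the reinforcement-learning setting the abstract describes, scores of this kind would serve as training losses for function approximators rather than being minimized over a grid. This is an interpretation, not a detail taken from the truncated abstract.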
Main authors: | , , |
---|---|
Format: | Journal article |
Language: | English |
Published: | Society for Industrial and Applied Mathematics, 2023 |