Conditionally elicitable dynamic risk measures for deep reinforcement learning

We propose a novel framework to solve risk-sensitive reinforcement learning problems where the agent optimizes time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penaliz...

Bibliographic details
Main authors: Coache, A; Jaimungal, S; Cartea, Á
Material type: Journal article
Language: English
Published: Society for Industrial and Applied Mathematics, 2023