Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards

Deep Reinforcement Learning (DRL) has had notable success in sequential learning tasks in applied settings involving high-dimensional state-action spaces, sparking the interest of the finance research community. DRL strategies have been applied to the classical portfolio optimization problem: a...

Bibliographic Details
Main Author: Lim, Magdalene Hui Qi
Other Authors: Patrick Pun Chi Seng
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2024
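Subjects: Mathematical Sciences; Reinforcement learning; Portfolio optimization; Optimal intrinsic rewards; Autoregressive deep reinforcement learning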
Online Access: https://hdl.handle.net/10356/175650
_version_ 1811694170937491456
author Lim, Magdalene Hui Qi
author2 Patrick Pun Chi Seng
author_facet Patrick Pun Chi Seng
Lim, Magdalene Hui Qi
author_sort Lim, Magdalene Hui Qi
collection NTU
description Deep Reinforcement Learning (DRL) has had notable success in sequential learning tasks in applied settings involving high-dimensional state-action spaces, sparking the interest of the finance research community. DRL strategies have been applied to the classical portfolio optimization problem: a dynamic, inter-temporal process of determining optimal portfolio allocations to maximize long-run returns. However, all existing DRL portfolio management strategies overlook the underlying interdependencies between subactions that exist in this specific task. We propose a framework that unifies two existing concepts, autoregressive DRL architectures and learned intrinsic rewards, to combine the benefits of modelling subaction dependencies and of modifying the reward function to guide learning. We backtest our proposed strategy against seven benchmark strategies and empirically demonstrate that ours achieves the best risk-adjusted returns. Most remarkably, in median testing results, our proposed strategy is one of only two approaches that beat market returns, while being exposed to less than a third of market risk. Moreover, we provide insights into the effects of learned intrinsic rewards against the backdrop of the autoregressive DRL architecture, which enables individual intrinsic rewards to be learned at the level of subactions, potentially addressing the credit assignment problem in RL.
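The description above turns on the idea that portfolio subactions (per-asset allocations) depend on one another. The following is a minimal sketch of one generic way to express that dependence autoregressively; it is not taken from the thesis, whose architecture this record does not detail. The discretised `FRACTIONS`, the toy `conditional_logits` scoring rule (a stand-in for a learned network), and all function names are assumptions made purely for illustration.

```python
import math
import random

# Hypothetical discretisation: each subaction picks a fraction of the
# budget that is still unallocated.
FRACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_logits(state, prev_weights, asset_idx):
    """Toy stand-in for a learned policy head (assumption, not the thesis model).

    The logits for asset `asset_idx` depend on the state AND on the
    subactions already chosen, which is what makes the policy autoregressive.
    """
    allocated = sum(prev_weights)
    signal = state[asset_idx]
    return [(signal - allocated) * f for f in FRACTIONS]


def sample_portfolio(state, rng):
    """Sample weights one asset at a time; return (weights, joint log-prob)."""
    weights, log_prob, remaining = [], 0.0, 1.0
    for i in range(len(state) - 1):
        probs = softmax(conditional_logits(state, weights, i))
        k = rng.choices(range(len(FRACTIONS)), weights=probs)[0]
        # Chain rule: log pi(a|s) = sum_i log pi(a_i | s, a_1..a_{i-1}).
        log_prob += math.log(probs[k])
        w = FRACTIONS[k] * remaining
        weights.append(w)
        remaining -= w
    # The last asset deterministically absorbs whatever budget is left.
    weights.append(remaining)
    return weights, log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    # One made-up signal per asset; four assets in total.
    weights, log_prob = sample_portfolio([0.2, -0.1, 0.4, 0.0], rng)
    print(weights, log_prob)
```

Because each conditional distribution sees the earlier allocations, the sampled weights always sum to one without a separate normalisation step, and the per-subaction log-probabilities are available individually, which is the kind of granularity at which subaction-level reward signals could in principle be attached.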
first_indexed 2024-10-01T07:03:19Z
format Final Year Project (FYP)
id ntu-10356/175650
institution Nanyang Technological University
language English
last_indexed 2024-10-01T07:03:19Z
publishDate 2024
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/175650 2024-05-06T15:37:24Z Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards Lim, Magdalene Hui Qi Patrick Pun Chi Seng School of Physical and Mathematical Sciences Nixie Sapphira Lesmana cspun@ntu.edu.sg, nixiesap@nus.edu.sg Mathematical Sciences Reinforcement learning Portfolio optimization Optimal intrinsic rewards Autoregressive deep reinforcement learning Deep Reinforcement Learning (DRL) has had notable success in sequential learning tasks in applied settings involving high-dimensional state-action spaces, sparking the interest of the finance research community. DRL strategies have been applied to the classical portfolio optimization problem: a dynamic, inter-temporal process of determining optimal portfolio allocations to maximize long-run returns. However, all existing DRL portfolio management strategies overlook the underlying interdependencies between subactions that exist in this specific task. We propose a framework that unifies two existing concepts, autoregressive DRL architectures and learned intrinsic rewards, to combine the benefits of modelling subaction dependencies and of modifying the reward function to guide learning. We backtest our proposed strategy against seven benchmark strategies and empirically demonstrate that ours achieves the best risk-adjusted returns. Most remarkably, in median testing results, our proposed strategy is one of only two approaches that beat market returns, while being exposed to less than a third of market risk. Moreover, we provide insights into the effects of learned intrinsic rewards against the backdrop of the autoregressive DRL architecture, which enables individual intrinsic rewards to be learned at the level of subactions, potentially addressing the credit assignment problem in RL. Bachelor's degree 2024-05-02T05:49:05Z 2024-05-02T05:49:05Z 2024 Final Year Project (FYP) Lim, M. H. Q. (2024). Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175650 https://hdl.handle.net/10356/175650 en application/pdf Nanyang Technological University
spellingShingle Mathematical Sciences
Reinforcement learning
Portfolio optimization
Optimal intrinsic rewards
Autoregressive deep reinforcement learning
Lim, Magdalene Hui Qi
Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards
title Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards
title_full Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards
title_fullStr Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards
title_full_unstemmed Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards
title_short Financial portfolio optimization: an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards
title_sort financial portfolio optimization an autoregressive deep reinforcement learning algorithm with learned intrinsic rewards
topic Mathematical Sciences
Reinforcement learning
Portfolio optimization
Optimal intrinsic rewards
Autoregressive deep reinforcement learning
url https://hdl.handle.net/10356/175650
work_keys_str_mv AT limmagdalenehuiqi financialportfoliooptimizationanautoregressivedeepreinforcementlearningalgorithmwithlearnedintrinsicrewards