A Hybrid PAC Reinforcement Learning Algorithm for Human-Robot Interaction
This paper presents a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that retains favorable features of both model-based and model-free methodologies. The designed algorithm, referred to as the Dyna-Delayed Q-l...
Main Authors: | |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2022-03-01 |
Series: | Frontiers in Robotics and AI |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/frobt.2022.797213/full |