Offline Reward Learning from Human Demonstrations and Feedback: A Linear Programming Approach
In many complex sequential decision-making tasks, no explicit reward function is known, and the only information available is human demonstrations and feedback data. To infer and shape the underlying reward function from this data, two key methodologies have emerged: inverse reinforcemen...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/156337 |