Steerable Alignment with Conditional Multiobjective Preference Optimization
As the scale, capabilities, and use-cases of large language models (LLMs) continue to grow, it is imperative that these systems are aligned with human preferences. Current state-of-the-art strategies for alignment, such as Reinforcement Learning from Human Feedback (RLHF), have provided a useful paradigm...
Main Author:
Other Authors:
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156747