Steerable Alignment with Conditional Multiobjective Preference Optimization

As the scale, capabilities, and use cases of large language models (LLMs) continue to grow, it is imperative that these systems be aligned with human preferences. Current state-of-the-art strategies for alignment, such as Reinforcement Learning from Human Feedback (RLHF), have provided a useful paradigm...


Bibliographic Details
Main Author: Manyika, Julian
Other Authors: Hadfield-Menell, Dylan
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156747