Addressing Misalignment in Language Model Deployments through Context-Specific Evaluations

Language model-based applications are increasingly being deployed in the real world across a variety of contexts. While their rapid success has realized benefits for society, ensuring that they are trained to perform according to societal values and expectations is imperative given their potential t...

Bibliographic Details
Main Author: Soni, Prajna
Other Authors: Hadfield-Menell, Dylan
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/156962
https://orcid.org/0009-0005-3379-5334