Understanding Concept Representations and their Transformations in Transformer Models

As transformer language models are deployed in an ever wider variety of applications, developing methods to understand their internal reasoning processes becomes increasingly critical. One such category of methods, called neuron labeling, identifies salient directions in a model's internal representations.

Bibliographic Details
Main Author: Kearney, Matthew
Other Authors: Andreas, Jacob
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/151276