Interpreting and Editing Memory in Large Transformer Language Models

This thesis investigates the mechanisms of factual recall in large language models. We first apply causal interventions to identify neuron activations that are decisive in a model’s factual predictions; surprisingly, we find that factual recall corresponds to a sparse, localizable computation in the...
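The causal interventions mentioned in the abstract can be illustrated with a toy sketch of activation patching (causal tracing): run the model on a corrupted input, restore the clean hidden state at one layer at a time, and see how much of the clean output each restoration recovers. Everything below is a hypothetical toy, not the thesis's actual code or models.

```python
# Toy sketch of causal tracing / activation patching.
# The "model" is a chain of simple arithmetic layers, each a function of
# the running hidden state h and the input x (hypothetical, for illustration).

def run(layers, x, patch=None):
    """Run input x through the layers. If patch=(i, value) is given,
    overwrite the hidden state after layer i with the stored clean value."""
    h = 0
    for i, layer in enumerate(layers):
        h = layer(h, x)
        if patch is not None and patch[0] == i:
            h = patch[1]  # restore the clean activation at this layer
    return h

layers = [
    lambda h, x: h + x,   # layer 0: reads the input
    lambda h, x: h * 2,   # layer 1: transforms the hidden state only
    lambda h, x: h + x,   # layer 2: reads the input again
]

clean_x, corrupted_x = 5, 0

# Record the clean hidden state after each layer.
clean_states = []
h = 0
for layer in layers:
    h = layer(h, clean_x)
    clean_states.append(h)
clean_out = clean_states[-1]
corrupted_out = run(layers, corrupted_x)

# Restore one clean hidden state at a time in the corrupted run and
# measure how much of the clean output each restoration recovers.
for i, state in enumerate(clean_states):
    restored = run(layers, corrupted_x, patch=(i, state))
    print(f"restoring layer {i}: output {restored} "
          f"(clean {clean_out}, corrupted {corrupted_out})")
```

In this toy, restoring the state after layer 2 recovers the clean output exactly, while earlier restorations recover it only partially, because the corruption re-enters downstream; in the thesis's setting the analogous measurement is over transformer hidden states and token predictions.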


Bibliographic Details
Main Author: Meng, Kevin
Other Authors: Andreas, Jacob D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156794

Similar Items