Interpreting and Editing Memory in Large Transformer Language Models
This thesis investigates the mechanisms of factual recall in large language models. We first apply causal interventions to identify neuron activations that are decisive in a model’s factual predictions; surprisingly, we find that factual recall corresponds to a sparse, localizable computation in the...
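The causal-intervention idea mentioned in the abstract can be illustrated with a toy sketch: run a model on a clean and a corrupted input, restore one hidden activation at a time from the clean run into the corrupted run, and rank units by how much of the clean output each restoration recovers. This is a minimal illustration only, not the thesis's actual implementation; the model, weights, and inputs below are made up.

```python
# Toy sketch of causal tracing / activation patching (hypothetical example,
# not the method as implemented in the thesis). A one-layer linear "network"
# is run on a clean and a corrupted input; restoring each hidden unit's clean
# activation into the corrupted run measures that unit's indirect effect on
# the output. Units with the largest effect are the "decisive" ones.

def forward(x, weights, patch=None):
    """Compute hidden activations and output; optionally patch one unit."""
    hidden = [w * x for w in weights]
    if patch is not None:
        idx, value = patch
        hidden[idx] = value          # causal intervention: overwrite one activation
    return hidden, sum(hidden)

weights = [0.1, 0.2, 5.0, 0.3]       # unit 2 carries most of the computation

clean_hidden, clean_out = forward(1.0, weights)   # clean run
_, corrupt_out = forward(0.0, weights)            # corrupted run

# Indirect effect of restoring each unit's clean activation into the corrupted run
effects = []
for i in range(len(weights)):
    _, patched_out = forward(0.0, weights, patch=(i, clean_hidden[i]))
    effects.append(patched_out - corrupt_out)

decisive = max(range(len(effects)), key=lambda i: effects[i])
print(decisive)   # unit 2 has the largest indirect effect
```

In a real transformer the same loop would patch hidden states at specific (layer, token) positions, e.g. via forward hooks, and measure the change in the probability of the correct factual token.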
| Main Author: | Meng, Kevin |
|---|---|
| Other Authors: | Andreas, Jacob D. |
| Format: | Thesis |
| Published: | Massachusetts Institute of Technology, 2024 |
| Online Access: | https://hdl.handle.net/1721.1/156794 |
Similar Items
- Transforming experiences: Neurobiology of memory updating/editing
  by: Daniel Osorio-Gómez, et al.
  Published: (2023-02-01)
- Instruction-guided image editing empowered by large language models
  by: Wang, Yiying
  Published: (2024)
- Augmenting interpretable models with large language models during training
  by: Chandan Singh, et al.
  Published: (2023-11-01)
- Leave It to Large Language Models! Correction and Planning with Memory Integration
  by: Yuan Zhang, et al.
  Published: (2024-01-01)
- Truthfulness in Large Language Models
  by: Liu, Kevin
  Published: (2023)