Towards interpretable sequence continuation: analyzing shared circuits in large language models

While transformer models exhibit strong capabilities on linguistic tasks, their complex architectures make them difficult to interpret. Recent work has aimed to reverse engineer transformer models into human-readable representations called circuits that implement algorithmic functions. We extend thi...


Bibliographic Details
Main Authors: Lan, M.; Torr, P.; Barez, F.
Format: Conference item
Language: English
Published: Association for Computational Linguistics, 2024