Towards interpretable sequence continuation: analyzing shared circuits in large language models
While transformer models exhibit strong capabilities on linguistic tasks, their complex architectures make them difficult to interpret. Recent work has aimed to reverse engineer transformer models into human-readable representations called circuits that implement algorithmic functions. We extend thi...
Main Authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | Association for Computational Linguistics, 2024 |