Automated Mechanistic Interpretability for Neural Networks
Mechanistic interpretability research aims to deconstruct the underlying algorithms that neural networks use to perform computations, so that we can modify their components and change their behavior in predictable and beneficial ways. This thesis details three novel methods for automating the...
Main Author: | Liao, Isaac C. |
---|---|
Other Authors: | Tegmark, Max |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/156787 |
Similar Items
- Automated interpretation of the coronary angioscopy with deep convolutional neural networks
  by: Toru Miyoshi, et al.
  Published: (2020-06-01)
- Convolutional Neural Networks for Mechanistic Driver Detection in Atrial Fibrillation
  by: Gonzalo Ricardo Ríos-Muñoz, et al.
  Published: (2022-04-01)
- Mechanistic Interpretability for Progress Towards Quantitative AI Safety
  by: Lad, Vedang K.
  Published: (2024)
- A mechanistic interpretation of relativistic rigid body rotation
  by: Stefan Catheline
  Published: (2023-06-01)
- Interpreting automated perimetry.
  by: Shukla Yogesh
  Published: (2001-01-01)