An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.
Slides, videos and other potentially useful artifacts from various presentations on responsible machine learning.
Paper for 2018 Joint Statistical Meetings: https://ww2.amstat.org/meetings/jsm/2018/onlineprogram/AbstractDetails.cfm?abstractid=329539
Work for SPAR 2024, Circuit Phenomenology Using Sparse Autoencoders
Interpreting neural networks by reducing nonlinearities during training
AI Explainability 360 Toolkit for Time-Series and Industrial Use Cases
Repository for the LWDA'24 presentation "Psychometric Profiling of GPT Models for Bias Exploration", featuring conference materials including the poster, paper, slides, and references.