Advanced Machine Learning - Co-Teacher

MSc. Course, ITU Copenhagen, Computer Science, 2024

Lecture on Mechanistic Interpretability

The lecture focused on challenges related to interpreting deep neural networks (DNNs) and more specifically transformer-based large language models (LLMs).

The lecture covered a variety of topics related to the interpretability of AI systems, including:

  • The difference between explainability-focused and interpretability-focused research
  • Feature visualization techniques
  • Attention visualization
  • Circuit analysis
  • Perspectives on the transformer architecture (the residual stream view)
  • The problem of polysemanticity
    • How to extract monosemantic features
  • Representation engineering

If interested, I can provide slides and lecture notes from the presentation.