Starting point: Work your way up to an induction circuit.
In a transformer that leverages attention layers with RoPE positional encodings:
- How can you construct a fully positional attention head, i.e., one whose attention pattern depends only on token positions, not on token content?
- How can you construct a "fully" (in practice, mostly) semantic head, e.g., one that matches a specific token in the context regardless of its position?
- How can you wire such heads together to obtain an induction circuit?
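As a starting intuition for the first question: under RoPE, the query/key dot product depends only on the relative offset between the two positions. So if a head's query and key vectors are content-independent (constant across tokens), its attention scores become purely positional. The sketch below is a minimal illustration of this, not a full transformer; it uses the common "half-split" (GPT-NeoX-style) RoPE convention, and the vectors `q`, `k` are arbitrary placeholders standing in for a head's content-independent query/key.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply RoPE to a vector x (even dim d) at position pos.

    Pairs dimension i with i + d/2 and rotates each pair by an angle
    pos * base**(-i / (d/2)) (half-split convention).
    """
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)      # per-pair frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos])

rng = np.random.default_rng(0)
q = rng.normal(size=8)   # content-independent query (placeholder)
k = rng.normal(size=8)   # content-independent key (placeholder)

# Score depends only on the offset: (7, 4) and (12, 9) both have offset 3,
# so the rotated dot products agree (R_m^T R_n = R_{n-m} per 2D plane).
s_a = rope_rotate(q, 7) @ rope_rotate(k, 4)
s_b = rope_rotate(q, 12) @ rope_rotate(k, 9)
assert np.allclose(s_a, s_b)
```

From here, one exercise path is to choose `q` and `k` so that the resulting function of the offset peaks at offset 1, giving a previous-token head, which is one of the two ingredients of an induction circuit.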
Bonus: Go through our many papers and find other concepts to implement within your framework. E.g., circuits from Nikhil's entity binding work would be natural candidates to take this further.
Deliverables: Ideally some sort of interactive (Jupyter) notebook with text blocks containing your mathematical derivations and intuitions, plus cells testing them. Basically as is done here: https://colab.research.google.com/github/callummcdougall/arena-pragmatic-interp/blob/main/chapter1_transformer_interp/exercises/part1_transformer_from_scratch/1.1_Transformer_from_Scratch_exercises.ipynb?t=20260301
Additional resources:
- https://arxiv.org/abs/2410.06205
- https://wendlerc.github.io/notes/rope.html (my incomplete notes on this topic)