```python
log.append(f"decoded {jj}th latent's attended tokens (top5): {attn_to_lats[jj][ii]}")
```
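For reference, here is a minimal sketch of how an `attn_to_lats`-style structure could be built from an attention map, i.e. picking each latent's five most-attended tokens. All names here (`attn_map`, `tokens`, `top5_attended_tokens`) are assumptions inferred from the log line above, not the script's actual implementation:

```python
import numpy as np

def top5_attended_tokens(attn_map, tokens):
    """attn_map: (num_latents, seq_len) array of attention weights
    from latents to input tokens; tokens: list of seq_len strings.
    Returns, per latent, the 5 tokens with the highest weights."""
    result = []
    for row in attn_map:
        top_idx = np.argsort(row)[::-1][:5]  # indices of the 5 largest weights
        result.append([tokens[i] for i in top_idx])
    return result

# Toy example: 2 latents attending over 5 tokens.
attn_map = np.array([[0.05, 0.40, 0.30, 0.15, 0.10],
                     [0.50, 0.10, 0.10, 0.20, 0.10]])
tokens = ["the", "cat", "sat", "on", "mat"]
attn_to_lats = top5_attended_tokens(attn_map, tokens)
```

With this shape, the quoted log line's `attn_to_lats[jj]` would be latent `jj`'s token list (the role of the second index `ii` in the original script is unclear from the snippet alone).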
It seems the token-probing script doesn't implement reading from the attention map, and enabling `test_attention` just makes the script fail. Is there a different version of it that was used to produce the figures in the paper?