https://github.com/saprmarks/feature-circuits/blob/main/attribution.py#L313-L314
vjv = (upstream_act.grad @ right_vec).to_tensor()
to_backprops[tuple(downstream_idx)].backward(retain_graph=True)
Since this loops over all downstream_idx, shouldn't we backprop first and access the grad just in time for that downstream_idx (i.e., swap the order of these two lines)? It seems like the first iteration would just be all zeros. Thanks.
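To illustrate the concern, here is a minimal standalone sketch (hypothetical tensors, not the repo's actual code) showing what happens when `.grad` is read before `.backward()` inside a loop: the first read happens before any backward pass, and each later read only sees the gradient from the previous iteration.

```python
import torch

upstream = torch.ones(3, requires_grad=True)
# stand-ins for the per-downstream_idx scalars in to_backprops
losses = [upstream.sum() * (i + 1) for i in range(2)]

grads_seen = []
for loss in losses:
    # read grad *before* calling backward, mirroring the ordering in question
    grads_seen.append(None if upstream.grad is None else upstream.grad.clone())
    loss.backward(retain_graph=True)

# iteration 0: no backward has run yet, so there is no gradient at all
assert grads_seen[0] is None
# iteration 1: the read sees the gradient from iteration 0's backward (all ones),
# not the gradient belonging to the current loss
assert torch.equal(grads_seen[1], torch.ones(3))
```

Note that gradients also accumulate in `.grad` across `backward()` calls unless zeroed, so even the "previous iteration's" gradient is really a running sum.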