Add per-layer gradient norm logging

The simplest place to do this is the trainer: iterate over model.variables(), group the variables by their model.layers.N name prefix, and log one grad norm per layer. This relies only on variable names and on extract_layer_index in shared/modeling/src/distro.rs, not on any state stored in the block. Keeping layer_idx inside the block only helps if you want to log forward activations or other per-layer debug metrics from within the block; on its own it doesn’t give you grad norms.
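
A rough sketch of the grouping step, to make the idea concrete. It is std-only and takes gradients as plain (name, &[f32]) pairs, because how you read a gradient off a variable depends on the tensor library. The extract_layer_index below is a hypothetical stand-in for the one in distro.rs (I'm assuming it maps a name containing model.layers.N to N; the real signature may differ), and per_layer_grad_norms is a name made up for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `extract_layer_index` in
/// shared/modeling/src/distro.rs; the real signature may differ.
/// Returns N for names containing "model.layers.N", otherwise None.
fn extract_layer_index(name: &str) -> Option<usize> {
    let rest = name.split("model.layers.").nth(1)?;
    rest.split('.').next()?.parse().ok()
}

/// Groups gradients by layer index and returns one L2 norm per group.
/// The `None` key collects parameters that live outside any block
/// (embeddings, final norm, output head).
fn per_layer_grad_norms<'a, I>(named_grads: I) -> BTreeMap<Option<usize>, f64>
where
    I: IntoIterator<Item = (&'a str, &'a [f32])>,
{
    // Accumulate the sum of squares per layer, then take the square root
    // once at the end so the result is the norm over all of the layer's
    // parameters, not a sum of per-tensor norms.
    let mut sq_sums: BTreeMap<Option<usize>, f64> = BTreeMap::new();
    for (name, grad) in named_grads {
        let sq: f64 = grad.iter().map(|&g| (g as f64) * (g as f64)).sum();
        *sq_sums.entry(extract_layer_index(name)).or_insert(0.0) += sq;
    }
    sq_sums.into_iter().map(|(k, v)| (k, v.sqrt())).collect()
}
```

In the trainer you would build the pairs from model.variables() after the backward pass and before the optimizer step or zeroing the grads, then log each entry of the returned map. Nothing here needs layer_idx on the block.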