Describe the bug
If one attempts to load the `state_dict` of a model that was saved without folded LayerNorms (i.e. without `LayerNormPre`), calling `load_and_process_state_dict(state_dict, fold_ln=True)` fails due to the strict use of `load_state_dict`. This can be circumvented by instead doing:
```python
model = HookedTransformer(model_cfg)
model.load_and_process_state_dict(state_dict, fold_ln=False)
model.process_weights_(fold_ln=True)
model.setup()
```
Without calling `model.setup()`, the LayerNorm hooks remain inside the model but are not properly attached, so the corresponding activations are not returned by `run_with_cache`, causing issues in the `ActivationCache` manipulation helpers.
Additionally, if the original model was saved with folded LayerNorms, calling `load_and_process_state_dict(state_dict, fold_ln=True)` raises an error, as no LayerNorm parameters are located in the state dict.
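Both failure modes come down to PyTorch's strict key matching in `Module.load_state_dict`. A minimal sketch in plain PyTorch (the `Tiny` module and its parameter names are hypothetical, standing in for a model whose checkpoint was saved with LayerNorm weights already folded away) shows the difference between strict and non-strict loading:

```python
import torch
from torch import nn


class Tiny(nn.Module):
    """Toy module: `ln` stands in for an unfolded LayerNorm."""

    def __init__(self):
        super().__init__()
        self.ln = nn.LayerNorm(4)
        self.proj = nn.Linear(4, 4)


model = Tiny()

# Simulate a checkpoint saved with folded LayerNorms: its state dict
# simply lacks the LayerNorm parameters.
folded_sd = {k: v for k, v in model.state_dict().items() if not k.startswith("ln.")}

# With the default strict=True, the missing keys raise a RuntimeError.
try:
    model.load_state_dict(folded_sd)
except RuntimeError as e:
    print("strict load failed:", type(e).__name__)

# With strict=False, missing/unexpected keys are reported, not fatal.
result = model.load_state_dict(folded_sd, strict=False)
print("missing keys:", result.missing_keys)
```

A non-strict load like this (followed by re-running any post-processing and hook setup) is essentially what the workaround above achieves.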