Threshold is applied twice in the JumpReLU training #55

In the JumpReLU loss:
https://github.com/saprmarks/dictionary_learning/blob/60ec6bf5264944d64a4ca271f45a29ebfb9d4946/dictionary_learning/trainers/jumprelu.py#L156-L170

The threshold is applied twice: first on line 156 (inside `JumpReLUFunction`) and then again on line 170, where `StepFunction` receives `f` instead of the raw pre-activations. I think that on line 170 `StepFunction` should be applied to `pre_jump` instead (this is also how it is done in [the Colab linked in the docstring](https://colab.research.google.com/drive/1PlFzI_PWGTN9yCQLuBcSuPJUjgHL7GiD#scrollTo=yP828a6uIlSO), as well as in equation 10 [in the paper](https://arxiv.org/pdf/2407.14435)). While this makes no difference in the forward pass, it may change the pseudoderivative of the L0 term with respect to the threshold, because `StepFunction` then sees values that have already been thresholded.
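To make the pseudoderivative point concrete, here is my reading of the paper's definitions, writing $\pi_i$ for `pre_jump`, $\theta_i$ for the threshold, $\varepsilon$ for the bandwidth, and $K$ for the rectangle kernel (notation chosen here for illustration, not copied from the paper):

$$
\begin{aligned}
f_i &= \pi_i \, H(\pi_i - \theta_i), \qquad \mathcal{L}_0 = \sum_i H(\pi_i - \theta_i), \\
\frac{\partial}{\partial \theta_i} H(z - \theta_i) &:= -\frac{1}{\varepsilon}\, K\!\left(\frac{z - \theta_i}{\varepsilon}\right), \qquad K(u) = H\!\left(u + \tfrac{1}{2}\right) - H\!\left(u - \tfrac{1}{2}\right).
\end{aligned}
$$

With $z = \pi_i$ the threshold pseudoderivative is nonzero for $\pi_i \in (\theta_i - \varepsilon/2,\ \theta_i + \varepsilon/2)$. With $z = f_i$, every $\pi_i \le \theta_i$ gives $f_i = 0$, so the kernel argument becomes $-\theta_i/\varepsilon$, which lies outside the window whenever $\theta_i > \varepsilon/2$; only the half of the window above the threshold then contributes. The forward values still agree, because $H(f_i - \theta_i) = H(\pi_i - \theta_i)$ for any $\theta_i > 0$.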

	f = JumpReLUFunction.apply(pre_jump, self.ae.threshold, self.bandwidth)

	active_indices = f.sum(0) > 0
	did_fire = torch.zeros_like(self.num_tokens_since_fired, dtype=torch.bool)
	did_fire[active_indices] = True
	self.num_tokens_since_fired += x.size(0)
	self.num_tokens_since_fired[active_indices] = 0
	self.dead_features = (
		(self.num_tokens_since_fired > self.dead_feature_threshold).sum().item()
	)

	recon = self.ae.decode(f)

	recon_loss = (x - recon).pow(2).sum(dim=-1).mean()
	l0 = StepFunction.apply(f, self.ae.threshold, self.bandwidth).sum(dim=-1).mean()
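A minimal PyTorch sketch of the difference, under stated assumptions: the `StepFunction` below is a stand-in written for illustration, assuming the repository's version follows the Colab (rectangle-kernel pseudoderivative for the threshold, zero gradient for the input). `rectangle`, `l0_and_threshold_grad`, and the toy values are hypothetical. It compares the current line 170 (step applied to `f`) with the proposed change (step applied to `pre_jump`) for one pre-activation just below the threshold and one just above it.

```python
import torch


def rectangle(u: torch.Tensor) -> torch.Tensor:
    # Rectangle kernel K(u): 1 where -1/2 < u < 1/2, else 0.
    return ((u > -0.5) & (u < 0.5)).to(u.dtype)


class StepFunction(torch.autograd.Function):
    # Hypothetical stand-in: Heaviside step H(x - threshold). Only the threshold
    # gets a (rectangle-kernel) pseudoderivative; x gets a zero gradient.

    @staticmethod
    def forward(ctx, x, threshold, bandwidth):
        ctx.save_for_backward(x, threshold)
        ctx.bandwidth = bandwidth
        return (x > threshold).to(x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        x, threshold = ctx.saved_tensors
        eps = ctx.bandwidth
        threshold_grad = (
            -(1.0 / eps) * rectangle((x - threshold) / eps) * grad_output
        ).sum(dim=0)  # sum over the batch dimension
        return torch.zeros_like(x), threshold_grad, None


def l0_and_threshold_grad(use_pre_jump: bool):
    # Hypothetical toy setup: one token, two features, threshold 0.5.
    bandwidth = 0.1
    threshold = torch.tensor([0.5, 0.5], requires_grad=True)
    # Feature 0 is just below the threshold, feature 1 just above it.
    pre_jump = torch.tensor([[0.48, 0.52]])
    # JumpReLU forward pass: keep pre-activations above the threshold.
    # A plain mask is enough here because StepFunction sends no gradient to x.
    f = pre_jump * (pre_jump > threshold).to(pre_jump.dtype)
    # Line 170 currently passes f; the proposed fix passes pre_jump.
    step_input = pre_jump if use_pre_jump else f
    l0 = StepFunction.apply(step_input, threshold, bandwidth).sum(dim=-1).mean()
    l0.backward()
    return l0.item(), threshold.grad


print(l0_and_threshold_grad(use_pre_jump=False))  # current: threshold applied twice
print(l0_and_threshold_grad(use_pre_jump=True))   # proposed: StepFunction on pre_jump
```

Both calls report an L0 of 1.0, but the threshold gradient for feature 0 is zero in the current version and -10 with `pre_jump`, so pre-activations in (θ - ε/2, θ] stop contributing to the threshold update. In the trainer, the proposed change would be `StepFunction.apply(pre_jump, self.ae.threshold, self.bandwidth)` on line 170.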
