Question
Suppose the ConstrainedAdam optimizer is not used.
Wouldn't this allow the model to cheat `L_sparse` on the next step?
For example:
- 1st step: `f_gate` shrinks slightly because of `L_sparse`.
- 2nd step: `L_recon` increases because `f_gate` shrank in the first step (affecting `f`), so the model compensates by increasing the decoder weights.
- and the pattern continues.
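To make the loophole concrete: if the decoder columns are not norm-constrained, scaling the activations down by a factor `c` while scaling the decoder up by `1/c` leaves the reconstruction unchanged but shrinks the L1 penalty, so training can keep drifting in that direction. A minimal sketch with toy shapes (not the repo's code):

```python
import torch

torch.manual_seed(0)
f = torch.rand(8, 16)            # stand-in for f_gate (nonnegative activations)
W_dec = torch.randn(16, 32)      # stand-in decoder weight (toy shapes)

x_hat = f @ W_dec                # reconstruction before the "cheat"

c = 0.5
f_small = c * f                  # activations shrink -> L_sparse goes down
W_big = W_dec / c                # decoder grows to compensate

# Reconstruction is identical, so L_recon is unaffected...
assert torch.allclose(x_hat, f_small @ W_big, atol=1e-5)
# ...while the L1 penalty on the activations is exactly halved.
assert torch.isclose(f_small.abs().sum(), c * f.abs().sum())
```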
```python
x_hat_gate = f_gate @ self.ae.decoder.weight.detach().T + self.ae.decoder_bias.detach()
```
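For reference, ConstrainedAdam closes this loophole by keeping each decoder column at unit L2 norm: it removes the gradient component parallel to each column and renormalizes after every step, so the decoder cannot absorb scale from `f_gate`. A rough sketch of that idea (simplified; the repo's actual class may differ in details):

```python
import torch

class ConstrainedAdam(torch.optim.Adam):
    """Adam variant that keeps columns of the constrained parameters
    at unit L2 norm (sketch of the idea, not the repo's exact code)."""

    def __init__(self, params, constrained_params, lr):
        super().__init__(params, lr=lr)
        self.constrained_params = list(constrained_params)

    @torch.no_grad()
    def step(self, closure=None):
        for p in self.constrained_params:
            normed = p / p.norm(dim=0, keepdim=True)
            if p.grad is not None:
                # Drop the gradient component parallel to each column,
                # so the update cannot change the column norm to first order.
                p.grad -= (p.grad * normed).sum(dim=0, keepdim=True) * normed
        super().step(closure=closure)
        for p in self.constrained_params:
            # Renormalize columns back to unit norm after the update.
            p /= p.norm(dim=0, keepdim=True)
```

With the columns pinned to unit norm, shrinking `f_gate` strictly trades reconstruction quality for sparsity instead of being a free lunch.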
`dictionary_learning/dictionary_learning/trainers/gdm.py`, line 80 at commit 60ec6bf