Open
Description
According to the code (https://github.com/openai/phasic-policy-gradient/blob/master/phasic_policy_gradient/train.py#L14), the 'detach' arch seems to correspond to the single-network variant described in Section 3.6 of the paper. According to the paper and the comment in the code, the value function should not be detached from the encoder during the aux phase. However, the value function (`vfvec`) appears to always be detached in the code:
phasic-policy-gradient/phasic_policy_gradient/ppg.py, lines 148 to 153 in 7295473:

```python
for k in self.vf_keys:
    if self.detach_value_head:
        x_out[k] = x_out[k].detach()
    aux[k] = self.get_vhead(k)(x_out[k])[..., 0]
vfvec = aux[self.true_vf_key]
aux.update({"vpredaux": self.aux_vf_head(pi_x)[..., 0], "vpredtrue": vfvec})
```
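To make the concern concrete, here is a minimal PyTorch sketch of what `.detach()` does to gradient flow. It is not the PPG code: `encoder`, `value_head`, and `encoder_receives_gradient` are hypothetical stand-ins, and the squared-error loss against a constant target is only an assumption for illustration. It shows that a value loss computed on a detached encoder output updates the value head only, so the shared encoder gets no gradient from it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for a shared encoder and a value head (not the PPG classes).
encoder = nn.Linear(4, 8)
value_head = nn.Linear(8, 1)


def encoder_receives_gradient(detach_value_head: bool) -> bool:
    """Return True if a value loss backpropagates into the encoder weights."""
    # Clear gradients left over from any previous call.
    encoder.zero_grad()
    value_head.zero_grad()

    obs = torch.randn(16, 4)
    x = encoder(obs)
    if detach_value_head:
        # Cut the autograd graph between the encoder and the value head.
        x = x.detach()
    vpred = value_head(x)[..., 0]

    # Illustrative value loss against a constant target (an assumption).
    loss = (vpred - torch.ones(16)).pow(2).mean()
    loss.backward()

    grad = encoder.weight.grad
    return grad is not None and bool(grad.abs().sum() > 0)


print("detached:", encoder_receives_gradient(True))
print("attached:", encoder_receives_gradient(False))
```

If `detach_value_head` is set in the same way during the aux phase, the value loss there would behave like the detached case above, which is what the question below is about.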
Can you clarify whether it should be detached or not in the aux phase and whether it affects the results reported in the paper?
Thanks