Question
In the GPT-OSS example configs under examples/post_training/modelopt/conf/openai/, the scripts currently use --window-size 128,0.
With the common window_size=(left,right) semantics, where right=0 for causal attention, each query sees left + 1 tokens: the left previous tokens plus the current token. GPT-OSS uses sliding_window = 128 tokens, which as I understand it includes the current token, so the correct mapping seems to be --window-size 127,0 rather than 128,0.
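To make the counting argument concrete, here is a minimal sketch in plain Python. It assumes the inclusive convention in which a query at position i attends to keys in [i - left, i + right]. Whether the --window-size flag in these scripts follows that convention is exactly what I am asking, so treat this as an illustration only. visible_keys is a hypothetical helper, not something taken from the repository.

```python
def visible_keys(query_pos, left, right, seq_len):
    """Key positions a query can attend to under (left, right) window semantics.

    Assumed convention: the window is inclusive on both sides, so the query
    at position i sees keys i - left .. i + right, clipped to the sequence.
    """
    lo = max(0, query_pos - left)
    hi = min(seq_len - 1, query_pos + right)
    return list(range(lo, hi + 1))


seq_len = 512
query_pos = 300  # any position with at least 128 earlier tokens

# Current setting in the example scripts: --window-size 128,0
count_128 = len(visible_keys(query_pos, 128, 0, seq_len))

# Proposed setting: --window-size 127,0
count_127 = len(visible_keys(query_pos, 127, 0, seq_len))

print(count_128, count_127)  # 129 128
```

Under that assumption, 128,0 gives a 129-token window, while 127,0 gives the 128 tokens that sliding_window = 128 would imply.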
I opened a minimal PR updating the two example scripts accordingly:
#2771
Could you confirm whether 127,0 is the intended setting for these GPT-OSS example scripts?