[QUESTION] GPT-OSS example configs: should --window-size be 127,0 to match sliding_window=128?

**Question**
In the GPT-OSS example configs under `examples/post_training/modelopt/conf/openai/`, the scripts currently use `--window-size 128,0`.

With the common `window_size=(left, right)` semantics for causal attention (`right=0`), the number of visible tokens is `left + 1`, because the current token is counted in addition to the `left` tokens before it. To match GPT-OSS's `sliding_window = 128` tokens (which also counts the current token), the correct mapping appears to be `--window-size 127,0`.
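To make the off-by-one concrete, here is a minimal sketch of the counting argument. It assumes the inclusive convention where a query at position `i` attends to keys `j` with `i - left <= j <= i + right`; the helper `visible_keys` is hypothetical and is not taken from Megatron-LM or any attention backend, so it only illustrates the arithmetic rather than what the code path actually does.

```python
def visible_keys(query_pos, seq_len, left, right):
    """Key positions visible to `query_pos` under an assumed inclusive
    (left, right) window: query_pos - left <= j <= query_pos + right."""
    lo = max(0, query_pos - left)
    hi = min(seq_len - 1, query_pos + right)
    return list(range(lo, hi + 1))


seq_len = 1024
query_pos = 500  # far enough from the sequence start that the window is not clipped

# left=128 gives 128 previous tokens plus the current one.
print(len(visible_keys(query_pos, seq_len, left=128, right=0)))  # 129

# left=127 gives 127 previous tokens plus the current one.
print(len(visible_keys(query_pos, seq_len, left=127, right=0)))  # 128
```

Under that assumed convention, `128,0` exposes 129 tokens per query, while `127,0` exposes exactly 128. If the backend instead treats `left` as already including the current token, the existing `128,0` would be correct, which is why confirmation is needed.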

I opened a minimal PR updating the two example scripts accordingly:
https://github.com/NVIDIA/Megatron-LM/pull/2771

Could you confirm whether `127,0` is the intended setting for these GPT-OSS example scripts?

[QUESTION] GPT-OSS example configs: should --window-size be 127,0 to match sliding_window=128? #3690
