[Dev] Add E2E support for THD format #2924
Changes from all commits:
73b4bbb, 496431e, 787d08e, fc75657, 9b1ab25, eb7cee8, f6ccf6e, 816fca1, 2aae2d0
```diff
@@ -79,6 +79,9 @@ class GPTDatasetConfig(BlendedMegatronDatasetConfig):
     context_parallel_size: Optional[int] = None
     """The size of the context parallel group. Needed for padding in packed sequences."""

+    sft_mock_dataset_config_json: Optional[str] = None
+    """This config provides the necessary information for the mock dataset."""
```
Comment on lines +82 to +83

**Contributor:** Anti-pattern? Optional dataset config within a dataset config? Why aren't we using inheritance here?

**Contributor (Author):** The mock/SFT dataset config has a large number of parameters, and threading them one-by-one into the
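As a sketch of the JSON side-file approach the author describes, the path stored in `sft_mock_dataset_config_json` could be deserialized into a typed config in one step rather than threading each parameter individually. All names below except `sft_mock_dataset_config_json` are hypothetical, and the real mock/SFT config has far more fields:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class MockSFTDatasetConfig:
    # Hypothetical stand-in for the real mock/SFT dataset config,
    # which carries many more parameters than shown here.
    max_seq_length: int = 2048
    num_samples: int = 1000


def load_mock_sft_config(path: Optional[str]) -> Optional[MockSFTDatasetConfig]:
    """Deserialize the JSON file referenced by sft_mock_dataset_config_json."""
    if path is None:
        return None
    with open(path) as f:
        # All keys in the JSON map 1:1 onto dataclass fields.
        return MockSFTDatasetConfig(**json.load(f))
```

The trade-off debated in the thread: a nested config keeps `GPTDatasetConfig` small, at the cost of a stringly-typed indirection instead of inheritance.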
```diff
     def __post_init__(self) -> None:
         """Do asserts and set fields post init"""
         super().__post_init__()
```
```diff
@@ -2076,6 +2076,40 @@ def __post_init__(self):
     self.attention_backend == AttnBackend.flash
 ), "Batch invariant mode only supports FlashAttention"

+if self.sequence_packing_scheduler is not None:
+    # Check TE version.
+    if not HAVE_PACKAGING:
+        raise ImportError(
+            "packaging is not installed. Please install it with `pip install packaging`."
+        )
+    # TODO: remove this after we fix the convergence issue with TE < 2.9.
+    if not (
+        is_te_min_version("2.9.0") or get_te_version() == PkgVersion("2.9.0.dev0+5b3092a")
+    ):
+        raise ValueError(
+            "SFT sequence packing requires Transformer Engine >= 2.9.0 "
+            f"but got {get_te_version()} (TE < 2.9.0 may have convergence issues)."
+        )
+
+    # Needed for passing variable sequences between pp stages.
+    self.variable_seq_lengths = True
+
+    # TODO(tailaim): add support for other dispatcher types
+    assert self.moe_token_dispatcher_type == "alltoall", (
+        f"sequence_packing only supports moe_token_dispatcher_type='alltoall', "
+        f"got '{self.moe_token_dispatcher_type}'"
+    )
+
+supported_schedulers = ['dp_balanced']
+if (
+    self.sequence_packing_scheduler is not None
+    and self.sequence_packing_scheduler not in supported_schedulers
+):
+    raise ValueError(
+        f"Unsupported scheduler: {self.sequence_packing_scheduler}. "
+        f"Available schedulers: {supported_schedulers}"
+    )
```
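One subtlety worth noting about the version gate above: under PEP 440 ordering, a `.dev` release sorts *before* the corresponding final release, so a plain minimum-version check would reject the pinned dev build. That is presumably why the diff adds the explicit equality check against `PkgVersion("2.9.0.dev0+5b3092a")`. A minimal sketch of the same pattern, using only the `packaging` library (the helper name is hypothetical, not Megatron's API):

```python
from packaging.version import Version


def meets_min_version(installed: str, minimum: str, allowed_builds: tuple = ()) -> bool:
    """Accept any version >= minimum, plus an explicit allow-list of
    known-good pre-release builds that would otherwise sort below it."""
    return Version(installed) >= Version(minimum) or installed in allowed_builds
```

With this, `meets_min_version("2.9.0.dev0+5b3092a", "2.9.0")` is False on the version comparison alone, and only passes via the allow-list, mirroring the two-clause condition in the diff.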
|
Comment on lines +2103 to +2111

**Contributor:** Why do we need to keep track of the supported schedulers here? Can we make it a part of the data class? nit: and in the error message, it's more that the scheduler isn't supported as opposed to being unknown.

**Contributor (Author):** OK, I'll make the changes.
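One way the reviewer's two suggestions could land, sketched on a hypothetical minimal dataclass (not the actual fix in this PR): hoist the scheduler list into a class-level constant and reword the error to say "not supported":

```python
from dataclasses import dataclass
from typing import ClassVar, Optional, Tuple


@dataclass
class PackingConfig:
    # Class-level registry of supported schedulers, per the review suggestion,
    # instead of a local list inside __post_init__.
    SUPPORTED_SCHEDULERS: ClassVar[Tuple[str, ...]] = ("dp_balanced",)

    sequence_packing_scheduler: Optional[str] = None

    def __post_init__(self) -> None:
        s = self.sequence_packing_scheduler
        if s is not None and s not in self.SUPPORTED_SCHEDULERS:
            # "not supported" rather than "unknown", per the review nit.
            raise ValueError(
                f"Scheduler '{s}' is not supported. "
                f"Supported schedulers: {list(self.SUPPORTED_SCHEDULERS)}"
            )
```

`ClassVar` keeps the registry out of the generated `__init__`, so it stays a shared constant rather than a per-instance field.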
```diff
 @dataclass
 @experimental_api
```