Sandbox (+FA2 support, SharedContext for CompositeLoss, FP32 logits trainer side, ...) #17
Open
joanvelja wants to merge 47 commits into hallerite:master from
type token traces & rollout extras; add some docs
…o apps_batched
Finished Sandbox + QoL (FP32 logits trainer side; flash attention support; shared context for CompositeLoss)
The main contribution is the Sandbox backend (stored in `src/ludic/envs/code_exec`). Both the backend and the test file have READMEs explaining how they work. The backend has been thoroughly tested on HPC with `podman-hpc` (a daemonless wrapper) instead of `docker`. It still needs to be tested on Docker-based setups, although I do not expect crashes or problems, since the whole orchestration logic is shared across arbitrary sandbox hosts. Details about the Sandbox and how it works are in `examples/code_exec/README.md`. The backend code auto-detects the sandbox type (docker, podman-hpc), and support for further sandbox types (e.g., Singularity) can be added incrementally.
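As a rough illustration of what auto-detecting the sandbox type could look like, here is a minimal sketch that picks the first container runtime found on `PATH`. The function name `detect_sandbox_runtime`, the `CANDIDATE_RUNTIMES` tuple, and the preference order are assumptions made for this example and are not taken from the backend; only `shutil.which` is a real standard-library call.

```python
import shutil
from typing import Callable, Optional, Sequence

# Hypothetical preference order; the actual backend may order or name these differently.
CANDIDATE_RUNTIMES: Sequence[str] = ("docker", "podman-hpc")


def detect_sandbox_runtime(
    candidates: Sequence[str] = CANDIDATE_RUNTIMES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate whose executable is found on PATH, else None.

    `which` is injectable so the lookup can be faked in tests.
    """
    for name in candidates:
        if which(name) is not None:
            return name
    return None
```

Calling `detect_sandbox_runtime()` on a machine with neither runtime installed returns `None`, which lets the caller decide whether to fail loudly or fall back. Adding another runtime under this sketch would only mean extending the candidate tuple.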
The PR also contains a few QoL improvements: Flash-Attn support (automated, hardware-aware); FP32-upcasted logits (trainer side; the inference side requires patching vLLM or fetching someone else's patch, Minimax?); ScaleRL (Meta RL scaling laws objective); and HybridCreditAssignment (mean computed within the group, std computed within the batch). It also fixes a bug in composite losses that caused a spike in memory usage because logprobs were computed for every LossFn in the composite loss object: a SharedContext was added to avoid recomputing redundant objects.
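The shared-context idea can be sketched in a few lines of plain Python: each loss term asks a per-batch cache for the logprobs instead of computing them itself, so the expensive computation runs once per batch. This is an illustrative sketch only. The `get` method, the tuple-based `CompositeLoss` constructor, and the two toy loss terms are assumptions made for the example and do not reflect the PR's actual interfaces, and plain floats stand in for tensors.

```python
from typing import Any, Callable, Dict, List, Tuple


class SharedContext:
    """Per-batch cache: each keyed value is computed at most once and then reused."""

    def __init__(self) -> None:
        self._cache: Dict[str, Any] = {}

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        # Only call `compute` the first time this key is requested.
        if key not in self._cache:
            self._cache[key] = compute()
        return self._cache[key]


LossFn = Callable[[SharedContext], float]


class CompositeLoss:
    """Weighted sum of loss terms that all read from the same SharedContext."""

    def __init__(self, terms: List[Tuple[float, LossFn]]) -> None:
        self.terms = terms

    def __call__(self, ctx: SharedContext) -> float:
        return sum(weight * fn(ctx) for weight, fn in self.terms)


# Counter used to show how often the "expensive" computation actually runs.
calls = {"logprobs": 0}


def compute_logprobs() -> List[float]:
    # Stand-in for a forward pass; a real trainer would return a tensor.
    calls["logprobs"] += 1
    return [-0.5, -1.5]


def policy_term(ctx: SharedContext) -> float:
    logprobs = ctx.get("logprobs", compute_logprobs)
    return -sum(logprobs) / len(logprobs)


def penalty_term(ctx: SharedContext) -> float:
    logprobs = ctx.get("logprobs", compute_logprobs)
    return sum(lp * lp for lp in logprobs) / len(logprobs)


loss = CompositeLoss([(1.0, policy_term), (0.1, penalty_term)])
total = loss(SharedContext())  # fresh context per batch
```

After the call, `calls["logprobs"]` is 1 even though two terms used the logprobs. Creating a fresh `SharedContext` per batch keeps cached values from leaking across steps.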