
Sandbox (+FA2 support, SharedContext for CompositeLoss, FP32 logits trainer side, ...) #17

Open
joanvelja wants to merge 47 commits into hallerite:master from joanvelja:master


@joanvelja
Contributor

The main contribution is the Sandbox backend (in src/ludic/envs/code_exec). Both the backend and the test file have READMEs that explain them. The backend has been thoroughly tested on HPC using podman-hpc (a daemonless wrapper) instead of Docker. It still needs to be tested on Docker setups, though I do not expect crashes or other problems, since the whole orchestration logic is shared across arbitrary sandbox hosts. Details on the Sandbox and how it works are in examples/code_exec/README.md.

The backend auto-detects the sandbox type (docker, podman-hpc), and support for further sandbox types (e.g., Singularity) can be added incrementally.
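For reviewers who want a mental model of the auto-detection, here is a minimal sketch of one way it could work: probe `PATH` for known container CLIs in a preference order. This is not the code in this PR. The function name, the candidate order, and the injectable `which` parameter are assumptions made for illustration.

```python
import shutil
from typing import Callable, Optional, Sequence

# Hypothetical preference order; the PR only names docker and podman-hpc.
CANDIDATE_RUNTIMES: Sequence[str] = ("podman-hpc", "docker")


def detect_sandbox_runtime(
    candidates: Sequence[str] = CANDIDATE_RUNTIMES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate CLI found on PATH, or None if none is found.

    `which` is injectable so the lookup can be faked in tests.
    """
    for name in candidates:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    runtime = detect_sandbox_runtime()
    print(f"detected sandbox runtime: {runtime}")
```

Under this scheme, supporting another host such as Singularity would mean adding its CLI name to the candidate list, provided the orchestration layer can drive it.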

The PR also contains a few QoL improvements: Flash-Attn support (automatic, hardware-aware), FP32-upcast logits (trainer side; the inference side requires patching vLLM or fetching someone else's patch, possibly Minimax's), ScaleRL (the Meta RL scaling-laws objective), and HybridCreditAssignment (mean computed within each group, std computed within the batch). It also fixes a bug in composite losses that caused a memory spike because logprobs were computed separately for every LossFn in the composite loss object. A SharedContext now avoids recomputing these shared objects.
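To illustrate the idea behind the SharedContext fix, here is a minimal pure-Python sketch of memoizing a shared intermediate across the terms of a composite loss. It is not the implementation in this PR. The `get` method, the `composite_loss` helper, and the toy loss functions are hypothetical, and plain floats stand in for tensors.

```python
from typing import Any, Callable, Dict, List


class SharedContext:
    """Caches expensive intermediates for one batch (illustrative sketch)."""

    def __init__(self) -> None:
        self._cache: Dict[str, Any] = {}

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        # Compute on first request only; later requests reuse the cached value.
        if key not in self._cache:
            self._cache[key] = compute()
        return self._cache[key]


LossFn = Callable[[SharedContext], float]


def composite_loss(loss_fns: List[LossFn], weights: List[float], ctx: SharedContext) -> float:
    """Weighted sum of loss terms that all read from the same context."""
    return sum(w * fn(ctx) for fn, w in zip(loss_fns, weights))


# Toy stand-in for the expensive logprob computation; counts its invocations.
calls = {"logprobs": 0}


def compute_logprobs() -> List[float]:
    calls["logprobs"] += 1
    return [-0.5, -1.5]


def mean_nll(ctx: SharedContext) -> float:
    lp = ctx.get("logprobs", compute_logprobs)
    return -sum(lp) / len(lp)


def max_nll(ctx: SharedContext) -> float:
    lp = ctx.get("logprobs", compute_logprobs)
    return -min(lp)


ctx = SharedContext()
total = composite_loss([mean_nll, max_nll], [1.0, 0.5], ctx)
print(total, calls["logprobs"])  # logprobs are computed once, not once per term
```

With real tensors, the same pattern keeps one logprob tensor (and its autograd graph) alive per batch instead of one per loss term. A fresh context per batch keeps stale values from leaking between steps.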

