unsloth-multi-gpu-vision

This is an extension of cwpeng's work. You can now train Qwen3-VL with Unsloth on multiple GPUs:

For the first run, disable multi-GPU (run on a single GPU) so that Unsloth can compile into unsloth_compiled_cache.
After that, add os.environ["UNSLOTH_COMPILE_DISABLE"] = "1" to disable Unsloth compilation and avoid hanging. I don't know why the hang happens. Disabling compilation reduces speed, but my experiment is too small for the effect to be noticeable.
The root cause seems to be related to gradient checkpointing.
Can run with DeepSpeed.
Tested on 2x L4.
Working on GRPO for VL model.
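
The two-step workflow above (compile once on a single GPU, then disable compilation for multi-GPU runs) could be automated roughly as follows. This is only a sketch and not what single-vision2.py does: the helper name configure_unsloth_compile is hypothetical, and it assumes unsloth_compiled_cache is created in the current working directory and that the variable should be set before unsloth is imported.

```python
import os
from pathlib import Path


def configure_unsloth_compile(cache_dir="unsloth_compiled_cache", env=os.environ):
    """Set UNSLOTH_COMPILE_DISABLE=1 only when a compiled cache already exists.

    Hypothetical helper. Assumes the cache directory from the first
    single-GPU run lives at `cache_dir` (relative to the working directory).
    Returns True if compilation was disabled, False otherwise.
    """
    cache = Path(cache_dir)
    # Treat the cache as ready only if the directory exists and is non-empty.
    cache_ready = cache.is_dir() and any(cache.iterdir())
    if cache_ready:
        # Same setting as in the note above: skip compilation to avoid the hang.
        env["UNSLOTH_COMPILE_DISABLE"] = "1"
    return cache_ready


if __name__ == "__main__":
    # Assumption: call this before `import unsloth` in the training script.
    if configure_unsloth_compile():
        print("Compiled cache found: compilation disabled for multi-GPU run.")
    else:
        print("No compiled cache yet: run once on a single GPU first.")
```

Checking for the cache directory avoids editing the script between the first and later runs, but setting the variable by hand as described above works just as well.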
