@CharlesCNorton

Summary

  • Reorder state dict loading before module_ops in SingleGPUModelBuilder.build() to prevent segfault on Windows
  • Make triton import optional with fallback for Windows compatibility (triton is Linux-only)

Problem

On Windows, loading a large safetensors file (the 41GB BF16 model) crashes with a segmentation fault when Gemma is already loaded in memory. This appears to be a safetensors memory-mapping issue specific to Windows.

Solution

Load the state dict before applying module_ops (which load Gemma). This ensures the safetensors file is opened before Gemma occupies memory, avoiding the crash.
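
The reordering can be illustrated with a minimal sketch. This is not the actual `SingleGPUModelBuilder.build()` code: `build`, `load_state_dict`, and `load_text_encoder` below are hypothetical stand-ins, and the model is a plain dict. It only shows the ordering this PR relies on, with the checkpoint read before any module op runs.

```python
from typing import Any, Callable, Dict, List

# Records the order in which the steps run, for illustration only.
events: List[str] = []


def load_state_dict() -> Dict[str, Any]:
    """Hypothetical stand-in for reading the safetensors checkpoint."""
    events.append("load_state_dict")
    return {"weight": 1.0}


def load_text_encoder(model: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical module op; in the PR this is the step that loads Gemma."""
    events.append("module_op")
    model["text_encoder"] = "loaded"
    return model


def build(
    load_fn: Callable[[], Dict[str, Any]],
    module_ops: List[Callable[[Dict[str, Any]], Dict[str, Any]]],
) -> Dict[str, Any]:
    # 1. Read the checkpoint first, while the module ops have not yet
    #    allocated anything.
    state_dict = load_fn()
    # 2. Only then apply the module ops.
    model: Dict[str, Any] = {}
    for op in module_ops:
        model = op(model)
    # 3. Attach the weights that were already read (assumed final step).
    model.update(state_dict)
    return model


model = build(load_state_dict, [load_text_encoder])
```

After the call, `events` lists `load_state_dict` ahead of `module_op`, which is the order the fix depends on.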

Test plan

  • Tested on Windows 11 with RTX 6000 Ada (48GB VRAM), 137GB RAM
  • BF16 model (ltx-2-19b-dev.safetensors) now loads and runs successfully
  • FP8 model continues to work
  • Triton-dependent code paths gracefully fall back when triton is unavailable
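
The optional triton import follows the usual guarded-import pattern. The sketch below is an assumption about the general shape, not the code in this PR: `HAS_TRITON`, `select_backend`, and `fuse_weights` are hypothetical names, and the arithmetic is a placeholder rather than the real FP8 weight fusion.

```python
from typing import List

try:
    import triton  # noqa: F401  (not installable on Windows, per this PR)
    HAS_TRITON = True
except ImportError:
    triton = None
    HAS_TRITON = False


def select_backend() -> str:
    """Report which code path would be taken on this machine."""
    return "triton" if HAS_TRITON else "fallback"


def fuse_weights(weights: List[float], scale: float) -> List[float]:
    """Placeholder for weight fusion (hypothetical, pure Python).

    A real implementation would dispatch to a Triton kernel when
    HAS_TRITON is true. Here both paths use the same arithmetic so the
    sketch runs with or without triton installed.
    """
    return [w * scale for w in weights]


backend = select_backend()
fused = fuse_weights([1.0, 2.0], 0.5)
```

Catching only `ImportError` keeps unrelated failures visible instead of silently switching to the fallback path.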

🤖 Generated with Claude Code

On Windows, loading large safetensors files (41GB BF16 model) crashes
with a segmentation fault when Gemma is already loaded in memory,
apparently due to a safetensors memory mapping issue.

Changes:
- Reorder SingleGPUModelBuilder.build() to load state dict BEFORE
  applying module_ops (which load Gemma)
- Make triton import conditional since it's Linux-only, with fallback
  for FP8 weight fusion on non-Linux systems