@opooladz commented on Oct 4, 2025

No description provided.

opooladz and others added 5 commits October 3, 2025 21:19
…stack

Enhance GRPOTrainer with advanced RL features:
- Add asymmetric clipping, per-token weighting, length shaping
- Support multiple advantage estimators (mean, median, GAE)
- Implement KL/entropy guards and reference model resets
- Add group-normalized advantages and DAPO length shaping
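
A minimal sketch of the group-normalized advantages and asymmetric clipping listed above, using plain jax.numpy; the function names, argument names, and default clip values are illustrative assumptions, not the trainer's actual API:

```python
import jax.numpy as jnp

def group_normalized_advantages(rewards, eps=1e-6):
    """rewards: [num_groups, group_size], one group of sampled completions per prompt.
    Each completion's reward is normalized by its group's mean and std (GRPO-style)."""
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)

def asymmetric_clipped_objective(log_ratio, advantages, eps_low=0.2, eps_high=0.28):
    """PPO-style surrogate with separate lower/upper clip ranges ("clip-higher", as in DAPO)."""
    ratio = jnp.exp(log_ratio)
    clipped = jnp.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return jnp.minimum(ratio * advantages, clipped * advantages)
```

The mean/median/GAE estimators and per-token weighting mentioned above would plug in where `advantages` is produced and in how the objective is reduced over tokens.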

Improve DPO metrics logging:
- Add KL divergence, entropy, and logit margin tracking
- Ensure consistent dashboard metrics across trainers

Documentation updates:
- Add RL_ALGORITHMS.md documenting all RL algorithms
- Add INFERENCE_ENGINES.md documenting vInference/vSurge/vWhisper
- Add RL_UPDATES.md with release notes
- Add VINEPPO_IMPLEMENTATION_PLAN.md for future work
- Update docs/trainers/grpo.md with new configuration options

Testing:
- Add tests/trainers/grpo_utils_test.py for utility functions

Add complete image diffusion stack to EasyDeL with 5 architectures, 2 trainers,
and MoE-based scaling following DeepSeek V2 patterns.

## New Modules (8,900+ lines)

### Architectures (5 models)
- **DiT**: Diffusion Transformer with adaptive LayerNorm (879 lines)
- **DiT-MoE**: Sparse MoE DiT with 64 routed + 2 shared experts (1,116 lines)
- **VAE**: Variational autoencoder for latent diffusion (1,189 lines)
- **UNet 2D**: Stable Diffusion UNet with cross-attention (2,186 lines)
- **Flux**: State-of-the-art transformer with RoPE (1,353 lines)

### Trainers (2 implementations)
- **Image Diffusion Trainer**: Rectified flow with velocity prediction (442 lines)
- **Stable Diffusion Trainer**: Full SD pipeline with VAE + text (1,343 lines)

## Key Features

### DiT-MoE (New!)
- Mixture of Experts following DeepSeek V2 architecture
- 64 routed experts + 2 shared experts (configurable)
- Top-k routing without auxiliary losses (see the sketch after this list)
- Expert parallelism support via ExpertColumnWiseAlt sharding
- 3x the parameters at the same compute as a dense DiT
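
As a rough illustration of the top-k routing plus always-on shared expert described in this list (softmax scoring per the DeepSeek V2 pattern; experts are simplified to single linear maps, and all names are assumptions rather than the module's real code):

```python
import jax
import jax.numpy as jnp

def moe_layer(x, router_w, expert_w, shared_w, top_k=6):
    """x: [n, d]; router_w: [d, E]; expert_w: [E, d, d]; shared_w: [d, d].
    Softmax scoring and top-k selection with no auxiliary load-balancing loss."""
    scores = jax.nn.softmax(x @ router_w, axis=-1)          # [n, E]
    top_scores, top_idx = jax.lax.top_k(scores, top_k)      # [n, top_k]
    w = expert_w[top_idx]                                   # gather chosen experts: [n, top_k, d, d]
    routed = jnp.einsum("nd,nkde->nke", x, w)               # apply each chosen expert
    routed = (top_scores[..., None] * routed).sum(axis=1)   # score-weighted combine
    return routed + x @ shared_w                            # shared expert is always active
```

Only the `top_k` gathered experts do work for a given token, which is where the "3x the parameters at the same compute" trade-off comes from.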

### Rectified Flow
- Velocity prediction formulation: v = data - noise (sketched after this list)
- Straight ODE paths for fast sampling
- Min-SNR gamma weighting (γ=5.0) for training stability
- Compatible with DDPM/DDIM schedulers
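
A minimal sketch of the rectified-flow objective with Min-SNR weighting described in this list. It follows the commit's `v = data - noise` convention (so the interpolation runs from noise at t=0 to data at t=1); the model call signature and the exact SNR normalization are assumptions:

```python
import jax
import jax.numpy as jnp

def rectified_flow_loss(model, params, data, key, gamma=5.0):
    """x_t = t * data + (1 - t) * noise, target velocity v = data - noise,
    MSE weighted by min(SNR, gamma) / SNR (Min-SNR-gamma)."""
    t_key, n_key = jax.random.split(key)
    t = jax.random.uniform(t_key, (data.shape[0],))
    noise = jax.random.normal(n_key, data.shape)
    t_b = t.reshape(-1, *([1] * (data.ndim - 1)))            # broadcast t over image dims
    x_t = t_b * data + (1.0 - t_b) * noise
    v_target = data - noise
    v_pred = model(params, x_t, t)                           # hypothetical model call
    snr = (t_b / jnp.maximum(1.0 - t_b, 1e-5)) ** 2          # SNR of the linear path
    weight = jnp.minimum(snr, gamma) / jnp.maximum(snr, 1e-8)
    return jnp.mean(weight * (v_pred - v_target) ** 2)
```

Because the velocity target is constant along the straight interpolation path, a few large Euler steps along `v` already give usable samples, which is the "straight ODE paths for fast sampling" point above.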

### Production Ready
- Full Flax nnx implementation with EasyDeLBaseModule
- @register_module and @register_config decorators
- Partition rules for distributed training
- Gradient checkpointing support

## Documentation
- DIT_MOE_README.md: Complete MoE-DiT guide (524 lines)
- DIFFUSION_COMPLETE_SUMMARY.md: Architecture overview (462 lines)
- IMAGE_DIFFUSION_README.md: DiT training guide (369 lines)
- examples/train_image_diffusion_dit.py: Training example (147 lines)

## Registry Updates
- easydel/modules/__init__.py: Added dit, dit_moe, flux, unet2d, vae
- easydel/trainers/__init__.py: Added image_diffusion and stable_diffusion trainers

Total: 10,177 lines added across 32 files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Upgrade DiT-MoE to use DeepSeek V3's superior MoE design with major improvements:

## V3 Improvements over V2

### Expert Scaling (4x more experts)
- **256 routed experts** (vs 64 in V2)
- **1 shared expert** (vs 2 in V2) - more capacity in routed experts
- **8 experts per token** (vs 6 in V2) for better quality

### Routing Innovations
- **Sigmoid scoring** (vs softmax in V2) for better expert utilization
- **Token-choice routing** (`noaux_tc`) - each token picks its top experts without auxiliary balancing losses (see the sketch after this list)
- **Group-limited routing**: 8 expert groups with top-4 selection
- **Higher scaling factor**: 2.5 (vs 1.0) for stronger expert contributions
- **Normalized top-k probabilities** for balanced load
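
A rough sketch of the sigmoid, group-limited, top-k selection these bullets describe (following the DeepSeek V3 recipe; the tensor names are assumptions, and the score-correction bias V3 uses for load balancing is omitted for brevity):

```python
import jax
import jax.numpy as jnp

def v3_route(router_logits, n_group=8, topk_group=4, top_k=8, scale=2.5):
    """router_logits: [n_tokens, 256]. Returns chosen expert ids and their weights."""
    scores = jax.nn.sigmoid(router_logits)                      # sigmoid scoring, not softmax
    n, e = scores.shape
    grouped = scores.reshape(n, n_group, e // n_group)          # [n, 8, 32]
    # Rank groups by the sum of their top-2 expert scores, keep the best `topk_group` groups.
    group_scores = jax.lax.top_k(grouped, 2)[0].sum(axis=-1)    # [n, 8]
    _, keep = jax.lax.top_k(group_scores, topk_group)           # [n, 4]
    group_mask = jnp.zeros_like(group_scores).at[
        jnp.arange(n)[:, None], keep].set(1.0)
    masked = (grouped * group_mask[..., None]).reshape(n, e)    # zero out dropped groups
    top_scores, top_idx = jax.lax.top_k(masked, top_k)          # 8 routed experts per token
    weights = top_scores / (top_scores.sum(axis=-1, keepdims=True) + 1e-20)
    return top_idx, scale * weights                             # normalized, scaled by 2.5
```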

### Performance Impact
- **~3.5% of experts active**: only 9/257 experts used per token (vs 12.1% in V2)
- **Better load balancing** through group-limited routing
- **No auxiliary losses** - V3's natural balance eliminates the need for router losses

## Changes

- easydel/modules/dit_moe/dit_moe_configuration.py: Update defaults to V3
- easydel/modules/dit_moe/modeling_dit_moe.py: Add sigmoid scoring + noaux_tc routing
- DIT_MOE_README.md: Update documentation to reflect V3 architecture

Total experts: 1 shared + 256 routed = 257 experts
Active per token: 1 shared + 8 routed = 9 experts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Triton (from eformer) only has wheels for Linux x86_64. Use platform markers
to auto-detect and conditionally install Triton based on the OS.

## Auto-Detection

```toml
override-dependencies = [
  "triton>=3.0.0; platform_system == 'Linux' and platform_machine == 'x86_64'",
]
```

## Behavior

- **Linux x86_64** (CUDA GPUs) → ✅ Triton installed
- **macOS** (Apple Silicon/Intel) → ⏭️  Triton skipped
- **Windows** → ⏭️  Triton skipped
- **Linux ARM** (Raspberry Pi, etc.) → ⏭️  Triton skipped
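
To check why the table above falls out of that single marker string, you can evaluate it on any machine with the `packaging` library (illustrative snippet, not part of the repo):

```python
from packaging.markers import Marker

marker = Marker("platform_system == 'Linux' and platform_machine == 'x86_64'")
# True on Linux/x86_64, False on macOS, Windows, and Linux ARM -
# the same condition uv uses to decide whether to install Triton.
print(marker.evaluate())
```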

## Installation

Just run `uv sync` on any platform - it auto-detects:
```bash
uv sync  # Works on Mac, Linux, Windows
```

No manual platform selection needed!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add comprehensive image diffusion documentation showcasing new capabilities:

## Image Diffusion Section

### DiT-MoE Example
- Training with 256 experts (DeepSeek V3 architecture)
- Rectified flow with velocity prediction
- Complete configuration example
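
For orientation only, a configuration along these lines might look like the sketch below; every class and field name here is hypothetical and should be checked against `easydel/modules/dit_moe/dit_moe_configuration.py`:

```python
# Hypothetical names throughout - the real fields live in dit_moe_configuration.py.
from easydel.modules.dit_moe import DiTMoEConfig  # assumed import path

config = DiTMoEConfig(
    n_routed_experts=256,        # DeepSeek V3-scale expert count
    n_shared_experts=1,
    num_experts_per_tok=8,       # 8 routed experts active per token
    n_group=8,                   # group-limited routing: 8 groups, top-4 kept
    topk_group=4,
    routed_scaling_factor=2.5,
    scoring_func="sigmoid",      # sigmoid scoring
    topk_method="noaux_tc",      # aux-loss-free token-choice routing
)
```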

### Stable Diffusion Example
- Text-to-image training with frozen CLIP
- VAE + UNet2D pipeline
- SNR weighting configuration

### Supported Architectures
- DiT: Patch-based transformer with adaptive LayerNorm
- DiT-MoE: Sparse MoE (256 routed experts, ~3.5% active per token)
- UNet2D: Classic SD with cross-attention
- Flux: State-of-the-art with RoPE
- VAE: Latent encoder/decoder (SD 1.x/2.x/SDXL)

### Key Features
- Rectified Flow with straight ODE paths
- Min-SNR weighting (γ=5.0) for stability
- Expert parallelism for distributed training
- Mixed precision (bfloat16/float16)

## Key Features Updates
- Listed 55+ models by category (LLMs, SSMs, Vision, Multimodal, MoE)
- Added Image Diffusion and Stable Diffusion trainers
- Highlighted 12 DPO algorithms in trainer list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>