SGLang Diffusion Model Cookbook Template #32

@jiapingW

SGLang Diffusion Cookbook Community Contribution Template

Reference: Wan2.2 Cookbook


1. Model Introduction

  • Overview: Brief description of model purpose and capabilities
  • Variants: List versions with specific use cases
  • Key Features: Unique capabilities (reasoning, tool calling, multimodal)
  • Links: HuggingFace model page and official documentation

2. Installation

Refer to the official installation guide.


3. Deployment

Basic Configuration

sglang serve --model-path [model-path]

Optimization Tips

  • Parallelism: Recommended TP and SP settings for different GPU counts
  • Memory: CPU offload (--dit-cpu-offload), quantization
  • Performance: Cache-Dit

4. API Usage

Refer to the official API usage guide.

Generate an image

import base64
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:30010/v1")
img = client.images.generate(
    prompt="A calico cat playing a piano on stage",
    size="1024x1024",
    n=1,
    response_format="b64_json",
)
image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)

Generate a video

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:30010/v1")
video = client.videos.create(
    prompt="A calico cat playing a piano on stage",
    size="1280x720"
)
print(f"Video ID: {video.id}, Status: {video.status}")
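
The call above returns a job with an ID and a status rather than a finished video, so a client normally polls until the job completes. The following is a minimal sketch of such a polling loop, not the documented SGLang workflow. It assumes the server reports the status names used by the OpenAI video API ("completed", "failed"); `wait_for_video` and its parameters are hypothetical names introduced for illustration.

```python
import time


def wait_for_video(retrieve, video_id, poll_interval=5.0, timeout=600.0,
                   done_states=("completed",), failed_states=("failed",)):
    """Poll `retrieve(video_id)` until the job finishes, fails, or times out.

    `retrieve` is any callable returning an object with a `.status` attribute,
    for example `client.videos.retrieve` from the OpenAI Python SDK.
    The status names are assumptions; adjust them to what the server reports.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = retrieve(video_id)
        if job.status in done_states:
            return job
        if job.status in failed_states:
            raise RuntimeError(f"Video {video_id} failed with status {job.status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Video {video_id} still {job.status!r} after {timeout}s")
        time.sleep(poll_interval)
```

With the client from the example above, this could be used as `job = wait_for_video(client.videos.retrieve, video.id)`, provided the server implements the retrieve endpoint.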

Also document model-specific advanced usage, such as cache-dit and CPU offload.


5. Benchmarks

Environment:

  • Hardware: [GPU Type] × [Number]
  • Model: [Name/Variant]
  • SGLang Version: [Version]

Concurrency Levels

  • Low (1): Best latency
  • High (20): Max throughput

Benchmark Commands

Low Concurrency

# Server
# For text to video: such as Wan2.2-T2V-A14B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-video --dataset vbench --task t2v --num-prompts 1 --max-concurrency 1
# For image to video: such as Wan2.2-I2V-A14B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-video --dataset vbench --task i2v --num-prompts 1 --max-concurrency 1
# For image-text to video: such as Wan2.2-TI2V-5B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-video --dataset vbench --task ti2v --num-prompts 1 --max-concurrency 1
# For text to image: such as Qwen-Image
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-image --dataset vbench --task t2i --num-prompts 1 --max-concurrency 1
# For image-text to image: such as Qwen-Image-Edit
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-image --dataset vbench --task ti2i --num-prompts 1 --max-concurrency 1

High Concurrency

# Server
# For text to video: such as Wan2.2-T2V-A14B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-video --dataset vbench --task t2v --num-prompts 20 --max-concurrency 20
# For image to video: such as Wan2.2-I2V-A14B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-video --dataset vbench --task i2v --num-prompts 20 --max-concurrency 20
# For image-text to video: such as Wan2.2-TI2V-5B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-video --dataset vbench --task ti2v --num-prompts 20 --max-concurrency 20
# For text to image: such as Qwen-Image
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-image --dataset vbench --task t2i --num-prompts 20 --max-concurrency 20
# For image-text to image: such as Qwen-Image-Edit
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-image --dataset vbench --task ti2i --num-prompts 20 --max-concurrency 20
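
The benchmark commands above differ only in backend, task, and concurrency, so they can be generated rather than copied by hand. The sketch below rebuilds them from the flags shown in this template; `TASK_BACKENDS`, `CONCURRENCY_LEVELS`, and `bench_command` are hypothetical helper names, and no flags beyond those already listed are assumed.

```python
import shlex

# Task -> backend mapping, taken from the commands in this template.
TASK_BACKENDS = {
    "t2v": "sglang-video",
    "i2v": "sglang-video",
    "ti2v": "sglang-video",
    "t2i": "sglang-image",
    "ti2i": "sglang-image",
}

# The two concurrency levels the template asks contributors to report.
CONCURRENCY_LEVELS = {"low": 1, "high": 20}


def bench_command(task, concurrency, dataset="vbench"):
    """Return the bench_serving argv for one task at one concurrency level."""
    backend = TASK_BACKENDS[task]
    return [
        "python3", "-m", "sglang.multimodal_gen.benchmarks.bench_serving",
        "--backend", backend,
        "--dataset", dataset,
        "--task", task,
        "--num-prompts", str(concurrency),
        "--max-concurrency", str(concurrency),
    ]


if __name__ == "__main__":
    # Print every command so it can be pasted into a shell or a cookbook page.
    for level, n in CONCURRENCY_LEVELS.items():
        for task in TASK_BACKENDS:
            print(f"# {level} concurrency, {task}")
            print(shlex.join(bench_command(task, n)))
```

Returning an argument list instead of a string keeps the command safe to pass to `subprocess.run` directly, while `shlex.join` produces the copy-pasteable form.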

Key Metrics

  • Request Throughput (req/s), Output Throughput (tok/s)
  • Latency Mean (ms): Mean time per step
  • Peak Memory Max (MB): Maximum memory usage during the run
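
As a rough illustration of how throughput and mean latency relate, the sketch below computes both from per-request start and end timestamps. This is one common definition, not necessarily the one bench_serving uses; `RequestTiming` and `summarize` are hypothetical names, and latency here is measured per request.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTiming:
    start_s: float  # time the request was sent, in seconds
    end_s: float    # time the response was fully received, in seconds


def summarize(timings):
    """Compute request throughput (req/s) and mean latency (ms) for a run."""
    if not timings:
        raise ValueError("need at least one request timing")
    # Wall-clock span of the whole run, from first send to last completion.
    wall_s = max(t.end_s for t in timings) - min(t.start_s for t in timings)
    latencies_ms = [(t.end_s - t.start_s) * 1000.0 for t in timings]
    return {
        "request_throughput_req_s": len(timings) / wall_s if wall_s > 0 else float("inf"),
        "latency_mean_ms": mean(latencies_ms),
    }
```

Because concurrent requests overlap in time, throughput is divided by the wall-clock span rather than by the sum of latencies; at concurrency 1 the two coincide.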

Contribution Checklist

  • Follow template structure
  • Include both concurrency levels
  • Document hardware specifications
  • Link to official resources
  • Verify all commands work

Resources: SGLang Docs | GitHub | Cookbook Repo
