SGLang Diffusion Cookbook Community Contribution Template
Reference: Wan2.2 Cookbook
1. Model Introduction
- Overview: Brief description of model purpose and capabilities
- Variants: List versions with specific use cases
- Key Features: Unique capabilities (reasoning, tool calling, multimodal)
- Links: HuggingFace model page and official documentation
2. Installation
Refer to the official installation guide.
3. Deployment
Basic Configuration
sglang serve --model-path [model-path]
Optimization Tips
- Parallelism: Recommended TP and SP settings for different GPU counts
- Memory: cpu-offload, quantization (--dit-cpu-offload)
- Performance: Cache-Dit
4. API Usage
Refer to the official API usage guide.
Generate an image
import base64
from openai import OpenAI
# The server exposes an OpenAI-compatible endpoint; the API key is not checked.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:30010/v1")
img = client.images.generate(
    prompt="A calico cat playing a piano on stage",
    size="1024x1024",
    n=1,
    response_format="b64_json",
)
# The image is returned base64-encoded; decode it before writing to disk.
image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)
Generate a video
from openai import OpenAI
client = OpenAI(api_key="EMPTY", base_url="http://localhost:30010/v1")
# Submits a video generation job and returns its ID and current status.
video = client.videos.create(
    prompt="A calico cat playing a piano on stage",
    size="1280x720",
)
print(f"Video ID: {video.id}, Status: {video.status}")
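The video call above prints an ID and a status rather than the finished video, so a cookbook will usually need a waiting step. The helper below is a minimal sketch of that step. It assumes the server also implements the OpenAI-style `videos.retrieve` call and reports the terminal statuses `completed` and `failed`; both are assumptions to verify against the official API usage guide. `wait_for_video` is a hypothetical helper name, not part of SGLang or the OpenAI SDK.

```python
import time


def wait_for_video(client, video_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll a video job until it reaches a terminal status or the timeout expires.

    Assumption: client.videos.retrieve(video_id) returns an object with a
    `status` attribute, and "completed" / "failed" are the terminal values.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        video = client.videos.retrieve(video_id)
        if video.status in ("completed", "failed"):
            return video
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"video {video_id} still '{video.status}' after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

With the `client` and `video` objects from the example above, `wait_for_video(client, video.id)` would block until the job finishes. How the finished file is downloaded depends on the server and is left to the model-specific section.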
Document model-specific advanced usage: cache-dit, cpu-offload and so on.
5. Benchmarks
Environment:
- Hardware: [GPU Type] × [Number]
- Model: [Name/Variant]
- SGLang Version: [Version]
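To fill in the environment fields consistently, contributors could collect them with a short script. The sketch below assumes that `nvidia-smi` is on the PATH, that all GPUs in the machine are the same type, and that the package is installed under the distribution name `sglang`. The function names are hypothetical, and the model name still has to be entered by hand.

```python
import importlib.metadata
import shutil
import subprocess


def sglang_version():
    """Return the installed SGLang version, or a placeholder if it is missing."""
    try:
        return importlib.metadata.version("sglang")
    except importlib.metadata.PackageNotFoundError:
        return "not installed"


def gpu_summary():
    """Return '[GPU Type] × [Number]' using nvidia-smi, or 'unknown' on failure."""
    if shutil.which("nvidia-smi") is None:
        return "unknown (nvidia-smi not found)"
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if result.returncode != 0 or not names:
        return "unknown"
    # Assumes a homogeneous machine: report the first GPU name and the count.
    return f"{names[0]} × {len(names)}"


print(f"- Hardware: {gpu_summary()}")
print(f"- SGLang Version: {sglang_version()}")
```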
Concurrency Levels
- Low (1): Best latency
- High (20): Max throughput
Benchmark Commands
Low Concurrency
# Server: launch the model first (see 3. Deployment), then run the command matching its task
# For text to video: such as Wan2.2-T2V-A14B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-video --dataset vbench --task t2v --num-prompts 1 --max-concurrency 1
# For image to video: such as Wan2.2-I2V-A14B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-video --dataset vbench --task i2v --num-prompts 1 --max-concurrency 1
# For image-text to video: such as Wan2.2-TI2V-5B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-video --dataset vbench --task ti2v --num-prompts 1 --max-concurrency 1
# For text to image: such as Qwen-Image
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-image --dataset vbench --task t2i --num-prompts 1 --max-concurrency 1
# For image-text to image: such as Qwen-Image-Edit
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-image --dataset vbench --task ti2i --num-prompts 1 --max-concurrency 1
High Concurrency
# Server: launch the model first (see 3. Deployment), then run the command matching its task
# For text to video: such as Wan2.2-T2V-A14B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-video --dataset vbench --task t2v --num-prompts 20 --max-concurrency 20
# For image to video: such as Wan2.2-I2V-A14B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-video --dataset vbench --task i2v --num-prompts 20 --max-concurrency 20
# For image-text to video: such as Wan2.2-TI2V-5B-Diffusers
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-video --dataset vbench --task ti2v --num-prompts 20 --max-concurrency 20
# For text to image: such as Qwen-Image
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-image --dataset vbench --task t2i --num-prompts 20 --max-concurrency 20
# For image-text to image: such as Qwen-Image-Edit
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
--backend sglang-image --dataset vbench --task ti2i --num-prompts 20 --max-concurrency 20
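The low and high concurrency commands differ only in the backend, the task, and the two numeric flags, so they can be generated instead of copied. The sketch below builds the same command lines from a small table; it uses only the flags and values shown above. `bench_command` is a hypothetical helper, not part of SGLang.

```python
import shlex

# Task -> backend pairing, taken from the commands listed above.
BACKEND_FOR_TASK = {
    "t2v": "sglang-video",
    "i2v": "sglang-video",
    "ti2v": "sglang-video",
    "t2i": "sglang-image",
    "ti2i": "sglang-image",
}

# The two concurrency levels required by the template.
CONCURRENCY_LEVELS = {"low": 1, "high": 20}


def bench_command(task, level):
    """Return the bench_serving argument list for one task and concurrency level."""
    n = CONCURRENCY_LEVELS[level]
    return [
        "python3", "-m", "sglang.multimodal_gen.benchmarks.bench_serving",
        "--backend", BACKEND_FOR_TASK[task],
        "--dataset", "vbench",
        "--task", task,
        "--num-prompts", str(n),
        "--max-concurrency", str(n),
    ]


# Print both commands for a text-to-video model.
for level in CONCURRENCY_LEVELS:
    print(shlex.join(bench_command("t2v", level)))
```

Passing the returned list to `subprocess.run` would execute the benchmark directly, provided the server is already running.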
Key Metrics
- Request Throughput (req/s), Output Throughput (tok/s)
- Latency Mean (ms): Mean time per step
- Peak Memory Max (MB): Maximum memory usage during the run
Contribution Checklist
- Follow template structure
- Include both concurrency levels
- Document hardware specifications
- Link to official resources
- Verify all commands work
Resources: SGLang Docs | GitHub | Cookbook Repo