259 commits
11fffbc
[Doc]: Deepseek reference docs (#2787)
XiaotongJiang Jan 9, 2025
5cc1170
Doc: add block-wise FP8 in dpsk model reference (#2830)
zhaochenyang20 Jan 10, 2025
2db03a0
Update README.md (#2833)
merrymercy Jan 10, 2025
8f15789
Add more metrics to serving benchmark. (#2819)
mutinifni Jan 10, 2025
f290bd4
[Bugfix] Fix embedding model hangs with `--enable-metrics` (#2822)
Jan 10, 2025
5413ec2
[Bugfix] Fix bug in fork logic caused by null text_ (#2835)
Muqi1029 Jan 10, 2025
b170646
Fix port number overflow (#2826)
gty111 Jan 10, 2025
a47bf39
[Eagle2] Fix multiple concurrent request crashes (#2730)
coolhok Jan 10, 2025
5d6e946
Cache controller for hierarchical caching (#2804)
xiezhq-hermann Jan 11, 2025
f176958
Update threshold in test_nightly_gsm8k_eval.py (#2836)
merrymercy Jan 11, 2025
f0e15dc
[HotFix] fix fp8 scale load failed in tp>1 (#2837)
BBuf Jan 11, 2025
f624901
chore: bump v0.4.1.post5 (#2840)
zhyncs Jan 11, 2025
197cbf9
docs: update README (#2841)
zhyncs Jan 11, 2025
c4f9707
Improve: Token-In Token-Out Usage for RLHF (#2843)
shuaills Jan 11, 2025
e2b16c4
add sampling_scaling_penalties kernel (#2846)
BBuf Jan 13, 2025
a879c2f
fix sgl-kernel build (#2850)
zhyncs Jan 13, 2025
85b2e05
Add int8 quant kernel (#2848)
ispobock Jan 13, 2025
0bb0f76
Support FP8 E4M3 KV Cache (#2786)
bjmsong Jan 13, 2025
a18ab81
Update base image for ROCm (#2852)
sogalin Jan 13, 2025
e808c1d
Integrate ROCm ater package for ck moe function feasibility (#2854)
kkHuang-amd Jan 13, 2025
4093aa4
[Fix]eagle2 health_generate is first request,apiserver will core (#2853)
coolhok Jan 13, 2025
72c7776
Fix linear.py and improve weight loading (#2851)
merrymercy Jan 13, 2025
42f3909
Unify sglang coding style (#2856)
kkHuang-amd Jan 13, 2025
20a9f5d
fix: not delete CNAME (#2860)
zhyncs Jan 13, 2025
41d7e5b
docs: update link (#2857)
zhyncs Jan 13, 2025
4536d72
minor: use ubuntu-latest instead of self-hosted runner for amd build …
zhyncs Jan 13, 2025
67008f4
Use only one GPU for MLA CI tests (#2858)
merrymercy Jan 13, 2025
51ab3cc
Collect more metrics: num_requests_total (#2859)
merrymercy Jan 13, 2025
17de02f
Integration of TurboMind AWQ (#2828)
bjmsong Jan 13, 2025
f3516c2
Fix quant kernel accuracy issue (#2865)
ispobock Jan 13, 2025
6249e4a
Revert "Integration of TurboMind AWQ" (#2866)
merrymercy Jan 13, 2025
3b141e1
Dump requests (#2862)
merrymercy Jan 13, 2025
336ff5b
Fix typos in io_struct.py (#2867)
merrymercy Jan 13, 2025
d855653
minor: fix release docs (#2868)
zhyncs Jan 13, 2025
6ec75e6
add qwen2 eagle model (#2863)
Lzhang-hub Jan 13, 2025
c1e097c
Revert "Dump requests to a folder" (#2869)
merrymercy Jan 13, 2025
d08c77c
Sampling penalties memory interface (#2870)
BBuf Jan 13, 2025
923f518
CUDA-graph-compatible releasing and resuming KV cache and model weigh…
fzyzcjy Jan 13, 2025
46d4431
Add a new api configure_logging to allow dumping the requests (#2875)
merrymercy Jan 13, 2025
8000256
docs: update README (#2878)
zhyncs Jan 14, 2025
c19d848
Adjust flashinfer workspace size for Qwen2 models (#2879)
ispobock Jan 14, 2025
b8cd09f
update ROCm docker for layernorm kernel optimization (#2885)
kkHuang-amd Jan 14, 2025
cc0485b
Support w8a8 int8 quantization config (#2881)
ispobock Jan 14, 2025
f5c6c66
feat: support internlm 3 dense (#2888)
zhyncs Jan 14, 2025
f005758
introduce CUB in sgl-kernel (#2887)
BBuf Jan 14, 2025
955a2fb
Add performance and accuracy test code for FP8 GEMM operations
yych0745 Jan 7, 2025
30bdf20
support w8a8 fp8
HandH1998 Jan 8, 2025
4cac9fb
support bias
HandH1998 Jan 9, 2025
ecc90a4
opitmize
yych0745 Jan 10, 2025
3497950
add config_profile for sm_89
yych0745 Jan 13, 2025
05eb204
fp8 sm90-H100 singleTest done
yych0745 Jan 14, 2025
8d95538
fp8 sm90-H100 singleTest done
yych0745 Jan 14, 2025
724cf62
clean code
yych0745 Jan 14, 2025
93e2d85
fix
yych0745 Jan 14, 2025
8c08dbb
clean code
yych0745 Jan 14, 2025
fb95b0e
clean code
yych0745 Jan 15, 2025
b3e99df
chore: bump v0.4.1.post6 (#2899)
zhyncs Jan 15, 2025
bfbda62
Add ut for w8a8 int8 quantization (#2897)
ispobock Jan 15, 2025
b803b39
Disable graceful shutdown of tokenizer manager when not in the main t…
comaniac Jan 15, 2025
f65c13b
Remove normalized_prompt_logprobs from the engine to make code easier…
merrymercy Jan 15, 2025
6cb3974
optimize custom allreduce kernel (#2904)
yizhang2077 Jan 15, 2025
a53454c
fix: sgl-kernel link cuda (#2906)
zhyncs Jan 15, 2025
767c9de
adapt custom allreduce for tensorrt llm (#2511)
yizhang2077 Jan 15, 2025
58f42b1
minor: update pr test (#2908)
zhyncs Jan 15, 2025
b7f3fec
minor: rename bench for sgl kernel (#2909)
zhyncs Jan 15, 2025
ab31793
[kernel] MiniMax-Text-01 prefill lightning_attn with triton (#2911)
BBuf Jan 16, 2025
bf8d07a
feat: patch linear base (#2915)
zhyncs Jan 16, 2025
2dc957d
fix setup for sgl kernel (#2917)
zhyncs Jan 16, 2025
7596417
minor: use bear for compilation database (#2919)
zhyncs Jan 16, 2025
8f2c522
Improve benchmark scripts and error message printing (#2922)
merrymercy Jan 16, 2025
a2f602b
fixed lm_head.weight error for quantized qwen (#2910)
RinRin-32 Jan 16, 2025
e00e538
add profiling to bench_one_batch script (#2821)
yundai424 Jan 16, 2025
93d6906
Simplify the process launch code in server.py (#2923)
merrymercy Jan 16, 2025
58f3f2b
Add CI for sgl-kernel (#2924)
ispobock Jan 16, 2025
8b6ce52
Support multi-node DP attention (#2925)
merrymercy Jan 16, 2025
a883f07
Update release-docker-amd.yml to run on amd docker runner. (#2927)
saienduri Jan 16, 2025
bc6915e
Improve type annotation and styles (#2926)
merrymercy Jan 16, 2025
78e974b
[kernel] MiniMax-Text-01 decode lightning_attn with triton (#2920)
BBuf Jan 16, 2025
bf3edc2
Docs: Update pull_request_template.md (#2928)
zhaochenyang20 Jan 16, 2025
0427416
Fix zmq binding (#2930)
merrymercy Jan 16, 2025
a8ccacc
[Frontend] Fix request length check and add option to disallow auto t…
CatherineSue Jan 16, 2025
6305173
Enable CPU device on SGLang (#2806)
chunyuan-w Jan 17, 2025
6a7973a
Update release-docs.yml (#2937)
merrymercy Jan 17, 2025
f3e9b48
Fix sgl-kernel ci (#2938)
ispobock Jan 17, 2025
5dc54f1
feat: remove vllm distributed (#2907)
zhyncs Jan 17, 2025
53e6552
Fix qwen accuracy issue (#2945)
ispobock Jan 17, 2025
c5644ca
docs: add Cursor for adoption and sponsorship (#2950)
zhyncs Jan 17, 2025
d06c1ab
update ci install dependency (#2949)
zhyncs Jan 17, 2025
033c715
cleanup models dependencies 1/n (#2948)
zhyncs Jan 17, 2025
d47c510
Add ut for qwen model (#2947)
ispobock Jan 17, 2025
dc2ac0c
Update pr template (#2951)
ispobock Jan 17, 2025
7a15e9a
cleanup models unused import 2/n (#2952)
zhyncs Jan 17, 2025
78e5b22
feat: use get_rope for gemma2 (#2954)
zhyncs Jan 17, 2025
120c363
Fix Llama-3.1-405B References Docs (#2944)
HermitSun Jan 17, 2025
13387e6
Multi-turn benchmark for hierarchical caching (#2942)
xiezhq-hermann Jan 18, 2025
d3024f4
support e4m3 kvcache in qwen2 & add kv scaling facotr json (#2894)
bjmsong Jan 18, 2025
8af7048
Query remaining memory dynamically for PrefillAdder (#2941)
xiezhq-hermann Jan 18, 2025
656dcc1
Remove fp8 monkey patch (#2960)
ispobock Jan 18, 2025
6f98c58
fix sgl-kernel setup.py (#2963)
sleepcoo Jan 18, 2025
2add697
feat: remove vllm get_rope (#2964)
zhyncs Jan 18, 2025
e2cdc8a
upgrade cutlass v3.7.0 (#2967)
zhyncs Jan 18, 2025
c2f212d
optimize MiniMax-Text-01 lightning_attn_decode triton (#2966)
BBuf Jan 18, 2025
3d93f84
[Feature] Support minicpmv v2.6 (#2785)
mickqian Jan 18, 2025
83452db
fix file name spelling mistake and useless variable in minmax-text-01…
BBuf Jan 19, 2025
2bd18e2
Memory pool: Minor optimize to avoid to (#2901)
zhengy001 Jan 19, 2025
4d4cdb3
Frontend: better error message handling for FINISH_ABORT in scheduler…
CatherineSue Jan 19, 2025
81d27c8
Refactor to add TypeBasedDispatcher to simplify dispatching (#2958)
fzyzcjy Jan 19, 2025
7906d1d
Remove the unused write_with_records (#2972)
merrymercy Jan 19, 2025
93b77c8
Fix the request loggings to make it fully able to be easily replayed …
merrymercy Jan 19, 2025
23196d5
Simplify logits processor (#2974)
merrymercy Jan 19, 2025
d33cbb7
remove cub and add cccl (#2976)
zhyncs Jan 19, 2025
53cc91e
[devcontainer] Fix mount and GPU & Support rust dev (#2978)
ByronHsu Jan 19, 2025
ef18b0e
[router] Allow empty worker list for sglang.launch_router (#2979)
ByronHsu Jan 19, 2025
4719c1d
[router] Fix sgl router path for release (#2980)
ByronHsu Jan 19, 2025
5a176c9
fix deepseek v2 with cpu device (#2975)
zhyncs Jan 19, 2025
24cafe3
add config to swtich from vllm custom allreduce to sgl_kernel custom …
yizhang2077 Jan 19, 2025
6ada05d
feat: check for is_cuda for sgl_kernel import (#2984)
zhyncs Jan 19, 2025
3fc2b62
update docker dev image (#2985)
zhyncs Jan 19, 2025
def5c31
docs: update supported_models (#2987)
zhyncs Jan 19, 2025
a69cb5c
cleanup unused header in sgl_kernel (#2986)
zhyncs Jan 19, 2025
8b6a448
fix missing revision arg when loading tokenizer (#2982)
giorgiopiatti-caffeinated Jan 19, 2025
d77caa2
[#2812] Make the decode status dict capcity adjustable by a CLI param…
seungduk-yanolja Jan 19, 2025
2c05f81
fix custom op version compatibility (#2988)
zhyncs Jan 19, 2025
3bcf5ec
support regex in xgrammar backend (#2983)
qeternity Jan 19, 2025
e403d23
[Feature] Add sampler custom logits processor (#2396)
hongpeng-guo Jan 19, 2025
61f42b5
Move sgl.Runtime under sglang/lang (#2990)
merrymercy Jan 20, 2025
cd493b5
Improve metrics, logging, and importing orders (#2992)
merrymercy Jan 20, 2025
0ffcfdf
Docs: Only use X-Grammar in structed output (#2991)
zhaochenyang20 Jan 20, 2025
1a820e3
Remove dependency of pynvml on ROCm (#2995)
lcskrishna Jan 20, 2025
44a9669
keep rotary_embedding only (#2997)
zhyncs Jan 20, 2025
0346489
Separate two entry points: Engine and HTTP server (#2996)
merrymercy Jan 20, 2025
09bcbe0
Update TypeBasedDispatcher and balance CI tests (#3001)
merrymercy Jan 20, 2025
51e87f6
Skip flaky custom_logit_processor tests (#3004)
merrymercy Jan 20, 2025
2584f6d
Docs: Add Performance Demonstaration for DPA (#3005)
zhaochenyang20 Jan 20, 2025
583697c
[Enhancement] Custom Logit Processor Improvement (#2998)
hongpeng-guo Jan 20, 2025
10bfce7
fix moe align blocks benchmark (#3003)
yiakwy-xpu-ml-framework-team Jan 20, 2025
dc18813
Fix perf regression on small batch sizes (#3008)
merrymercy Jan 20, 2025
89cd923
Roll back to use vllm custom allreduce (#3006)
merrymercy Jan 20, 2025
73401fd
Sync distributed package from vllm 0.6.4.post1 (#3010)
merrymercy Jan 20, 2025
b5caa22
[kernel] port rope cuda kernel to sgl-kernel (#2993)
ByronHsu Jan 20, 2025
e94fb7c
chore: bump v0.4.1.post7 (#3009)
zhyncs Jan 20, 2025
41a0ccd
Add clang-format check to sgl-kernel ci (#3012)
ispobock Jan 20, 2025
5dfcacf
Add compile flags for cutlass 3.x (#3013)
ispobock Jan 20, 2025
0311ce8
[router] Expose worker startup secs & Return error instead of panic f…
ByronHsu Jan 20, 2025
3a8428e
[router] Expose worker startup interval (#3019)
ByronHsu Jan 20, 2025
3ad4cd4
bump router to 0.1.3 (#3020)
ByronHsu Jan 20, 2025
af6c535
deepseek v3 and r1 chat template (#3015)
qeternity Jan 20, 2025
da4e8b3
enable kv_scale remap (#3017)
hliuca Jan 20, 2025
949b3fb
[Doc] Update doc of custom logit processor (#3021)
hongpeng-guo Jan 21, 2025
60b2a44
Fix flaky tests in test_programs.py (#3022)
merrymercy Jan 21, 2025
b730aa6
[EAGLE] Fix some boundary situation when retract reqs and req's max t…
josephydu Jan 21, 2025
d2571dd
Enable Cohere2 Models (#3018)
hliuca Jan 21, 2025
287d07a
Misc fixes for eagle (flush_cache, CPU overhead) (#3014)
merrymercy Jan 21, 2025
6c856b4
minor: update Makefile for sgl-kernel (#3025)
zhyncs Jan 21, 2025
ec1c21c
upgrade torch version for sgl-kernel (#3026)
zhyncs Jan 21, 2025
2bac342
fp8 dispatch change
yych0745 Jan 21, 2025
ba7ca85
clean code
yych0745 Jan 21, 2025
2727d7d
fix
yych0745 Jan 21, 2025
b11682e
clean code
yych0745 Jan 21, 2025
a4331cd
Add accuracy and latency tests of eagle into CI (#3027)
merrymercy Jan 21, 2025
5a0d680
feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033)
zhyncs Jan 21, 2025
0ac019f
Support sm90 Int8 gemm (#3035)
ispobock Jan 21, 2025
a42213d
fix pr-test-sgl-kernel (#3036)
zhyncs Jan 21, 2025
fe490cc
Add performance and accuracy test code for FP8 GEMM operations
yych0745 Jan 7, 2025
b2de73d
support w8a8 fp8
HandH1998 Jan 8, 2025
3d8f1c9
Use int64 as indices for set_kv_buffer (#3039)
merrymercy Jan 22, 2025
3691d68
support bias
HandH1998 Jan 9, 2025
38bcf52
fix compilation
HandH1998 Jan 21, 2025
d57f756
clean code
yych0745 Jan 22, 2025
e620244
clean code
yych0745 Jan 22, 2025
699fe9e
Merge pull request #6 from HandH1998/tmptmp
HandH1998 Jan 22, 2025
b6a88bb
Merge remote-tracking branch 'origin/main' into main_w8a8_fp8
HandH1998 Jan 22, 2025
604f4f5
format
HandH1998 Jan 22, 2025
98dc70d
format
HandH1998 Jan 22, 2025
6fc37bd
Fix sgl-kernel compile for sm80 (#3046)
ispobock Jan 22, 2025
a4025f6
Merge branch 'main' into main_w8a8_fp8
zhyncs Jan 22, 2025
8b87aad
upd
zhyncs Jan 22, 2025
9f8f2c7
update norm cu (#3048)
zhyncs Jan 22, 2025
bcda0c9
sync the upstream updates of flashinfer (#3051)
zhyncs Jan 22, 2025
b287319
Merge branch 'main' into main_w8a8_fp8
zhyncs Jan 22, 2025
7353fb9
feat: integrate norm kernels into sgl-kernel (#3052)
zhyncs Jan 22, 2025
9d9b482
feat: integrate activation kernels into sgl-kernel (#3053)
zhyncs Jan 22, 2025
b2bd8f4
minor: update header and use pytest (#3054)
zhyncs Jan 22, 2025
bf66960
feat: integrate bmm_fp8 kernel into sgl-kernel (#3056)
zhyncs Jan 22, 2025
0d2148e
fix rotary_embedding rope_scaling for phi (#3055)
sudo-root-ns Jan 22, 2025
806a300
add notice about flashinfer in sgl-kernel (#3057)
zhyncs Jan 22, 2025
ddc2001
disable custom allreduce on HIP (#3058)
hliuca Jan 22, 2025
b3393e9
[Doc] Update doc of profiling with PyTorch Profiler (#3038)
Fridge003 Jan 22, 2025
b8ab989
Fix the FP8 E4M3 parsing offline scales failure bug (#3045)
sleepcoo Jan 22, 2025
022614d
Add some flags to allow sync token ids across TP ranks (#3060)
merrymercy Jan 22, 2025
c0bf9bf
[devcontainer] add non-root user (#2989)
ByronHsu Jan 23, 2025
5de5065
[router] make error actionable (#3063)
ByronHsu Jan 23, 2025
8b84e69
Fix tp token sync for dp attention (#3062)
merrymercy Jan 23, 2025
862bcff
Support loading of larger models with on-the-fly quantization (#3061)
kwen2501 Jan 23, 2025
ea535dc
Revert "disable custom allreduce on HIP" (#3067)
merrymercy Jan 23, 2025
a547aad
docs: add developer guide for sgl-kernel (#3068)
zhyncs Jan 23, 2025
6de3ad4
Merge branch 'main_w8a8_fp8' of https://github.com/HandH1998/sglang i…
yych0745 Jan 23, 2025
44e12ce
docs: update developer guide for sgl-kernel (#3069)
zhyncs Jan 23, 2025
3e032c0
use v0.6.4.post1 for sgl-kernel ci (#3071)
zhyncs Jan 23, 2025
b4195b0
fix include
HandH1998 Jan 23, 2025
8290ba6
add more shapes for benchmark
HandH1998 Jan 23, 2025
a455233
Merge remote-tracking branch 'origin/main' into main_w8a8_fp8
HandH1998 Jan 23, 2025
ac2dc35
support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 …
BBuf Jan 23, 2025
42f408f
fix bug
HandH1998 Jan 23, 2025
553f5a3
Remove torch dependency in sgl-kernel (#3074)
merrymercy Jan 23, 2025
1f6cf0d
fix build error for sgl-kernel (#3078)
zhyncs Jan 23, 2025
3d0bfa3
update version setup for sgl-kernel (#3079)
zhyncs Jan 23, 2025
07a22cb
use env variable to control the build conf on the CPU build node (#3080)
zhyncs Jan 23, 2025
0da0989
sync flashinfer and update sgl-kernel tests (#3081)
zhyncs Jan 23, 2025
f1b6861
use flashinfer vec_dtypes in sgl_kernel (#3083)
BBuf Jan 23, 2025
e0cd65c
[hotfix] fix test_sampling_scaling_penalties.py ci test (#3084)
BBuf Jan 23, 2025
5de4051
feat: integrate sampling kernels into sgl-kernel (#3086)
zhyncs Jan 23, 2025
54bac8a
chore: bump sgl-kernel 0.0.2.post16 (#3087)
zhyncs Jan 23, 2025
1c4e0d2
Docs: Update doc for server arguments (#2742)
simveit Jan 23, 2025
7bad7e7
Add shapes for int8 gemm benchmark (#3093)
ispobock Jan 24, 2025
9a0cc2e
[router] Forward all request headers from router to workers (#3070)
ByronHsu Jan 24, 2025
8d8ef84
bump router to 0.1.4 (#3094)
ByronHsu Jan 24, 2025
3ed0a54
[router] Fix twine uploading (#3095)
ByronHsu Jan 24, 2025
1739631
Merge branch 'main_w8a8_fp8' of https://github.com/HandH1998/sglang i…
yych0745 Jan 24, 2025
0666d39
cutlass optimization
yych0745 Jan 24, 2025
6619f48
Fix cu118 group gemm compile issue (#3097)
ispobock Jan 24, 2025
b9980af
clean code
HandH1998 Jan 24, 2025
cd51083
fix reivew issues
HandH1998 Jan 24, 2025
4a98c75
Merge remote-tracking branch 'origin/main' into main_w8a8_fp8
HandH1998 Jan 24, 2025
8c3dc13
fix bug
HandH1998 Jan 24, 2025
153b414
minor: sync flashinfer and add turbomind as 3rdparty (#3105)
zhyncs Jan 24, 2025
685a573
Allow local cutlass directory to be used in sgl-kernel build (#3037)
trevor-m Jan 24, 2025
4505a43
[Docs] minor update for phi-3 and phi-4 (#3096)
adarshxs Jan 24, 2025
04f0b4c
minor: update sgl-kernel setup (#3107)
zhyncs Jan 24, 2025
a22f60a
Add workflow for sgl-kernel cu118 release (#3109)
ispobock Jan 24, 2025
665e5e8
Add step to update sgl-kernel whl index (#3110)
ispobock Jan 24, 2025
5d9d15e
support fp32 in sampling_scaling_penalties kernel (#3121)
BBuf Jan 25, 2025
9852214
mirror fix for custom allreduce (#3124)
yizhang2077 Jan 25, 2025
14e754a
chore: bump v0.0.2.post17 for sgl-kernel (#3125)
zhyncs Jan 25, 2025
3cab5f7
speedup pr test for sgl-kernel (#3126)
zhyncs Jan 25, 2025
67ad433
Update tag name for whl release (#3127)
ispobock Jan 25, 2025
c23d570
Update whl index path (#3128)
ispobock Jan 25, 2025
896c074
update installation doc for sgl-kernel (#3129)
zhyncs Jan 25, 2025
9286740
feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_M…
FlamingoPg Jan 25, 2025
da6f808
Fix CI tests (#3132)
merrymercy Jan 26, 2025
27acf63
Use torch.compile for scaling penalty (#3133)
merrymercy Jan 26, 2025
8e48ca8
enable kv_scale for Gemma2 (#3113)
hliuca Jan 26, 2025
a1b582e
Merge remote-tracking branch 'origin/main' into main_w8a8_fp8
HandH1998 Jan 26, 2025
822bae8
feat: cross python wheel for sgl-kernel (#3138)
zhyncs Jan 26, 2025
248391e
Merge branch 'main' into main_w8a8_fp8
zhyncs Jan 26, 2025
66283db
[Fix] Not skip NVML Check on AMD Platform (#3135)
BruceXcluding Jan 26, 2025
4f118a3
Fix repetition penalty (#3139)
merrymercy Jan 26, 2025
95f789a
minor: cleanup sgl-kernel (#3143)
zhyncs Jan 26, 2025
62bf9a4
fix name conflict
HandH1998 Jan 26, 2025
0d7f5a0
Merge remote-tracking branch 'origin/main' into main_w8a8_fp8
HandH1998 Jan 26, 2025
35 changes: 35 additions & 0 deletions .devcontainer/Dockerfile
@@ -0,0 +1,35 @@
From lmsysorg/sglang:dev

# Create non-root user with specified UID and GID
# NOTE: Replace with your own UID and GID. This is a workaround from https://github.com/microsoft/vscode-remote-release/issues/49#issuecomment-489060908.
ARG HOST_UID=1003
ARG HOST_GID=1003
RUN groupadd -g $HOST_GID devuser && \
useradd -m -u $HOST_UID -g $HOST_GID -s /bin/zsh devuser

# Give devuser sudo access
RUN apt-get update && apt-get install -y sudo && \
echo "devuser ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/devuser && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean

# Set up oh-my-zsh for devuser
RUN cp -r /root/.oh-my-zsh /home/devuser/.oh-my-zsh && \
cp /root/.zshrc /home/devuser/.zshrc && \
cp /root/.vimrc /home/devuser/.vimrc && \
cp /root/.tmux.conf /home/devuser/.tmux.conf && \
sed -i 's|/root/.oh-my-zsh|/home/devuser/.oh-my-zsh|g' /home/devuser/.zshrc && \
chown -R devuser:devuser /home/devuser/

# Set workspace directory and ownership
WORKDIR /sgl-workspace/sglang
RUN chown -R devuser:devuser /sgl-workspace

# Switch to devuser
USER devuser

# Install uv
RUN curl -LsSf https://astral.sh/uv/install.sh | sh

# Install rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
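The `HOST_UID` and `HOST_GID` build arguments above default to 1003, and the note in the Dockerfile asks you to replace them with your own IDs. A minimal sketch of how one might assemble a `docker build` command that overrides both values. The helper name `devcontainer_build_cmd` and the tag `sglang-dev` are hypothetical and not part of this PR; on a POSIX host the IDs could come from `os.getuid()` and `os.getgid()`.

```python
def devcontainer_build_cmd(uid: int, gid: int, context: str = ".devcontainer",
                           tag: str = "sglang-dev") -> list[str]:
    """Build a `docker build` argument list that overrides HOST_UID/HOST_GID."""
    return [
        "docker", "build",
        "--build-arg", f"HOST_UID={uid}",
        "--build-arg", f"HOST_GID={gid}",
        "-t", tag,   # hypothetical tag, not taken from the PR
        context,     # directory that contains the Dockerfile
    ]


# Example with fixed IDs; substitute the IDs of the host user.
print(" ".join(devcontainer_build_cmd(1000, 1000)))
```

Passing the host IDs keeps files written inside the container owned by the same user on the host, which is the point of the non-root `devuser`.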
10 changes: 7 additions & 3 deletions .devcontainer/devcontainer.json
@@ -1,8 +1,9 @@
{
"name": "sglang",
"build": {
"dockerfile": "../docker/Dockerfile.dev"
"dockerfile": "Dockerfile"
},
"remoteUser": "devuser",
"customizations": {
"vscode": {
"extensions": [
@@ -15,6 +16,9 @@
]
}
},
"workspaceFolder": "/sgl-workspace/sglang",
"forwardPorts": []
"forwardPorts": [],
"runArgs": [
"--gpus",
"all"
]
}
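The resulting configuration switches to a non-root `remoteUser` and passes `--gpus all` to the container runtime. A small sketch of a sanity check for those two settings, assuming the post-change file is plain JSON. The `customizations` block is omitted because its extension list is elided in the diff, and `check_devcontainer` is a hypothetical helper, not something the Dev Containers tooling provides.

```python
import json

DEVCONTAINER_JSON = """
{
  "name": "sglang",
  "build": {"dockerfile": "Dockerfile"},
  "remoteUser": "devuser",
  "workspaceFolder": "/sgl-workspace/sglang",
  "forwardPorts": [],
  "runArgs": ["--gpus", "all"]
}
"""


def check_devcontainer(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if cfg.get("remoteUser") in (None, "root"):
        problems.append("remoteUser should name a non-root user")
    run_args = cfg.get("runArgs", [])
    # "--gpus" must be followed by its value, for example "all".
    if "--gpus" not in run_args or run_args.index("--gpus") + 1 >= len(run_args):
        problems.append("runArgs should contain '--gpus <value>'")
    return problems


config = json.loads(DEVCONTAINER_JSON)
print(check_devcontainer(config))
```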
7 changes: 4 additions & 3 deletions .github/pull_request_template.md
@@ -10,6 +10,7 @@

## Checklist

- [ ] Format your code according to the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/references/contribution_guide.md).
- [ ] Add unit tests as outlined in the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/references/contribution_guide.md).
- [ ] Update documentation as needed, including docstrings or example tutorials.
- [ ] Format your code according to the [Code Formatting with Pre-Commit](https://docs.sglang.ai/references/contribution_guide.html#code-formatting-with-pre-commit).
- [ ] Add unit tests as outlined in the [Running Unit Tests](https://docs.sglang.ai/references/contribution_guide.html#running-unit-tests-adding-to-ci).
- [ ] Update documentation / docstrings / example tutorials as needed, according to [Writing Documentation](https://docs.sglang.ai/references/contribution_guide.html#writing-documentation-running-docs-ci).
- [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to [Benchmark and Profiling](https://docs.sglang.ai/references/benchmark_and_profiling.html).
4 changes: 2 additions & 2 deletions .github/workflows/pr-test-rust.yml
@@ -40,7 +40,7 @@ jobs:
cd sgl-router/
cargo test

e2e-rust:
e2e-python:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 2-gpu-runner
steps:
@@ -65,7 +65,7 @@ jobs:
python3 run_suite.py

finish:
needs: [unit-test-rust, e2e-rust]
needs: [unit-test-rust, e2e-python]
runs-on: ubuntu-latest
steps:
- name: Finish
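Renaming `e2e-rust` to `e2e-python` touches two places: the job key and the `needs` list of `finish`. A sketch of a consistency check that flags `needs` entries pointing at undefined jobs, using plain dictionaries in place of the parsed workflow. The helper `dangling_needs` is hypothetical and only models the fields relevant to this hunk.

```python
def dangling_needs(jobs: dict) -> dict:
    """Map each job to the names in its `needs` that are not defined jobs."""
    missing = {}
    for name, job in jobs.items():
        needs = job.get("needs", [])
        if isinstance(needs, str):   # `needs: build` (a single string) is also allowed
            needs = [needs]
        unknown = [n for n in needs if n not in jobs]
        if unknown:
            missing[name] = unknown
    return missing


# Job graph after the rename (only the fields relevant here).
jobs = {
    "unit-test-rust": {},
    "e2e-python": {},
    "finish": {"needs": ["unit-test-rust", "e2e-python"]},
}
print(dangling_needs(jobs))

# If only the job key had been renamed, `finish` would reference a missing job.
stale = dict(jobs, finish={"needs": ["unit-test-rust", "e2e-rust"]})
print(dangling_needs(stale))
```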
99 changes: 99 additions & 0 deletions .github/workflows/pr-test-sgl-kernel.yml
@@ -0,0 +1,99 @@
name: PR Test (sgl-kernel)

on:
  push:
    branches: [ main ]
    paths:
      - "sgl-kernel/**"
  pull_request:
    branches: [ main ]
    paths:
      - "sgl-kernel/**"
  workflow_dispatch:

concurrency:
  group: pr-test-sgl-kernel-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Check clang-format
        uses: DoozyX/clang-format-lint-action@v0.18.1
        with:
          source: sgl-kernel
          extensions: h,c,cpp,hpp,cu,cuh,cc
          clangFormatVersion: 16
          style: file

  build-wheels:
    if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.9']
        cuda-version: ['12.4']

    steps:
      - uses: actions/checkout@v4
        with:
          submodules: 'recursive'

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Build wheels for Python ${{ matrix.python-version }} and CUDA ${{ matrix.cuda-version }}
        run: |
          cd sgl-kernel
          chmod +x ./build.sh
          ./build.sh "${{ matrix.python-version }}" "${{ matrix.cuda-version }}"

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheel-python${{ matrix.python-version }}-cuda${{ matrix.cuda-version }}
          path: sgl-kernel/dist/*

  unit-test:
    if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
    needs: build-wheels
    runs-on: 1-gpu-runner
    steps:
      - uses: actions/checkout@v4

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-*

      - name: Install
        run: |
          pip3 install torch==2.5.1 && pip3 install pytest && pip3 install vllm==0.6.4.post1
          pip3 uninstall sgl-kernel -y || true
          pip3 install sgl-kernel/dist/*whl --force-reinstall --no-deps
          pip3 list | grep sgl-kernel

      - name: Run test
        timeout-minutes: 30
        run: |
          cd sgl-kernel
          find tests -name "test_*.py" | xargs -n 1 python3

      - name: Uninstall dependencies
        run: |
          pip3 uninstall sgl-kernel -y

  finish:
    needs: [unit-test, lint]
    runs-on: ubuntu-latest
    steps:
      - name: Finish
        run: echo "This is an empty step to ensure that all jobs are completed."
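The "Run test" step discovers every `test_*.py` under `tests` and runs each file in its own `python3` process. A rough Python equivalent of that `find ... | xargs -n 1 python3` pipeline, shown against a throwaway directory. The function names are hypothetical, and sorting the results is a choice made here for reproducibility (`find` itself gives no ordering guarantee).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def discover_tests(root: Path) -> list[Path]:
    """Find files named test_*.py anywhere under `root`, in sorted order."""
    return sorted(p for p in root.rglob("test_*.py") if p.is_file())


def run_each(files: list[Path]) -> int:
    """Run every file in its own interpreter; return the number of failures."""
    failures = 0
    for path in files:
        result = subprocess.run([sys.executable, str(path)])
        if result.returncode != 0:
            failures += 1
    return failures


# Demo on a temporary tree with one passing and one failing test file.
with tempfile.TemporaryDirectory() as tmp:
    tests = Path(tmp) / "tests"
    (tests / "nested").mkdir(parents=True)
    (tests / "test_ok.py").write_text("print('ok')\n")
    (tests / "nested" / "test_fail.py").write_text("raise SystemExit(1)\n")
    (tests / "helper.py").write_text("")   # ignored: the name does not match
    found = discover_tests(tests)
    failed = run_each(found)
    print(len(found), "files,", failed, "failed")
```

As with `xargs -n 1`, a failing file does not stop the remaining files from running; the failure only shows up in the final count.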
46 changes: 30 additions & 16 deletions .github/workflows/pr-test.yml
@@ -29,7 +29,7 @@ concurrency:
jobs:

unit-test-frontend:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && github.event.pull_request.draft == false
runs-on: 1-gpu-runner
steps:
- name: Checkout code
@@ -43,16 +43,18 @@ jobs:

- name: Run test
timeout-minutes: 10
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
cd test/lang
python3 run_suite.py --suite per-commit

unit-test-backend-1-gpu:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && github.event.pull_request.draft == false
runs-on: 1-gpu-runner
strategy:
matrix:
range: [0-6, 6-16, 16-23, 23-30, 30-100]
range: [0-6, 6-15, 15-22, 22-32, 32-40, 40-100]
steps:
- name: Checkout code
uses: actions/checkout@v3
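The `range` matrix splits the backend suite across parallel runners, and this change goes from five shards to six. The hunk does not show how `run_suite.py` consumes each entry, so the sketch below assumes each `begin-end` value is a half-open slice of the ordered test list, which is consistent with the ranges being contiguous. The 45-file suite and both helper names are hypothetical.

```python
def parse_range(spec: str) -> tuple[int, int]:
    """Parse a matrix entry such as "6-15" into (begin, end)."""
    begin, end = spec.split("-")
    return int(begin), int(end)


def shard(tests: list[str], spec: str) -> list[str]:
    """Select tests[begin:end]; an end past the list length is clamped by slicing."""
    begin, end = parse_range(spec)
    return tests[begin:end]


RANGES = ["0-6", "6-15", "15-22", "22-32", "32-40", "40-100"]
tests = [f"test_{i}.py" for i in range(45)]   # hypothetical suite of 45 files

shards = [shard(tests, r) for r in RANGES]
# With contiguous ranges, every test lands in exactly one shard.
assert sum(shards, []) == tests
print([len(s) for s in shards])   # [6, 9, 7, 10, 8, 5]
```

Under that assumption, the open-ended last range (`40-100`) absorbs any tests added later, at the cost of that shard growing until the ranges are rebalanced again.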
@@ -75,7 +77,7 @@ jobs:


unit-test-backend-2-gpu:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && github.event.pull_request.draft == false
runs-on: 2-gpu-runner
steps:
- name: Checkout code
@@ -87,18 +89,16 @@ jobs:
run: |
bash scripts/ci_install_dependency.sh

- name: Evaluate data parallelism accuracy (DP=2)
- name: Test data parallelism (DP=2)
timeout-minutes: 10
run: |
cd test/srt
python3 test_data_parallelism.py

- name: Evaluate MLA accuracy (TP=2)
- name: Test data parallelism attention (DP=2)
timeout-minutes: 10
run: |
cd test/srt
python3 test_mla.py
python3 test_mla_fp8.py
python3 test_dp_attention.py

- name: Test update weights from distributed
@@ -107,14 +107,14 @@ jobs:
cd test/srt
python3 test_update_weights_from_distributed.py

- name: Evaluate MoE EP accuracy (TP=2)
- name: Test expert parallelism (EP=2)
timeout-minutes: 10
run: |
cd test/srt
python3 test_moe_ep.py

performance-test-1-gpu-part-1:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && github.event.pull_request.draft == false
runs-on: 1-gpu-runner
steps:
- name: Checkout code
@@ -130,7 +130,7 @@ jobs:
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_default
python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_bs1

- name: Benchmark online latency
timeout-minutes: 10
@@ -150,8 +150,15 @@ jobs:
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_non_stream_small_batch_size

- name: Benchmark online latency (EAGLE)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_online_latency_eagle


performance-test-1-gpu-part-2:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && github.event.pull_request.draft == false
runs-on: 1-gpu-runner
steps:
- name: Checkout code
@@ -182,7 +189,7 @@ jobs:
python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default_fp8

performance-test-2-gpu:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && github.event.pull_request.draft == false
runs-on: 2-gpu-runner
steps:
- name: Checkout code
@@ -198,7 +205,13 @@ jobs:
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_moe_default
python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_moe_tp2_bs1

- name: Benchmark single latency + torch.compile (TP=2)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_torch_compile_tp2_bs1

- name: Benchmark offline throughput (TP=2)
timeout-minutes: 10
@@ -212,8 +225,9 @@ jobs:
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_moe_offline_throughput_without_radix_cache


accuracy-test-1-gpu:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && github.event.pull_request.draft == false
runs-on: 1-gpu-runner
steps:
- name: Checkout code
@@ -237,7 +251,7 @@ jobs:


accuracy-test-2-gpu:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && github.event.pull_request.draft == false
runs-on: 2-gpu-runner
steps:
- name: Checkout code
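Every job in this workflow now adds `&& github.event.pull_request.draft == false` to its `if:` condition, so draft pull requests skip CI. A sketch of that condition as a Python predicate. It assumes GitHub's documented loose equality, where mismatched types are coerced to numbers (null and `false` become 0, `true` becomes 1), so a `push` event with no `pull_request` payload still passes. That reading should be checked against the GitHub Actions expression documentation; both function names are hypothetical.

```python
from typing import Optional


def to_number(value: Optional[bool]) -> int:
    """Coerce null/false to 0 and true to 1 (assumed loose-equality coercion)."""
    return 1 if value else 0


def should_run(repository: str, event_name: str, draft: Optional[bool]) -> bool:
    """Mirror (repo == 'sgl-project/sglang' || event == 'pull_request') && draft == false."""
    in_scope = repository == "sgl-project/sglang" or event_name == "pull_request"
    not_draft = to_number(draft) == to_number(False)
    return in_scope and not_draft


# draft is None when the event carries no pull_request payload (e.g. a push).
print(should_run("sgl-project/sglang", "push", None))          # True
print(should_run("sgl-project/sglang", "pull_request", True))  # False
print(should_run("someone/sglang", "pull_request", False))     # True
print(should_run("someone/sglang", "push", None))              # False
```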
16 changes: 12 additions & 4 deletions .github/workflows/release-docker-amd.yml
@@ -10,19 +10,27 @@ on:
jobs:
publish:
if: github.repository == 'sgl-project/sglang'
runs-on: docker-builder-amd
runs-on: amd-docker
environment: 'prod'
strategy:
matrix:
rocm_version: ['6.2.0']
build_type: ['all', 'srt']
steps:
- name: Delete huge unnecessary tools folder
run: rm -rf /opt/hostedtoolcache

- name: Checkout repository
uses: actions/checkout@v3

- name: Free disk space
uses: jlumbroso/free-disk-space@main
with:
tool-cache: false
docker-images: false
android: true
dotnet: true
haskell: true
large-packages: true
swap-storage: false

- name: Login to Docker Hub
uses: docker/login-action@v2
with:
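The `strategy.matrix` in this job has two axes, `rocm_version` and `build_type`, and a job instance is created for each combination. A minimal sketch of that expansion as a Cartesian product. The `expand` helper is hypothetical and ignores matrix features such as `include` and `exclude`, which this workflow hunk does not use.

```python
from itertools import product

matrix = {
    "rocm_version": ["6.2.0"],
    "build_type": ["all", "srt"],
}


def expand(matrix: dict) -> list[dict]:
    """Return one dict per combination of the matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


for job in expand(matrix):
    print(job)
```

With one ROCm version and two build types this yields two jobs, each of which runs the disk cleanup and Docker login steps on the `amd-docker` runner.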
4 changes: 2 additions & 2 deletions .github/workflows/release-docs.yml
@@ -39,7 +39,7 @@ jobs:

- name: Execute notebooks and push to documents
env:
GITHUB_TOKEN: ${{ secrets.PAT_TOKEN }}
GITHUB_TOKEN: ${{ secrets.DOCUMENTATION_PAT_TOKEN }}
run: |
cd docs
make clean
@@ -49,7 +49,7 @@ jobs:
cd _build/html

git clone https://$GITHUB_TOKEN@github.com/sgl-project/sgl-project.github.io.git ../sgl-project.github.io --depth 1
rm -rf ../sgl-project.github.io/*
find ../sgl-project.github.io/ -mindepth 1 -not -path "../sgl-project.github.io/.git*" -not -name CNAME -not -name ".jekyll" -not -name ".nojekyll" -delete
cp -r * ../sgl-project.github.io
cp ../../README.md ../sgl-project.github.io/README.md
cd ../sgl-project.github.io
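The `find ... -delete` command replaces `rm -rf ../sgl-project.github.io/*` so that `CNAME`, `.jekyll`, `.nojekyll` and everything whose path starts with `.git` survive the cleanup before the new HTML is copied in. A Python sketch of the same selection logic, run against a temporary directory. `clean_site` is a hypothetical helper, and the CNAME content is a placeholder.

```python
import tempfile
from pathlib import Path

KEEP_NAMES = {"CNAME", ".jekyll", ".nojekyll"}


def clean_site(root: Path) -> None:
    """Delete everything under `root` except .git* paths and KEEP_NAMES entries."""
    git_prefix = str(root / ".git")
    # Deepest paths first, so directories are visited after their contents.
    for path in sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if str(path).startswith(git_prefix) or path.name in KEEP_NAMES:
            continue
        if path.is_dir():
            if not any(path.iterdir()):   # only remove directories left empty
                path.rmdir()
        else:
            path.unlink()


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp) / "sgl-project.github.io"
    (site / ".git").mkdir(parents=True)
    (site / ".git" / "config").write_text("")
    (site / "CNAME").write_text("example.org\n")   # placeholder domain
    (site / ".nojekyll").write_text("")
    (site / "old" / "page").mkdir(parents=True)
    (site / "old" / "page" / "index.html").write_text("<html></html>")
    clean_site(site)
    remaining = sorted(p.relative_to(site).as_posix() for p in site.rglob("*"))
    print(remaining)
```

Like the `-path "../sgl-project.github.io/.git*"` test, the prefix check also spares siblings such as `.github` or `.gitignore`, not just the `.git` directory itself.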