From 694d147c347899fe4eb68e594278292e170e4b46 Mon Sep 17 00:00:00 2001
From: "claude[bot]" <41898282+claude[bot]@users.noreply.github.com>
Date: Thu, 5 Mar 2026 04:06:44 +0000
Subject: [PATCH] Add expert parallelism documentation to Claude workflow
 configs

Document the differences between vLLM's --enable-expert-parallel (boolean
flag) and SGLang's --expert-parallel-size N (explicit integer) in both
claude.yml and claude-pr-review.yml. Scripts should conditionally enable
--enable-expert-parallel based on the EP_SIZE env var rather than
hardcoding it.

Co-authored-by: functionstackx
---
 .github/workflows/claude-pr-review.yml | 20 ++++++++++++++++++++
 .github/workflows/claude.yml           | 18 ++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/.github/workflows/claude-pr-review.yml b/.github/workflows/claude-pr-review.yml
index 1b5e3f96e..8c848b733 100644
--- a/.github/workflows/claude-pr-review.yml
+++ b/.github/workflows/claude-pr-review.yml
@@ -129,6 +129,26 @@ jobs:
 
 - **STP (Single Token Prediction)**: Standard autoregressive decoding — one token per forward pass. No speculative decoding or MTP. Benchmarks labeled "STP only" use vanilla decoding.
 - **MTP (Multi-Token Prediction)**: Predicts multiple tokens per forward pass using speculative decoding (e.g., EAGLE, NEXTN).
+## Expert Parallelism Validation:
+When reviewing benchmark scripts for MoE models, verify that expert parallelism flags are used correctly:
+
+- **vLLM** (`vllm serve`): Uses `--enable-expert-parallel` (boolean flag). Does NOT accept `--expert-parallel-size`. EP size is automatically determined by vLLM.
+- **SGLang** (`sglang.launch_server`): Uses `--expert-parallel-size N` (explicit integer).
+- **ATOM** (AMD vLLM fork): Uses `--enable-expert-parallel` (same as vLLM).
+
+**Required pattern for vLLM/ATOM scripts:**
+Scripts must NOT hardcode `--enable-expert-parallel`. Instead, they should conditionally enable it based on the `EP_SIZE` env var:
+```bash
+if [ "$EP_SIZE" -gt 1 ]; then
+  EP=" --enable-expert-parallel"
+else
+  EP=" "
+fi
+```
+If a script hardcodes `--enable-expert-parallel` without checking `EP_SIZE`:
+- This is a 🟡 **WARNING** issue
+- Comment: "Expert parallelism should be conditional on the `EP_SIZE` env var from the config YAML, not hardcoded. Use the `if [ \"$EP_SIZE\" -gt 1 ]` pattern to conditionally enable `--enable-expert-parallel`."
+
 Remember: Silence is golden. No comment is better than a low-value comment.
 
 ## Container Image Accessibility Validation:

diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml
index c280ea3ca..1be4b1b98 100644
--- a/.github/workflows/claude.yml
+++ b/.github/workflows/claude.yml
@@ -274,3 +274,21 @@ jobs:
 
 - **STP (Single Token Prediction)**: Standard autoregressive decoding — one token per forward pass. No speculative decoding or MTP. Benchmarks labeled "STP only" use vanilla decoding.
 - **MTP (Multi-Token Prediction)**: Predicts multiple tokens per forward pass using speculative decoding (e.g., EAGLE, NEXTN).
+### Expert Parallelism in Benchmark Scripts
+vLLM and SGLang handle expert parallelism differently. When writing or reviewing benchmark scripts for MoE models:
+
+- **vLLM** (`vllm serve`): Uses `--enable-expert-parallel` (a boolean flag). vLLM does NOT accept `--expert-parallel-size`. When EP is enabled, vLLM automatically determines the EP size based on TP and the number of available GPUs.
+- **SGLang** (`sglang.launch_server`): Uses `--expert-parallel-size N` (an explicit integer). Pass the `EP_SIZE` env var value directly.
+- **ATOM** (AMD vLLM fork): Uses `--enable-expert-parallel` (same as vLLM).
+
+**Required pattern for vLLM/ATOM scripts:** Scripts must conditionally enable `--enable-expert-parallel` based on the `EP_SIZE` env var from the config YAML, rather than hardcoding it:
+```bash
+if [ "$EP_SIZE" -gt 1 ]; then
+  EP=" --enable-expert-parallel"
+else
+  EP=" "
+fi
+# Then use $EP in the vllm serve command
+```
+This ensures the script respects the `ep` setting in the master config YAML's search-space.
+
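
---

The engine-specific branching that both workflow prompts describe can also be factored into a single helper. A minimal sketch, not part of the patch itself: the `ep_flags` function name, the `engine` argument, and the `MODEL` placeholder are all illustrative assumptions; only the flag names come from the documentation above.

```bash
#!/usr/bin/env bash
# Map the EP_SIZE value from the config YAML to engine-specific
# expert-parallelism flags, per the rules documented in the workflows.
ep_flags() {
  local engine="$1" ep_size="$2"
  if [ "$ep_size" -gt 1 ]; then
    case "$engine" in
      # vLLM/ATOM: boolean flag only; EP size is derived automatically.
      vllm|atom) echo "--enable-expert-parallel" ;;
      # SGLang: the size must be spelled out explicitly.
      sglang)    echo "--expert-parallel-size $ep_size" ;;
    esac
  fi
  # EP_SIZE <= 1: emit nothing, so the flag is omitted entirely.
}

# Example: build a vLLM serve command line with EP_SIZE=8.
echo "vllm serve MODEL $(ep_flags vllm 8)"
```

With `EP_SIZE=1` the helper emits nothing, which satisfies the "no hardcoding" rule the review prompt enforces.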