
Commit 36edcce

Authored by littsk, Copilot, and Strivin0311
feat: multi-arch builds, env refactoring, new features, and expanded tests (#307)
* Support Ampere with cutlass-based FFA_FA4 (#287)
* Update v1.1.0 overview (#285)
* Update v1.1.0 public blogs (#281)
* [HotFix]: fix proxy (#284)
* Add DSA attention interface in extensions (#283)

  See merge request: !1

* support multi-arch CUDA builds (Hopper + Blackwell)

  Refactor the build system to accept comma-separated compute capabilities via `MAGI_ATTENTION_BUILD_COMPUTE_CAPABILITY` (e.g. "90,100"). Add helper functions `parse_compute_capabilities`, `get_gencode_flags`, and `resolve_build_capabilities` in `setup.py`. Update `CMakeLists.txt` to accept `MAGI_CUDA_ARCHITECTURES` and strip PyTorch-injected gencode flags that may reference unsupported architectures.

  Made-with: Cursor

* add global_window_size to infer_attn_mask_from_cu_seqlens

  Allow every query in a sample to always attend to the first `global_window_size` key tokens in addition to the sliding window; useful for architectures that require prefix tokens (e.g. sink tokens) to be globally visible. Update docs with the new parameter.

  Made-with: Cursor

* add scm install script
* skip fa4_ffa_precompile
* add distributed roll API for MTP support

  Introduce a P2P-based `roll` operation that cyclically shifts dispatched local tensors along the sequence dimension without materialising the full global tensor (O(N/P) memory instead of O(N)). Primarily designed for Multi-Token Prediction (MTP), where labels are shifted relative to inputs.
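As an editorial sketch of the multi-arch build entry above: parsing a comma-separated capability list into per-architecture gencode flags could look roughly like this. The helper names mirror those mentioned in the commit message, but the bodies here are assumptions, not the project's actual `setup.py` code.

```python
import os


def parse_compute_capabilities(spec: str) -> list[str]:
    """Parse a comma-separated capability list like "90,100"."""
    caps = [c.strip() for c in spec.split(",") if c.strip()]
    for cap in caps:
        if not cap.isdigit():
            raise ValueError(f"invalid compute capability: {cap!r}")
    return caps


def get_gencode_flags(caps: list[str]) -> list[str]:
    # One -gencode pair per requested architecture; the real build may use
    # arch suffixes such as sm_90a for Hopper-specific features
    return [f"-gencode=arch=compute_{c},code=sm_{c}" for c in caps]


spec = os.environ.get("MAGI_ATTENTION_BUILD_COMPUTE_CAPABILITY", "90,100")
flags = get_gencode_flags(parse_compute_capabilities(spec))
```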
  - New `functional/roll.py` with `roll_p2p` implementation and autograd support
  - Expose `roll` in the public API (`magi_attention.api`)
  - Clean up import paths: import `roll_func` directly from `functional.roll` instead of re-exporting through `functional.dispatch`
  - Add a `roll` section to the API reference and quickstart docs
  - Allow optional `num_heads_q/kv`, `head_dim` override in `make_flex_key_for_new_mask_after_dispatch`
  - Add comprehensive tests (`tests/test_functional/test_roll.py`)

  Made-with: Cursor

* polish tests for roll
* dynamic pad token
* ceil div cleanup
* polish code
* support uneven shard
* refactor: move `ceil_div` to `magi_attention/utils/_utils.py`

  Consolidate the `ceil_div` helper into the shared utils module instead of defining it locally in `api/functools.py`, so that meta/solver code can reuse it without circular imports.

  Made-with: Cursor

* simplify uneven_shard: remove virtual padding, use real chunk sizes

  Replace the previous "virtual metadata padding" approach with a simpler design where `total_seqlen` is used as-is (no padding at all):

  - Use `ceil_div` for `num_chunks` so the last chunk can be smaller
  - Remove `actual_total_seqlen_q/k` parameters from all interfaces
  - `MinHeapDispatchAlg` now reports `is_equal_num_workloads=False` and uses `ceil_div` for the per-bucket job limit
  - Simplify dispatch/undispatch: no zero-size virtual chunks, so `torch.split` works directly with `chunk_actual_sizes`
  - Remove virtual padding logic from `magi_attn_flex_key` and `make_flex_key_for_new_mask_after_dispatch`

  Made-with: Cursor

* support variable chunk sizes in roll P2P

  Extract `_compute_segments` to handle source-segment calculation for both uniform and variable (last-chunk-smaller) layouts. Refactor `_roll_p2p_impl` to iterate segments generically, replacing the previous special-case branches for `r == 0` and `r > 0`.
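The uneven-shard sizing described above reduces to `ceil_div` plus a smaller final chunk. A minimal sketch (the `chunk_actual_sizes` helper below is an illustration, not the project's exact code):

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling division, e.g. ceil_div(10, 4) == 3."""
    return (a + b - 1) // b


def chunk_actual_sizes(total_seqlen: int, chunk_size: int) -> list[int]:
    # ceil_div lets the last chunk be smaller instead of padding total_seqlen,
    # so there are no zero-size virtual chunks to split around
    num_chunks = ceil_div(total_seqlen, chunk_size)
    sizes = [chunk_size] * num_chunks
    sizes[-1] = total_seqlen - chunk_size * (num_chunks - 1)
    return sizes
```

With `total_seqlen=10` and `chunk_size=4`, this yields `[4, 4, 2]`, which is directly usable by something like `torch.split`.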
  Add comprehensive uneven-shard tests: aligned/non-aligned shifts, cross-last-chunk wrapping, negative/large shifts, edge cases (`last_chunk_size=1`), larger sequences, and backward correctness.

  Made-with: Cursor

* update pipeline tests for simplified uneven_shard

  Remove virtual metadata padding logic from test_pipeline and test_pipeline_sdpa: there is no longer any need for `compute_pad_size`/`apply_padding` imports or `actual_total_seqlen_q/k` variables, since the uneven_shard path now uses the original `total_seqlen` directly.

  Made-with: Cursor

* fix rank error in roll
* add caching for DistAttnRuntimeKey hash and infer_attn_mask_from_cu_seqlens

  Cache the hash of `DistAttnRuntimeKey` via a `__hash__` override to avoid repeated hashing of all fields on every dict lookup. Also add `lru_cache` to `infer_attn_mask_from_cu_seqlens` to skip redundant mask inference for repeated `cu_seqlens` patterns.

  Made-with: Cursor

* improve type annotations for DistAttnRuntimeDictManager

  Add precise type hints for return types, parameters, and internal data structures. Import `DistAttnRuntimeMgr` for proper typing and remove the resolved TODO comment.

  Made-with: Cursor

* refactor dispatch/undispatch with autograd Functions to reduce memory

  Replace the concat-all-then-scatter approach with custom autograd Functions (`_DispatchFunc` / `_UndispatchFunc`). Forward dispatch now selects local chunks directly (O(shard_seqlen) alloc) instead of building a full permuted tensor (O(total_seqlen)). Backward uses `all_gather_v` + unpermute, mirroring the inverse path.

  Made-with: Cursor

* fix partial_dsink contiguity before backward communication

  Ensure `partial_dsink` is contiguous before communication in the backward pass to avoid potential issues with non-contiguous tensor layouts.
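The hash-caching idea from the commit above can be sketched as follows. `RuntimeKey` and `infer_mask` here are illustrative stand-ins, not the actual `DistAttnRuntimeKey` or `infer_attn_mask_from_cu_seqlens` definitions:

```python
from functools import lru_cache


class RuntimeKey:
    """Illustrative stand-in for DistAttnRuntimeKey (the real class has many fields)."""

    def __init__(self, total_seqlen_q: int, total_seqlen_k: int, chunk_size: int):
        self.fields = (total_seqlen_q, total_seqlen_k, chunk_size)
        self._cached_hash: int | None = None

    def __eq__(self, other: object) -> bool:
        return isinstance(other, RuntimeKey) and self.fields == other.fields

    def __hash__(self) -> int:
        # Hash all fields once, then reuse the result for every dict lookup
        if self._cached_hash is None:
            self._cached_hash = hash(self.fields)
        return self._cached_hash


@lru_cache(maxsize=128)
def infer_mask(cu_seqlens: tuple[int, ...]) -> tuple[tuple[int, int], ...]:
    # lru_cache skips recomputation for repeated cu_seqlens patterns;
    # the argument must be hashable, hence a tuple rather than a tensor
    return tuple(zip(cu_seqlens[:-1], cu_seqlens[1:]))
```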
  Made-with: Cursor

* polish code
* fix last chunk_size for uneven_shard
* fast build
* fast build
* install ffa build
* enhance dist runtime dict
* mem save ag
* simple p2p
* simple p2p
* logging
* reduce log
* sequential dispatch uneven
* fix seq bug
* add tests
* support fa4
* install scm
* fix install on no-gpu env
* fix
* fix install scm
* fix ffa fa4 bug
* collect all wheels
* fix build
* fix build
* install wheel
* fix install order
* switch flash-attention submodule to magi-flash-attention and remove build patches

  - Point the submodule to git@code.byted.org:seed/magi-flash-attention.git (main), with the namespace renamed to `magi_flash_attn_3` to avoid a TORCH_LIBRARY conflict
  - Remove `hopper_makefile_wrapper.mk` (wheel support is now handled in the collect step)
  - Remove `patch_create_block_mask.py` (should be upstreamed into the submodule)
  - Simplify `install_flash_attn_cute.sh` accordingly

  Made-with: Cursor

* update flash-attention submodule and clean up build scripts

  - Update the submodule to latest main (2c1b058), which includes:
    - Rename the torch library namespace from `flash_attn_3` to `magi_flash_attn_3`
    - Support headless build in the `create_block_mask` `setup.py`
    - Support `MAGI_WHEEL_DIR` in the hopper Makefile
  - Remove redundant hopper wheel collection from `install_flash_attn_cute.sh` (now handled by the upstream Makefile when `MAGI_WHEEL_DIR` is set)

  Made-with: Cursor

* no build ffa
* increase max func
* increase max func
* add set -e to install_flash_attn_cute.sh to fail fast on errors

  Made-with: Cursor

* no overlapped impl
* improve no_overlap path: pre-build merged_attn_arg, enhance logging, and fix test filtering

  - Pre-build `merged_attn_arg` in `CalcMeta.__post_init__` instead of computing it on every forward/backward call in the no_overlap path
  - Fix the `seqlen_k_local` calculation in `DistAttnSolver` to use `host_k_ranges_global`
  - Add detailed logging for `OverlapConfig`, `CalcMeta`, and `DistAttnRuntime`; move verbose `remote_attn_args` logging to DEBUG level
  - Add a `skip_if_world_size_filtered` decorator for proper
    subprocess-level skip instead of an early return inside the test body
  - Change the num_heads test filter to underscore-separated format (e.g. 8_8) and support tuple values in `should_run_test_case`

  Made-with: Cursor

* no overlap support fa4
* add sdpa_online, consolidate pipeline tests, dist_attn and solver updates

  Made-with: Cursor

* add is_partial_grad option to undispatch: use reduce_scatter in backward

  When `is_partial_grad=True`, the backward of undispatch uses `dist.reduce_scatter` to sum partial gradients across ranks before scattering, instead of simply selecting local chunks. This supports scenarios where each rank holds a partial gradient contribution (e.g. partial attention output gradients) that must be aggregated.

  The parameter is threaded through the full API stack: `undispatch_func` -> `DistAttnRuntimeMgr.undispatch_qo/kv` -> `undispatch()`

  Also adds unit tests covering forward round-trip, default backward, and partial-grad backward (both random and uniform) with even/uneven shards.

  Made-with: Cursor

* install ffa
* refactor: merge csrc/utils into extensions, unify C++/Python backend switching, and align interfaces

  - Merge `csrc/utils/` into `csrc/extensions/`, eliminating the separate `flexible_flash_attention_utils_cuda` module. All FFA utils (`argsort_ranges`, `unique_consecutive_pairs`, `compute_sparse_load_metadata`, etc.) are now part of `magi_attn_ext`, built via CMake.
  - Centralize C++/Python backend switching in `common/__init__.py` instead of scattered if-blocks at the bottom of each implementation file. Add Protocol definitions (`protocols.py`) to enforce interface alignment between backends.
  - Align all pybind11 bindings with the Python ground truth: fix parameter names, remove C++-only methods (`to_string`, `sort_ranges`, `reserve`, `clear`, `get_q/k/d_range`), remove `.export_values()` on `AttnMaskType`, add the missing `__iter__` on `AttnRectangles`, and add Google-style docstrings to all public methods on both sides.
  - Replace the `ffa_utils` alias with direct `magi_attn_ext` imports across all files.
  - Convert `test_common/` tests from `unittest.TestCase` to plain pytest classes with a `conftest.py` backend fixture (`params=["python", "cpp"]`) so every test automatically runs against both backends.
  - Regenerate `magi_attn_ext.pyi` with updated signatures and docstrings.

  Made-with: Cursor

* add readme
* refactor: centralise env vars into magi_attention/env/ package

  Move all `MAGI_ATTENTION_*` environment variable accessors from scattered locations (`__init__.py`, `comm/__init__.py`, `common/__init__.py`, `functional/*.py`, `common/jit/*.py`) into a dedicated `magi_attention/env/` package with three submodules:

  - `env/general.py` — runtime toggles, kernel backend, precision, etc.
  - `env/comm.py` — communication flags (hierarchical, qo_comm, etc.)
  - `env/build.py` — JIT/build settings (cache, workspace, nvcc, etc.)

  All ~100 call sites across `magi_attention/`, `tests/`, and `exps/` are updated to use the new `env.general.xxx()` / `env.comm.xxx()` style. The old `_env.py` is removed. The top-level `__init__.py` and `comm/__init__.py` no longer re-export env-var functions.

  Also adds `!magi_attention/env/` to `.gitignore` so the package is not caught by the `env/` virtualenv exclusion rule.

  Made-with: Cursor

* clear api
* remove redundant __all__ from magi_attn_interface.py

  Public exports are managed solely by `api/__init__.py`; the per-module `__all__` duplicated that responsibility and added maintenance burden.
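The centralised env accessors described above follow a simple pattern: one small typed function per `MAGI_ATTENTION_*` variable. A sketch of what a submodule like `env/general.py` might contain (the accessor names and parsing rules below are invented for illustration):

```python
import os


def _env_flag(name: str, default: str = "0") -> bool:
    # "1"/"true"/"on" (case-insensitive) count as enabled; parsing rule assumed
    return os.environ.get(name, default).lower() in ("1", "true", "on")


def is_deterministic() -> bool:
    # hypothetical accessor; the real ones live in magi_attention/env/general.py
    return _env_flag("MAGI_ATTENTION_DETERMINISTIC")


def log_level() -> str:
    # DEBUG/INFO/WARN/ERROR/CRITICAL, defaulting to WARN per the commit below
    return os.environ.get("MAGI_ATTENTION_LOG_LEVEL", "WARN").upper()
```

The benefit over scattered `os.environ` reads is that every env var has exactly one jump-to-able accessor, which also matches the readability philosophy introduced later in this commit.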
  Made-with: Cursor

* feat: add MAGI_ATTENTION_LOG_LEVEL env var to control package-wide logging

  - Add a `log_level()` helper in `env/general.py` supporting DEBUG/INFO/WARN/ERROR/CRITICAL (default: WARN)
  - Configure the root `magi_attention` logger at import time based on the env var
  - Replace the custom `MagiAttentionJITLogger` with the standard `getLogger(__name__)` so the JIT logger participates in the `magi_attention` logger hierarchy
  - Add INFO-level logging throughout the JIT build pipeline (`core.py`, `_flex_flash_attn_jit.py`)

  Made-with: Cursor

* support no_chunk_size
* minor fix
* fix: resolve pre-commit lint errors for Python 3.10+ match syntax

  - Upgrade flake8 6.1.0 -> 7.3.0 (pyflakes 3.4+ with match support)
  - Upgrade ruff v0.1.5 -> v0.11.4, add `--target-version=py310`
  - Add `--python-version=3.10` to mypy args
  - Add `# noqa: F811` for intentional conditional re-imports in `common/__init__.py`
  - Add `# noqa: E402` for necessary non-top-level imports
  - Add `# noqa: F824` for read-only global/nonlocal declarations

  Made-with: Cursor

* update submodule
* fix tests
* patch fix
* fix cu131
* add chinese docs
* log env
* scm install fix
* patch fix
* fix scm
* merge main
* lint
* chore: point flash-attention submodule at littsk fork, drop install hotfixes

  Use https://github.com/littsk/flexible-flash-attention on branch magi_attn_blackwell_support; the multi-arch create_block_mask gencode and hopper Makefile build_ext now live in the submodule. Remove redundant runtime patches from `install_flash_attn_cute.sh`.

  Made-with: Cursor

* docs: update MAGI_ATTENTION_BUILD_COMPUTE_CAPABILITY description

  Reflect that this env variable now also affects create_block_mask builds, and document comma-separated multi-arch support (e.g. 90,100).

  Made-with: Cursor

* chore: bump flash-attention submodule (platform tag fix)

  Picks up ce387e5, which fixes `get_platform()` in `hopper/setup.py` to use `platform.machine()` instead of a hardcoded x86_64.
  Made-with: Cursor

* feat: support CUSTOM_ARCH for cross-platform wheel builds

  Detect the host CPU architecture from the `CUSTOM_ARCH` env var (defaults to `uname -m`) and derive `MAGI_WHEEL_PLAT_NAME` (e.g. linux_aarch64). Pass `--plat-name` to all bdist_wheel / pip wheel invocations so that sub-package wheels (create_block_mask_cuda, magi_to_hstu_cuda, ffa_fa3) and the main magi_attention wheel carry the correct platform tag.

  Made-with: Cursor

* fix: use setup.py bdist_wheel for --plat-name instead of pip --build-option

  pip wheel dropped `--build-option` support in newer versions, causing "no such option" errors on SCM builds. Switch all sub-package wheel builds to `python setup.py bdist_wheel --plat-name` plus a cp to the wheel dir.

  Made-with: Cursor

* fix: FA4 mask tile size resolution and sink+FA4 infinite loop in tests

  1. `_resolve_tile_sizes` / `_resolve_fa4_tile_sizes` now return (128, 128) on SM100+, since the `tile_m`/`tile_n` in `FA4AttnArg` represent the mask block tile (not the kernel tile). The kernel internally doubles `tile_m` via `sparse_tile_m` in `_make_fa4_args_dict`. Only SM80/SM90 need to query `get_tile_sizes_by_backend` for headdim-dependent tiles.
  2. Improved the SM100 tile size validation error message in `FA4AttnArg` to include actual vs expected values and the `sparse_tile_m` note.
  3. Added BACKENDS (excluding FA4) to all 6 sink-bearing attn_configs in test_pipeline.py. FA4 does not support sink, so these configs would cause `get_next_valid_comb` to loop forever (all flag combos rejected by `_is_valid_flag_comb` while the generator never exhausts due to `cycle_times=-1`).

  Made-with: Cursor

* chore: point flash-attention submodule back to demonatic upstream

  PR #9 merged all changes into demonatic/flash-attention; switch the submodule URL back from the littsk fork and update the pointer to the merge commit.
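The CUSTOM_ARCH-to-platform-tag derivation mentioned above can be sketched in a few lines. The `wheel_plat_name` helper is an assumption made for this example; only the env var name and the `uname -m` default come from the commit message:

```python
import os
import platform


def wheel_plat_name() -> str:
    # CUSTOM_ARCH overrides the detected machine (e.g. for cross builds);
    # the default mirrors `uname -m` via platform.machine()
    arch = os.environ.get("CUSTOM_ARCH", platform.machine())
    return f"linux_{arch}"
```

The resulting tag (e.g. `linux_aarch64`) would then be passed as `--plat-name` to each `bdist_wheel` invocation so all sub-package wheels agree on the platform.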
  Made-with: Cursor

* lint
* fix ut
* more clear code
* fix ci
* fix: resolve pre-commit lint failures (black, flake8, ruff)

  Made-with: Cursor

* fix: resolve CI test failures (dispatch, submask, pipeline alignment)

  - test_dispatch_solver: fix a wrong assertTrue -> assertFalse for MinHeap
  - test_gt_dispatcher: use Python `AttnRanges` for sub_mask comparisons to avoid a C++/Python cross-type equality failure
  - test_pipeline: add native_grpcoll invalidation rules for uneven_shard and small hidden_size_kv configs; pass num_heads/head_dim in test_config

  Made-with: Cursor

* fix: proof-reading corrections (typos, grammar, and phrasing)

  Agent-Logs-Url: https://github.com/SandAI-org/MagiAttention/sessions/32611ac4-4154-4596-b276-d3f6d07fdf05

  Co-authored-by: Strivin0311 <61719042+Strivin0311@users.noreply.github.com>

* increase timeout
* fix ci
* fix: replace einops.repeat with native ops in sink_bwd for torch.compile compatibility

  einops.repeat hashes its axes_lengths kwargs internally, which fails under `torch.compile(dynamic=True)` because SymInt is not hashable.

  Made-with: Cursor

* fix: replace all einops calls in sink_bwd with native PyTorch ops

  einops internally hashes tensor shapes for recipe caching, which is incompatible with SymInt under `torch.compile(dynamic=True)`. Replace `rearrange`, `reduce`, and `repeat` with equivalent `permute`/`sum`/`unsqueeze` calls.

  Made-with: Cursor

* fix max logits dtype error

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Strivin0311 <61719042+Strivin0311@users.noreply.github.com>
1 parent 1816199 commit 36edcce

156 files changed

Lines changed: 20865 additions & 4198 deletions

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
---
name: debug-test-failures
description: >-
  Systematic approach to debug failing tests. Use when a user reports test
  failures, regression bugs, or assertion mismatches. Core method: reproduce
  the failure, then use git history to determine whether the failure is caused
  by a recent commit or by the user's uncommitted changes.
---

# Debug Test Failures

## Principle

**Test failures always have a cause in code changes.** The fastest debugging
path is to figure out *which change* broke it, not to guess at the logic.

## Step 1: Reproduce and Extract Key Info

Run the failing test and capture the full output:

```bash
python -m pytest -sq <test_file> 2>&1 | head -80
python -m pytest -sq <test_file> 2>&1 | tail -50
```

Extract from the error:
- Which test case(s) failed
- The **actual vs expected** values
- The **file and line** of the failing assertion

## Step 2: Determine the Scope of Failure

Check if this test *ever* passed on the current branch:

```bash
# What files have uncommitted changes?
git status

# What committed changes touch files related to the failure?
git log --oneline -20 -- <relevant_source_files>
```

This splits into two cases:

### Case A: The user has uncommitted changes in related files

```bash
git diff -- <relevant_source_files>
```

Read the diff carefully. The bug is likely in the uncommitted changes.
Compare the diff against the assertion error to find the mismatch.

### Case B: No uncommitted changes in related files

The regression was introduced by a recent commit. Proceed to Step 3.

## Step 3: Walk Git History to Find the Offending Commit

```bash
# List recent commits touching the relevant files
git log --oneline --all -- <file_path>
```

Then inspect each suspect commit:

```bash
git show <commit_hash> -- <file_path>
```

Walk commits **chronologically** and identify:
1. **Last known good state** — what the logic looked like before
2. **Offending commit** — where behavior changed
3. **Intent** — was the change a refactor, feature, or bugfix?

Common regression patterns:
- **Refactoring that widens a condition** — e.g. merging two flags into one,
  where the new condition covers more cases than intended
- **Default value changes** — a dataclass/config default was changed, silently
  affecting callers that relied on the old default
- **Silent override in initialization** — `__init__` / `__post_init__` /
  constructor overwrites a user-provided value under a too-broad condition
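The third regression pattern can be reproduced in miniature. The `Config` class below is invented purely for illustration, showing how an over-broad `__post_init__` condition silently overrides a user-provided value:

```python
from dataclasses import dataclass


@dataclass
class Config:
    use_fast_path: bool = True
    batch_size: int = 8

    def __post_init__(self):
        # Offending change: the condition is too broad. It was meant to
        # disable the fast path only for batch_size == 1, but it now
        # overrides the user's explicit choice for every small batch.
        if self.batch_size < 4:
            self.use_fast_path = False
```

A test that asserts `Config(use_fast_path=True, batch_size=2).use_fast_path` is `True` would fail here, and `git show` on the commit that widened the condition would reveal exactly this kind of change.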
## Step 4: Confirm Root Cause by Comparing Before/After

Once you identify the suspect commit, compare the old and new logic side by
side. Verify that:
- The old logic would produce the **expected** test output
- The new logic produces the **actual** (wrong) test output
- The behavioral difference is **unintentional** (not a deliberate design change)

## Step 5: Apply Minimal Fix and Verify

1. Make the **smallest change** that restores correct behavior
2. Ensure the original intent of the offending commit is preserved
3. Re-run the failing test to confirm all cases pass

```bash
python -m pytest -sq <test_file> 2>&1 | tail -5
```

## Anti-Patterns

- **Don't update expected values** to match broken output without understanding
  why they differ — the tests encode domain knowledge.
- **Don't guess at the fix** without tracing the cause — always check git
  history first.
- **Don't ignore the "other case"** — if the fix narrows a condition, verify
  that the broader case (which the offending commit intended to handle) still
  works correctly.
Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
---
name: magi-code-philosophy
description: >-
  Core engineering philosophies for the Magi Attention codebase. Use when
  writing, reviewing, or modifying any code in this project. Covers config
  consistency, static-analysis-friendly readability, and test coverage
  requirements. Any deviation from these philosophies must be strictly
  commented with justification.
---

# Magi Code Philosophy

These are the non-negotiable engineering principles for this codebase.
Every contributor — human or AI — must follow them. When a principle
cannot be followed, a **DEVIATION comment** is required (see bottom).

---

## Philosophy 1: Config Consistency

> **A config field's value should mean what the user set it to.**

If internal logic must transform, override, or reinterpret a user-supplied
value, this is a deviation that requires explicit justification.

### Rules

1. **Preserve user intent** — When a user sets `field=X`, reading `obj.field`
   should return `X` or something recognizably equivalent. If the value must
   be normalized, store the original intent in a private field or property
   before overwriting.

2. **Validate early, normalize minimally** — `__post_init__` should primarily
   assert invariants. Normalization should be the smallest necessary
   adjustment. Never silently clamp or discard user input.

3. **Derived fields ≠ overwritten fields** — Values computed from other
   fields should be **new** fields, not overwrites of user input. Exception:
   sentinel values (e.g., `-1` = "auto-detect") are designed to be replaced,
   but still require a comment.

4. **In-place mutation must be documented** — If `__post_init__` or a helper
   mutates a nested structure (e.g., `num_tokens *= 2` for packed KV), the
   site must comment: what is mutated, why, and how to recover the original.

5. **Forced override needs justification** — When code forces a field value
   from another field (e.g., `deterministic |= reduce_op != "sum"`), explain
   why the user's choice is being overridden.

### Deviation Format (Config)

```python
# DEVIATION: <one-line summary>
# Reason: <why the user-facing value cannot be kept as-is>
# Recovery: <how to access original intent, or "none">
```
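A minimal sketch of Rules 1 through 3 together with the deviation format (the `TileConfig` class is hypothetical, not a class from this codebase):

```python
from dataclasses import dataclass


@dataclass
class TileConfig:
    # degree: user-facing; -1 is a documented sentinel meaning "auto-detect"
    degree: int = -1

    def __post_init__(self):
        # Validate early: assert invariants before any normalization
        if self.degree < -1 or self.degree == 0:
            raise ValueError(f"degree must be -1 (auto) or positive, got {self.degree}")
        # Preserve user intent in a private field before overwriting
        self._requested_degree = self.degree
        # DEVIATION: sentinel -1 is replaced by a detected value
        # Reason: downstream code needs a concrete positive degree
        # Recovery: original intent kept in self._requested_degree
        if self.degree == -1:
            self.degree = 4  # stand-in for real auto-detection logic
```

After construction, `obj.degree` holds the normalized value while `obj._requested_degree` still records what the user asked for, so the override is both justified and recoverable.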
---

## Philosophy 2: Readability via Static Navigability

> **Every symbol in the code must be statically resolvable and jump-to-able.**

Code is read far more often than written. The reader should be able to
Ctrl+Click (or equivalent) on any name and land on its definition. If the
IDE's static analysis cannot resolve a symbol, the code is not readable
enough.

### Rules

1. **Explicit imports over dynamic lookups** — Use direct imports, not
   `getattr(module, name)` or `globals()[name]`. If dynamic dispatch is
   truly needed, use a typed registry/dict with the concrete types visible
   at the registration site.

2. **Typed dicts and enums over magic strings** — Prefer `Enum` members
   and `TypedDict` keys over raw string literals. Strings are invisible to
   static analysis; enums and typed keys are jump-to-able.

3. **No untyped `**kwargs` pass-through in public APIs** — Public-facing
   functions should declare their parameters explicitly. `**kwargs` may be
   used internally (e.g., forwarding to a backend), but the public signature
   must be self-documenting.

4. **Avoid deep `Any` typing** — `Any` kills jump-to-definition. Use
   `Protocol`, generics, or union types. Reserve `Any` for truly
   polymorphic boundaries (e.g., serialization).

5. **String-based dispatch must have a central map** — If behavior branches
   on a string value, define the mapping in one place (a dict or match/case)
   so that all targets are visible together and searchable.

6. **Re-exports must be explicit** — When `__init__.py` re-exports symbols,
   use explicit `from .module import Name` rather than `import module` with
   `__all__`. This ensures the IDE can resolve the re-exported name.

### Deviation Format (Readability)

```python
# DEVIATION: <what is not statically resolvable>
# Reason: <why dynamic dispatch / Any / kwargs is unavoidable here>
# Mitigation: <how a reader can still find the target, e.g., "see registry at X">
```
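A minimal sketch of Rules 2 and 5 combined: an `Enum` replaces the magic string, and all dispatch targets live in one central, searchable map. The `Backend` names are invented for the example:

```python
from enum import Enum
from typing import Callable


class Backend(Enum):
    PYTHON = "python"
    CPP = "cpp"


def _run_python() -> str:
    return "python backend"


def _run_cpp() -> str:
    return "cpp backend"


# Central, visible map: every dispatch target is registered at one site,
# and each value is a plain function the IDE can jump to
_BACKEND_DISPATCH: dict[Backend, Callable[[], str]] = {
    Backend.PYTHON: _run_python,
    Backend.CPP: _run_cpp,
}


def run(backend: Backend) -> str:
    return _BACKEND_DISPATCH[backend]()
```

Compared with `getattr(module, f"_run_{name}")`, this keeps every target Ctrl+Click navigable and makes a missing backend a visible `KeyError` at the single registration site.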
---

## Philosophy 3: Test Completeness

> **Where there is code, there must be tests.**

No feature, bug fix, refactor, or config change is considered done until
it has corresponding test coverage. Untested code is assumed broken.

### Rules

1. **Every public function/class has a test** — If it's importable from
   outside its module, it needs at least one test exercising its primary
   path and one test for its most important edge case.

2. **Config normalization must be tested** — For every `__post_init__`
   normalization or deviation, there must be a test that:
   - Sets the user-facing value
   - Asserts the normalized internal value
   - Asserts the original intent is recoverable (if applicable)

3. **Bug fixes come with regression tests** — The test must reproduce the
   original failure first (red), then pass with the fix (green).

4. **Solver/algorithm changes need correctness tests** — Any change to a
   solver (`overlap_solver`, `dispatch_solver`, `dist_attn_solver`, etc.)
   must include tests that verify the output solution against known-good
   reference values.

5. **Test names describe the scenario** — Use descriptive names like
   `test_overlap_config_degree_zero_normalizes_to_one`, not `test_config_1`.
   The name should read as a specification.

6. **No test-only code in production modules** — Test helpers, fixtures, and
   mocks live in `tests/`. Production code should not contain `if TESTING:`
   branches or similar.

### Deviation Format (Test)

```
# DEVIATION: <what is not tested>
# Reason: <why testing is impractical, e.g., requires multi-GPU hardware>
# Tracking: <issue/TODO reference for future coverage>
```

---

## General Deviation Protocol

When **any** philosophy cannot be followed, add a structured comment at the
deviation site. The format depends on the philosophy (see each section
above). The key invariant is:

> **Silence is not acceptable. If the code deviates, the code says so.**

Reviewers (human or AI) should flag any deviation that lacks a comment as a
blocking issue.

---

## Quick Checklist

Before submitting code, verify:

**Config Consistency**
- [ ] Every `__post_init__` field overwrite has a DEVIATION comment
- [ ] User intent is recoverable via private field or property
- [ ] Sentinel values are documented in the class docstring

**Readability**
- [ ] All symbols are Ctrl+Click navigable (no unresolvable dynamic lookups)
- [ ] Public APIs have explicit typed signatures (no bare `**kwargs`)
- [ ] String-based dispatch has a central, visible mapping

**Tests**
- [ ] Every new/changed public API has corresponding tests
- [ ] Config normalizations have dedicated test cases
- [ ] Bug fixes include a regression test
- [ ] Test names describe the scenario, not just a number

.flake8

Lines changed: 3 additions & 1 deletion
@@ -8,4 +8,6 @@ ignore =
     E203
 exclude =
     # Exclude Python interface files
-    *.pyi
+    *.pyi
+    # Exclude translation script (contains long i18n strings)
+    scripts/translate_po.py

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -148,6 +148,7 @@ celerybeat.pid
 .envrc
 .venv
 env/
+!magi_attention/env/
 venv/
 ENV/
 env.bak/

.pre-commit-config.yaml

Lines changed: 5 additions & 4 deletions
@@ -19,6 +19,7 @@ repos:
       pass_filenames: true
       always_run: true
       files: \.(txt|md|yaml|c|cc|cxx|cpp|cu|cuh|h|hpp|hxx|proto|py|pyi|sh)$
+      exclude: '(_zh\.\w+$|scripts/translate_po\.py|docs/source/conf\.py)'
   - id: csrc_code_formatter
     name: check for csrc code format
     entry: bash scripts/run_csrc_code_formatter.sh
@@ -58,15 +59,15 @@ repos:
   - id: black
     files: (.*\.(py|pyi|bzl)|BUILD|.*\.BUILD|WORKSPACE)$
 - repo: https://github.com/PyCQA/flake8
-  rev: 6.1.0
+  rev: 7.3.0
   hooks:
   - id: flake8
     args: ["--config=.flake8"]
 - repo: https://github.com/astral-sh/ruff-pre-commit
-  rev: v0.1.5
+  rev: v0.11.4
   hooks:
   - id: ruff
-    args: [--fix, --exit-non-zero-on-fix, --no-cache]
+    args: [--fix, --exit-non-zero-on-fix, --no-cache, --target-version=py310]
 - repo: https://github.com/pre-commit/mirrors-isort
   rev: v5.10.1
   hooks:
@@ -77,4 +78,4 @@ repos:
   hooks:
   - id: mypy
     files: \.py$
-    args: [--config=mypy.ini, --ignore-missing-imports]
+    args: [--config=mypy.ini, --ignore-missing-imports, --python-version=3.10]

MANIFEST.in

Lines changed: 1 addition & 3 deletions
@@ -4,10 +4,8 @@ include LICENSE

 # Only include source code under csrc (runtime JIT/extension needs)
 recursive-include magi_attention/csrc/common *.h *.hpp
-recursive-include magi_attention/csrc/extensions *.hpp *.cpp
+recursive-include magi_attention/csrc/extensions *.hpp *.cpp *.cu *.cuh *.h
 recursive-include magi_attention/csrc/flexible_flash_attention *.h *.hpp *.cuh *.cu *.cpp *.jinja *.py
-recursive-include magi_attention/csrc/utils *.cpp *.cu
-
 # Cutlass: keep only headers under include/
 prune magi_attention/csrc/cutlass
 graft magi_attention/csrc/cutlass/include

README.md

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ To achieve linear scalability in distributed attention, we implemented the follo

 - **Flexible Flash Attention Kernel**. We introduce a generalized attention mask formulation namely `AttnSlice` with a tailed kernel<em>Flex‑Flash‑Attention (FFA)</em>—natively designed to enable compact expression of diverse mask types and make distributed mask partitioning tractable, with performance comparable to [Flash-Attention 3](https://arxiv.org/abs/2407.08608) on Hopper GPUs, and preliminary support for Blackwell via a forked [Flash-Attention 4](https://github.com/demonatic/flash-attention/tree/magi_attn_blackwell_support).
 - **Computation Load Balancing**. With a fine-grained chunk‑level sharding strategy, we elaborate an efficient <em>dispatch solver</em> that ensures balanced computational workloads across each CP rank.
-- **Zero-Redundant Communication**. Instead of adopting the common Ring-style P2P communication pattern, we ropose two novel communication primitives, <em>GroupCast</em> and <em>GroupReduce</em>, realizing zero-redundant communication volume for both forward and backward passes.
+- **Zero-Redundant Communication**. Instead of adopting the common Ring-style P2P communication pattern, we propose two novel communication primitives, <em>GroupCast</em> and <em>GroupReduce</em>, realizing zero-redundant communication volume for both forward and backward passes.
 - **Adaptive Multi-Stage Overlap**. Leveraging the above enhancements, we further implement an adaptive multi-stage overlap strategy that schedules computation and communication to effectively hide latency and maximize utilization via either manual or automatic tuning.

 If you are interested in the detailed methodology and implementation, please check our [blog](https://SandAI-org.github.io/MagiAttention/docs/main/blog/magi_attn.html#methodology) for more information.

conftest.py

Lines changed: 11 additions & 0 deletions
@@ -12,18 +12,29 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+import os
+
 import pytest


 def pytest_addoption(parser):
     parser.addoption(
         "--skip-slow", action="store_true", default=False, help="skip slow tests"
     )
+    parser.addoption(
+        "--test-attn-config",
+        default=None,
+        help="comma-separated attn_config names to run (supports fnmatch wildcards)",
+    )


 def pytest_configure(config):
     config.addinivalue_line("markers", "slow: marks a test as slow to run")

+    attn_config_filter = config.getoption("--test-attn-config", default=None)
+    if attn_config_filter is not None:
+        os.environ["MAGI_ATTENTION_TEST_ATTN_CONFIG"] = attn_config_filter
+

 def pytest_collection_modifyitems(config, items):
     if config.getoption("--skip-slow"):
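The tests that consume `MAGI_ATTENTION_TEST_ATTN_CONFIG` are not shown in this view. Under the assumption that the filter helper matches config names against the comma-separated patterns with `fnmatch` (as the option's help text suggests), the consuming side might look roughly like this:

```python
import fnmatch
import os


def should_run_test_case(attn_config_name: str) -> bool:
    # No filter set: run every attn_config
    spec = os.environ.get("MAGI_ATTENTION_TEST_ATTN_CONFIG")
    if spec is None:
        return True
    patterns = [p.strip() for p in spec.split(",") if p.strip()]
    # Any matching pattern (fnmatch wildcards allowed) selects the case
    return any(fnmatch.fnmatch(attn_config_name, p) for p in patterns)
```

For example, `pytest --test-attn-config "sliding_*"` would then run only the configs whose names start with `sliding_`.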
