feat(pass): Add OutOfOrderSchedulerPass for cross-pipe dependency opt… #91

luohuan19 · 2026-01-26T03:44:36Z

…imization

This pass performs dependency-preserving reordering of statements within reorderable segments to reduce peak live cross-pipe dependencies (max 8 events), intended to run before InsertSyncPass to avoid hardware event_id exhaustion.

Key features:

Kahn-based topological scheduling with resource constraints
Multiple heuristic strategies to avoid greedy dead-ends
Graceful fallback when strict limits cannot be satisfied
Comprehensive Python bindings and test suite
Integration into PassManager's XPlatform optimization strategy

gemini-code-assist · 2026-01-26T03:44:53Z

Summary of Changes

Hello @luohuan19, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new compiler pass designed to optimize the execution order of operations within the Intermediate Representation (IR). By intelligently reordering statements, the pass minimizes the concurrent usage of critical hardware resources, specifically cross-pipe event IDs. This proactive scheduling ensures that subsequent synchronization passes can operate effectively without encountering resource limitations, thereby enhancing the overall efficiency and reliability of the generated code for heterogeneous architectures.

Highlights

New Out-of-Order Scheduler Pass: Introduced a new OutOfOrderSchedulerPass to reorder statements within reorderable segments (e.g., AssignStmt/EvalStmt) to reduce peak live cross-pipe dependencies.
Hardware Resource Optimization: The pass aims to keep the peak number of 'live' cross-pipe dependency edges at or below a limit of 8, preventing hardware event_id exhaustion, especially before the InsertSyncPass.
Advanced Scheduling Logic: Employs Kahn-based topological scheduling with resource constraints, incorporating multiple heuristic strategies to avoid greedy dead-ends and includes graceful fallback mechanisms when strict limits cannot be met.
Integration and Test Coverage: The new pass is integrated into the PassManager's XPlatform optimization strategy, complete with comprehensive Python bindings and a robust test suite covering various dependency types and edge cases.


gemini-code-assist

Code Review

This pull request introduces the OutOfOrderSchedulerPass, a new optimization pass designed to reorder statements within reorderable segments to reduce peak live cross-pipe dependencies. This is crucial for avoiding hardware event ID exhaustion, especially when running before InsertSyncPass. The implementation utilizes a Kahn-based topological scheduling algorithm with multiple heuristic strategies and a graceful fallback mechanism. The changes include adding the C++ implementation, integrating it into the Python bindings and PassManager's XPlatform strategy, and providing a comprehensive test suite. The new pass significantly enhances the IR optimization capabilities by intelligently managing resource constraints.

gemini-code-assist · 2026-01-26T03:46:28Z

src/ir/transforms/out_of_order_scheduler_pass.cpp

+  if (s == "SCALAR" || s == "S") return PipeType::S;
+  if (s == "FIX") return PipeType::FIX;
+  if (s == "ALL") return PipeType::ALL;
+  return PipeType::S;


The PipeTypeFromString function defaults to PipeType::S if the input string does not match any known pipe type. While this provides a fallback, it might silently mask issues where an unexpected or malformed pipe_type string is provided. Consider adding a LOG_WARN or INTERNAL_CHECK for unhandled cases to aid debugging and prevent silent misconfigurations.

gemini-code-assist · 2026-01-26T03:46:28Z

src/ir/transforms/out_of_order_scheduler_pass.cpp

+        return PipeType::S;
+      }
+    }


Similar to PipeTypeFromString, the get_call_pipe lambda (and consequently GetStmtPipe) defaults to PipeType::S if call->HasKwarg("pipe_type") is true but the GetKwarg call fails or the string cannot be parsed. This could lead to incorrect scheduling decisions without explicit notification. Adding a LOG_WARN in the catch block would be beneficial for identifying such scenarios.

gemini-code-assist · 2026-01-26T03:46:29Z

tests/ut/ir/transforms/test_out_of_order_scheduler.py

+
+    # Verify order is preserved (both statements write to same variable)
+    assert optimized_func is not None
+    assert isinstance(optimized_func.body, ir.SeqStmts)


In test_out_of_order_scheduler_waw_dependency, the assertion assert isinstance(optimized_func.body, ir.SeqStmts) is quite general. To thoroughly verify WAW dependency preservation, it would be more robust to explicitly check that the statement defining _tile_a_v1 appears before the statement defining tile_a_v2 in the optimized_func.body.stmts list. This ensures the scheduler maintains the correct write order.

gemini-code-assist · 2026-01-26T03:46:29Z

tests/ut/ir/transforms/test_out_of_order_scheduler.py

+        # We don't enforce strict ordering here since the pass may do best-effort reordering
+        # The key is that it doesn't crash and returns a valid function
+        pass
+


In test_out_of_order_scheduler_exceeds_event_limit, the comment mentions that the implementation should log a warning when the 8-event limit cannot be satisfied. It would be good to add an assertion here to verify that this warning is indeed logged (e.g., by capturing logs or checking for a specific log message), ensuring the fallback behavior is correctly communicated.

luohuan19 · 2026-01-27T03:33:46Z

OutOfOrderSchedulerPass

Overview

OutOfOrderSchedulerPass reschedules reorderable statements to optimize cross-pipe dependencies while keeping peak event pressure ≤ 8 per pipeline pair.

Goal: Under dependency constraints, reorder statements to minimize peak pressure of cross-pipe synchronization events.

Core Concepts

Pipeline Types

Different computational units: M (CUBE), V (VECTOR), S (SCALAR), MTE1/2/3 (transfers), FIX, ALL.

Cross-Pipe Dependencies

When a statement on pipeline B depends on a statement on pipeline A (A ≠ B), the two pipes must synchronize through an event:

Producer (A) issues set_event
Consumer (B) waits on wait_event

Live Events

An event is "live" from the producer's set_event until the consumer's wait_event. Resource constraint: at most 8 live events per pipeline pair.

Reorderable Statements

This pass runs on each SeqStmts node and may reorder its direct children under dependency constraints.

Compute-like (typical reorder candidates): AssignStmt, EvalStmt
Control-flow / terminator nodes (kept stable in relative order): IfStmt, ForStmt, ReturnStmt, YieldStmt

Phase 1: Control Flow Node Support (CF-aware Analysis)

Phase 1 Overview

Phase 1 extends the scheduler to treat control flow nodes (IfStmt, ForStmt) as immovable black-box composite nodes in the dependency graph. This enables compute statements to be reordered across control flow boundaries when data dependencies allow.

Key Innovation: Instead of cutting statement streams into isolated segments separated by CF barriers, Phase 1 analyzes dependencies at the parent statement level (SeqStmts), allowing better reordering opportunities.

Design Principle: Black-Box CF Nodes

Immovable: Control flow nodes cannot change relative order (if A comes before B, A must stay before B)
Black-box: Statement-level analysis uses StmtEffect to conservatively summarize CF node reads/writes
Permeable: Compute statements can cross CF boundaries if data dependencies permit

StmtEffect: Conservative Side-Effect Summary

Each statement (including CF nodes) is analyzed for side effects:

struct StmtEffect {
  std::set<MemRefPtr> reads;                  // MemRefs read by statement
  std::set<MemRefPtr> writes;                 // MemRefs written by statement
  bool has_unknown_side_effect = false;       // Conservative flag
};

Analysis rules by statement type:

AssignStmt: writes = var, reads = value's MemRefs
EvalStmt: reads = expr's MemRefs, unknown_side_effect = true (conservative)
IfStmt: Union of condition reads + both branch effects
ForStmt: Union of bounds reads + body reads/writes (loop-carried)
SeqStmts/OpStmts: Fold effects from all children
Return/Yield: unknown_side_effect = true (terminators)

Conservative union for branching: When IfStmt or ForStmt can execute different code paths, we conservatively take the union of all possible effects.
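The union rules above can be sketched in a few lines of Python. This is a minimal illustration, not the pass's C++ code: MemRefs are modeled as plain strings, and the helpers `if_effect` and `for_effect` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class StmtEffect:
    # MemRefs are modeled as strings here (the real struct stores MemRefPtr).
    reads: Set[str] = field(default_factory=set)
    writes: Set[str] = field(default_factory=set)
    has_unknown_side_effect: bool = False

    def union(self, other: StmtEffect) -> StmtEffect:
        """Conservative merge: the result may do anything either side may do."""
        return StmtEffect(
            reads=self.reads | other.reads,
            writes=self.writes | other.writes,
            has_unknown_side_effect=self.has_unknown_side_effect
            or other.has_unknown_side_effect,
        )


def if_effect(cond_reads: Set[str], then_eff: StmtEffect,
              else_eff: Optional[StmtEffect] = None) -> StmtEffect:
    """IfStmt: condition reads plus the union of both branch effects."""
    eff = StmtEffect(reads=set(cond_reads)).union(then_eff)
    return eff.union(else_eff) if else_eff is not None else eff


def for_effect(bound_reads: Set[str], body_eff: StmtEffect) -> StmtEffect:
    """ForStmt: loop-bound reads plus the body effect (the body may run 0..n times)."""
    return StmtEffect(reads=set(bound_reads)).union(body_eff)
```

For the If node in the cross-CF example below, `if_effect({"cond"}, StmtEffect(reads={"tile_a"}, writes={"tile_b"}))` reads `{cond, tile_a}` and writes `{tile_b}`, which is what keeps it after the statement defining tile_a.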

Scheduling with CF Nodes

Dependency graph construction (CF-aware mode):

Analyze all statements (compute + CF) using MemRef reads/writes
Build RAW/WAW/WAR edges using StmtEffect results
Unknown side effects create barriers (edges to all subsequent statements)

Ordering constraints (Stability Chain):
After the dependency edges are built, add a "CF stability chain":

Identify all CF-like nodes in original order: c0, c1, ..., ck
Add edges: c0 → c1 → ... → ck
This preserves CF relative order while allowing compute to cross them
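A sketch of the stability chain, assuming statements are identified by their index in the SeqStmts and successors are kept as index sets. The function name, the data layout, and the exact set of CF-like kinds (taken from the kinds listed under Reorderable Statements) are assumptions.

```python
from __future__ import annotations

from typing import Dict, List, Set

# Assumed CF-like kinds: the ones this doc keeps stable in relative order.
CF_LIKE_KINDS = {"IfStmt", "ForStmt", "ReturnStmt", "YieldStmt"}


def add_cf_stability_chain(kinds: List[str], succs: Dict[int, Set[int]]) -> None:
    """Add edges c0 -> c1 -> ... -> ck between CF-like statements, in place.

    kinds[i] is the kind of the i-th child of the SeqStmts; succs maps a
    statement index to the indices that must be scheduled after it.
    Compute statements get no new edges, so they can still cross CF nodes.
    """
    chain = [i for i, kind in enumerate(kinds) if kind in CF_LIKE_KINDS]
    for prev, nxt in zip(chain, chain[1:]):
        succs.setdefault(prev, set()).add(nxt)
```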

Candidate selection (Strategy A):
During Kahn scheduling, prioritize schedulable compute statements over CF nodes:

First pass: Schedule compute statements with best score
If none available: Fall back to CF nodes
This prevents CF nodes from "blocking" compute optimization
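Strategy A amounts to a two-tier pick. The sketch below assumes that infeasible candidates (those that would exceed the event limit) were filtered out beforehand and that a smaller score tuple is better; `pick_candidate` is a hypothetical name.

```python
from __future__ import annotations

from typing import Callable, List, Optional, Tuple


def pick_candidate(ready: List[int],
                   is_cf: Callable[[int], bool],
                   score: Callable[[int], Tuple]) -> Optional[int]:
    """Strategy A: best-scoring compute statement, or a CF node only if no compute is ready.

    `ready` is assumed to hold only candidates that already passed the
    event-limit check; a smaller score tuple means a better candidate.
    """
    compute = [s for s in ready if not is_cf(s)]
    pool = compute if compute else ready
    return min(pool, key=score) if pool else None
```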

Example: Cross-CF Reordering

Input:

tile_a = load(input_a)      # Depends on input_a

if cond:                    # CF node (reads cond)
    tile_b = add(tile_a, tile_a)

tile_c = load(input_c)      # Independent of If (reads different input)
result = add(tile_c, tile_c) # Depends on tile_c

Dependency Analysis:

tile_a depends on input_a (RAW)
if depends on cond (reads cond expression) and on tile_a (RAW: its body reads tile_a)
tile_c depends on input_c (RAW, independent of tile_a)
result depends on tile_c (RAW)
result does NOT depend on if statement (different MemRefs)

Phase 1 Optimized Order:

tile_a = load(input_a)      # tile_a first (needed by if body)
tile_c = load(input_c)      # tile_c can cross if (no dependency)
result = add(tile_c, tile_c) # result follows tile_c

if cond:                    # If node preserved (CF stability chain)
    tile_b = add(tile_a, tile_a)

Benefit: the tile_c load and the result computation move ahead of the If node instead of waiting behind it, which gives the scheduler more room to overlap pipelines and to place cross-pipe synchronization.

MemRefCollector

Collects memory references from expressions to build dependency relationships. Analyzes reads/writes to detect:

RAW (Read-After-Write): a read must stay after the preceding write to the same MemRef
WAW (Write-After-Write): a write must stay after the preceding write to the same MemRef
WAR (Write-After-Read): a write must stay after all earlier reads of the same MemRef
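For two statements with known read/write sets, the three hazards reduce to set intersections. This pairwise check is a simplified illustration (hypothetical name, MemRefs as strings) that ignores `has_unknown_side_effect`; the pass itself tracks last writers and readers incrementally.

```python
from __future__ import annotations

from typing import Set


def hazards(earlier_reads: Set[str], earlier_writes: Set[str],
            later_reads: Set[str], later_writes: Set[str]) -> Set[str]:
    """Return the hazards that force `later` to stay after `earlier`."""
    found: Set[str] = set()
    if earlier_writes & later_reads:
        found.add("RAW")  # later reads a MemRef that earlier wrote
    if earlier_writes & later_writes:
        found.add("WAW")  # both write the same MemRef
    if earlier_reads & later_writes:
        found.add("WAR")  # later overwrites a MemRef that earlier read
    return found
```

Applied to the cross-CF example, the If node (reads {cond, tile_a}, writes {tile_b}) and `result` (reads {tile_c}, writes {result}) yield an empty hazard set, so `result` may move above the If.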

GetStmtPipe

Extracts pipeline type of statement:

Use Op::GetPipe() if available
Fall back to call.kwargs["pipe_type"]
Default to PipeType::S (scalar)

Returns the pipeline where the statement executes.
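The fallback chain, including the warning suggested in review for unrecognized `pipe_type` strings, might look like the following. The enum mirrors the pipe list above; only the `SCALAR` alias is taken from the reviewed diff, and `get_stmt_pipe` is a hypothetical stand-in for `GetStmtPipe`, not its actual signature.

```python
from __future__ import annotations

import logging
from enum import Enum
from typing import Mapping, Optional

log = logging.getLogger("ooo_scheduler_sketch")


class PipeType(Enum):
    M = "M"
    V = "V"
    S = "S"
    MTE1 = "MTE1"
    MTE2 = "MTE2"
    MTE3 = "MTE3"
    FIX = "FIX"
    ALL = "ALL"


# Only the "SCALAR" alias is visible in the reviewed diff; other aliases are unknown.
_ALIASES = {"SCALAR": PipeType.S}


def get_stmt_pipe(op_pipe: Optional[PipeType], kwargs: Mapping[str, object]) -> PipeType:
    """Op pipe first, then kwargs["pipe_type"], else PipeType.S."""
    if op_pipe is not None:
        return op_pipe
    raw = kwargs.get("pipe_type")
    if raw is None:
        return PipeType.S  # no pipe information at all: documented default
    name = str(raw)
    if name in _ALIASES:
        return _ALIASES[name]
    try:
        return PipeType[name]
    except KeyError:
        # Surface the malformed value instead of silently defaulting.
        log.warning("unrecognized pipe_type %r, defaulting to S", raw)
        return PipeType.S
```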

LiveCrossPipeEvents

Tracks cross-pipe event state during scheduling:

live_by_pair_: Global live event count per pipeline pair (counts unique active producers, not edges)
pending_successors_: Per-producer map tracking unscheduled consumers per pipe pair
incoming_producers_: Per-consumer list of (producer, pair) dependencies
peak_by_pair_: Peak pressure statistics

Key methods:

PredictAfterScheduling(candidate): Predicts resource impact, returns whether scheduling is feasible
ReleaseIncomingBeforeExecute(stmt): Release wait-side events before statement execution
AllocateOutgoingAfterExecute(stmt): Allocate set-side events after statement execution

Event Semantics: Broadcast Model

Hardware reality: Cross-pipe synchronization uses broadcast semantics:

Producer issues ONE set_event(id) per unique (SRC, DST) pair
Multiple consumers on the same DST pipe can share this event via wait_event(id)
The event_id slot is freed when the FIRST consumer is scheduled (matching InsertSyncPass behavior: sync_dst is inserted only before the first consumer, after which the hardware event_id can be reused)

Implementation:

pending_successors_[producer][pair].remaining: Counts unscheduled consumers (for correctness bookkeeping).
pending_successors_[producer][pair].event_live: Tracks whether this (producer, pair) still occupies an event_id slot.
incoming_producers_[consumer]: Tracks which (producer, pair) combinations this consumer depends on
live_by_pair_[pair]: Counts unique active producers (NOT edges)

Example: If producer P on MTE2 has 3 consumers on V:

Old (per-edge): live_by_pair_[(MTE2,V)] += 3 ❌
New (broadcast): live_by_pair_[(MTE2,V)] += 1 ✓

The (producer, pair) event_id slot is freed when the first of these consumers is actually scheduled. Remaining consumers still keep the dependency relationship, but do not consume an event_id slot.

Lifecycle:

P (MTE2) → C1, C2, C3 (all on V)

After P executes:  live_by_pair_[(MTE2,V)] = 1, pending_successors_[P][(MTE2,V)] = 3
After first scheduled consumer (e.g. C2): pending = 2, live = 0 (event_id slot freed)
After next consumer (e.g. C1): pending = 1, live = 0
After last consumer (e.g. C3): pending = 0 (bookkeeping cleanup), live = 0

Consumer Role Tracking

This scheduler treats “first-consumer” as a dynamic concept:

releases_event (first scheduled consumer): If a ready candidate has at least one incoming (producer, pair) whose event_live is still true, then scheduling this candidate will free at least one event_id slot.
- This matches the runtime insertion model: whichever consumer is scheduled first will be the one that carries the sync_dst wait, and thus frees the event_id slot for reuse.
- Other consumers still keep dependency ordering (they must be scheduled after the producer), but they do not consume additional event_id slots.
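The broadcast bookkeeping and the dynamic first-consumer test can be mirrored in a small Python class. It is a sketch, not `LiveCrossPipeEvents` itself: producers and consumers are strings, the constructor input is an assumed layout, and the method names only loosely follow the C++ ones.

```python
from __future__ import annotations

from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Pair = Tuple[str, str]   # (src_pipe, dst_pipe)
Key = Tuple[str, Pair]   # (producer, pair)


class LiveEvents:
    """Broadcast-model bookkeeping: one event_id per (producer, pair), not per edge."""

    def __init__(self, cross_edges: Dict[str, List[Tuple[str, Pair]]]):
        # cross_edges[producer] = [(consumer, pair), ...]; one entry per consumer.
        self.live_by_pair: DefaultDict[Pair, int] = defaultdict(int)
        self.remaining: Dict[Key, int] = {}     # unscheduled consumers per key
        self.event_live: Dict[Key, bool] = {}   # does the key still hold a slot?
        self.incoming: DefaultDict[str, List[Key]] = defaultdict(list)
        self.outgoing: DefaultDict[str, DefaultDict[Pair, int]] = defaultdict(
            lambda: defaultdict(int))
        for prod, edges in cross_edges.items():
            for cons, pair in edges:
                self.outgoing[prod][pair] += 1
                self.incoming[cons].append((prod, pair))

    def releases_event(self, stmt: str) -> bool:
        """True if scheduling stmt now frees at least one event_id slot."""
        return any(self.event_live.get(key, False) for key in self.incoming[stmt])

    def release_incoming_before_execute(self, stmt: str) -> None:
        for key in self.incoming[stmt]:
            self.remaining[key] -= 1
            assert self.remaining[key] >= 0, "consumer released twice"
            if self.event_live[key]:  # first scheduled consumer frees the slot
                self.event_live[key] = False
                self.live_by_pair[key[1]] -= 1

    def allocate_outgoing_after_execute(self, stmt: str) -> None:
        for pair, n_consumers in self.outgoing[stmt].items():
            self.remaining[(stmt, pair)] = n_consumers
            self.event_live[(stmt, pair)] = True
            self.live_by_pair[pair] += 1  # one slot regardless of consumer count
```

Replaying the P → C1, C2, C3 lifecycle (scheduling C2, then C1, then C3) gives live = 1 after P and live = 0 from the first consumer on, while the pending count goes 3, 2, 1, 0. Once C2 has run, `releases_event("C1")` is false, which is the dynamic first-consumer behavior described above.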

Scheduling Algorithm

Overall Flow

Visit each SeqStmts: Collect and visit all direct children
Build dependency graph (CF-aware): Conservative MemRef hazard detection (RAW/WAW/WAR) + unknown side-effect barriers
Add CF stability chain: Preserve relative order among CF/terminator nodes
Kahn topological sort: Enhanced with event_id resource constraints
Multi-strategy scheduling: Try multiple heuristics to find a feasible schedule (strict), then best-effort (relaxed)

Building Dependency Graph

For each statement, collect read/write memory references:

Track last writer for each memory location
Track all readers since last write

Build edges:

RAW: Add edge from last writer to current reader
WAW: Add edge from last writer to current writer
WAR: Add edges from all readers to current writer

Mark each edge as cross-pipe or same-pipe based on pipeline types.
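A minimal Python version of the last-writer / readers-since-last-write construction, assuming each statement's reads and writes are already available as sets of names. The function name and the `{(src, dst): is_cross_pipe}` return shape are hypothetical.

```python
from __future__ import annotations

from typing import Dict, List, Sequence, Set, Tuple


def build_dependency_graph(effects: Sequence[Tuple[Set[str], Set[str]]],
                           pipes: Sequence[str]) -> Dict[Tuple[int, int], bool]:
    """effects[i] = (reads, writes) of statement i; pipes[i] = its pipe.

    Returns {(src, dst): is_cross_pipe} for every RAW/WAW/WAR edge.
    """
    edges: Dict[Tuple[int, int], bool] = {}
    last_writer: Dict[str, int] = {}
    readers_since_write: Dict[str, List[int]] = {}

    def add(src: int, dst: int) -> None:
        if src != dst:
            edges[(src, dst)] = pipes[src] != pipes[dst]

    for i, (reads, writes) in enumerate(effects):
        for m in reads:
            if m in last_writer:
                add(last_writer[m], i)            # RAW
        for m in writes:
            if m in last_writer:
                add(last_writer[m], i)            # WAW
            for r in readers_since_write.get(m, []):
                add(r, i)                         # WAR
        for m in reads:
            readers_since_write.setdefault(m, []).append(i)
        for m in writes:
            last_writer[m] = i
            readers_since_write[m] = []           # older readers are now ordered via i
    return edges
```

On the A to E example later in this doc (indices 0 to 4), it produces the cross-pipe edges A→B and C→D plus the same-pipe edges B→E and D→E.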

Kahn + Resource Constraints

Enhanced Kahn algorithm that respects event limits:

Initialize ready set with statements having indegree 0
While unscheduled statements exist:
  For each candidate in ready set:
    Predict resource impact if scheduled
    Skip if violates constraint (live events > 8)
    Score candidate using strategy
    Prefer candidates that release at least one live event_id slot (`releases_event`)

  Select best candidate
  Release incoming events (before execution)
  Mark as scheduled
  Allocate outgoing events (after execution)
  Update peak statistics

  Update ready set with new zero-indegree statements
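The loop above can be sketched end to end. This version assumes the broadcast event model, scores candidates with first-consumer priority followed by the default min-max / min-sum / index order, and returns `None` when strict mode reaches a dead end. It recomputes predictions naively instead of incrementally, and all names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, List, Optional, Sequence, Set, Tuple

MAX_EVENT_IDS = 8  # kMaxEventIds in this doc


def kahn_schedule(n: int, edges: Iterable[Tuple[int, int]], pipes: Sequence[str],
                  enforce_limit: bool = True,
                  limit: int = MAX_EVENT_IDS) -> Tuple[Optional[List[int]], int]:
    """Kahn scheduling with a per-pipe-pair event_id budget (broadcast model).

    Returns (order, peak); order is None if strict mode reaches a dead end.
    """
    succs = defaultdict(set)
    preds = defaultdict(set)
    indeg = [0] * n
    for s, d in set(edges):
        succs[s].add(d)
        preds[d].add(s)
        indeg[d] += 1

    live: Set[Tuple[int, Tuple[str, str]]] = set()  # (producer, pair) holding a slot

    def incoming(c: int):
        return {(p, (pipes[p], pipes[c])) for p in preds[c] if pipes[p] != pipes[c]}

    def predict(c: int):
        """Live keys and per-pair counts after scheduling c."""
        after = (live - incoming(c)) | {
            (c, (pipes[c], pipes[d])) for d in succs[c] if pipes[d] != pipes[c]}
        counts = defaultdict(int)
        for _, pair in after:
            counts[pair] += 1
        return after, counts

    ready = [i for i in range(n) if indeg[i] == 0]
    order: List[int] = []
    peak = 0
    while ready:
        best = None
        for c in ready:
            after, counts = predict(c)
            worst = max(counts.values(), default=0)
            if enforce_limit and worst > limit:
                continue  # would exceed the event_id budget on some pair
            releases = bool(incoming(c) & live)
            # first-consumer priority, then kMinMaxThenSumThenIndex order
            score = (not releases, worst, sum(counts.values()), c)
            if best is None or score < best[0]:
                best = (score, c, after, worst)
        if best is None:
            return None, peak  # strict-mode dead end: caller may fall back
        _, chosen, live, worst = best
        peak = max(peak, worst)
        order.append(chosen)
        ready.remove(chosen)
        for d in succs[chosen]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return (order if len(order) == n else None), peak
```

With nine M producers all feeding a single V consumer, strict mode schedules eight producers and then gets stuck, because the ninth would need a ninth event on (M, V); with `enforce_limit=False` the same graph finishes with a peak of 9.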

First-Consumer Priority Optimization:

To minimize peak event pressure, the scheduler prioritizes first-consumers:

Candidate comparison first prefers candidates whose releases_event is true
This schedules event-releasing consumers earlier, freeing event_id slots sooner
Reduces the likelihood of exceeding the 8-event limit per pipeline pair
Works in conjunction with Strategy A (compute over CF nodes)

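Example benefit (assuming consumer_0 is the only consumer of tail_x whose releases_event is true, e.g. the only one on another pipe):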

Without priority:  tail_x → [consumer_1, consumer_2, ..., consumer_0] → event held until consumer_0
With priority:     tail_x → [consumer_0, consumer_1, consumer_2, ...] → event released immediately

Candidate Selection Strategies

Selection criteria (in priority order):

kMinMaxThenSumThenIndex (default):
- Primary: Minimize worst pipeline pair pressure (pred_max)
- Secondary: Minimize total pressure (pred_sum)
- Tertiary: By original index
kMinSumThenMaxThenIndex:
- Primary: Minimize total pressure first
- Avoids local greedy traps
kMinMaxThenIndex:
- Only minimize worst pressure
- Simpler, faster decisions
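The three strategies differ only in how the comparison key is ordered. A sketch, assuming a `CandidateScore` with the fields named in this doc and with the first-consumer preference applied before the strategy-specific criteria; the Python enum member names are hypothetical renderings of the C++ constants.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Tuple


class PickStrategy(Enum):
    MIN_MAX_THEN_SUM_THEN_INDEX = auto()  # kMinMaxThenSumThenIndex (default)
    MIN_SUM_THEN_MAX_THEN_INDEX = auto()  # kMinSumThenMaxThenIndex
    MIN_MAX_THEN_INDEX = auto()           # kMinMaxThenIndex


@dataclass(frozen=True)
class CandidateScore:
    releases_event: bool
    pred_max: int  # worst predicted live count over all pipe pairs
    pred_sum: int  # total predicted live count
    index: int     # original position in the SeqStmts


def sort_key(s: CandidateScore, strategy: PickStrategy) -> Tuple:
    """Smaller key = better candidate; first-consumer preference comes first."""
    head = (not s.releases_event,)
    if strategy is PickStrategy.MIN_MAX_THEN_SUM_THEN_INDEX:
        return head + (s.pred_max, s.pred_sum, s.index)
    if strategy is PickStrategy.MIN_SUM_THEN_MAX_THEN_INDEX:
        return head + (s.pred_sum, s.pred_max, s.index)
    return head + (s.pred_max, s.index)


def is_better(a: CandidateScore, b: CandidateScore, strategy: PickStrategy) -> bool:
    return sort_key(a, strategy) < sort_key(b, strategy)
```

Expressing each strategy as a tuple key keeps the comparison a single lexicographic `<`, so adding a strategy means adding one key layout rather than another branch of nested comparisons.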

Fallback Strategy

Try strategies in order:

Strict mode (enforce_limit=true):
- Try each strategy
- Enforce 8-event limit strictly
- Return first successful schedule
Relaxed mode (enforce_limit=false):
- If all strict strategies fail
- Don't enforce limit, but minimize pressure
- Generate best-effort topological order
- Logs a warning to the user
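The strict-then-relaxed loop can be sketched independently of the scheduler by passing the scheduler in as a callable. The signature and the choice of the default (first) strategy for the relaxed run are assumptions; the warning text mirrors the debug message listed under Debugging.

```python
from __future__ import annotations

import logging
from typing import Callable, List, Optional, Sequence, Tuple

log = logging.getLogger("ooo_scheduler_sketch")

# run(strategy, enforce_limit) -> order or None, e.g. a resource-constrained Kahn pass.
RunFn = Callable[[str, bool], Optional[List[int]]]


def find_feasible_schedule(run: RunFn, strategies: Sequence[str]) -> Tuple[List[int], bool]:
    """Try each strategy strictly; if all fail, fall back to a relaxed best-effort run.

    Returns (order, satisfied_limit).
    """
    for strategy in strategies:
        order = run(strategy, True)
        if order is not None:
            return order, True
    log.warning("Cannot satisfy event limit, using best-effort")
    order = run(strategies[0], False)
    assert order is not None, "relaxed mode must always produce a topological order"
    return order, False
```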

Invariants

Resource Constraint

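The hardware provides at most 8 event_ids per pipeline pair (SRC, DST). In strict mode the scheduler never lets the live events of any pair exceed this limit; only the relaxed fallback may exceed it, and it logs a warning when it does.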

Invariant verification:

PredictAfterScheduling checks this before scheduling
INTERNAL_CHECK(pred >= 0) ensures release doesn't make count negative

State Consistency

Internal bookkeeping stays consistent:

live_by_pair_ never goes negative; predicted counts must be ≥ 0
pending_successors_ and incoming_producers_ remain consistent (no double-release, no missing producer-pair state)
Peak statistics tracked accurately

Topological Order

Output satisfies all dependencies (RAW/WAW/WAR). This is guaranteed by Kahn's algorithm, which only schedules a statement once its indegree is 0, i.e. after all of its predecessors.
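A small checker for this invariant is handy in tests, for example to assert that a WAW pair keeps its original order, as suggested in review. It works on statement indices and is not part of the pass; the name is hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Sequence, Tuple


def respects_dependencies(order: Sequence[int], edges: Iterable[Tuple[int, int]]) -> bool:
    """True if no statement repeats and every edge's src is scheduled before its dst."""
    pos = {stmt: i for i, stmt in enumerate(order)}
    if len(pos) != len(order):
        return False  # a statement was scheduled twice
    return all(s in pos and d in pos and pos[s] < pos[d] for s, d in edges)
```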

Example

Input Code

A = compute_on_M(...)     # Pipeline M
B = compute_on_V(A)       # Pipeline V, depends on A (cross-pipe)
C = compute_on_M(...)     # Pipeline M
D = compute_on_V(C)       # Pipeline V, depends on C (cross-pipe)
E = compute_on_V(B, D)    # Pipeline V, depends on B and D

Dependency Graph

A(M) → B(V) ──┐
              ├──→ E(V)
C(M) → D(V) ──┘

Cross-pipe edges: A→B, C→D

Original Schedule

Order: A → B → C → D → E

Time	Execute	Live Events	(M→V) Count
1	A	{A→B}	1
2	B	{}	0
3	C	{C→D}	1
4	D	{}	0
5	E	{}	0

Peak M→V events: 1

Optimized Schedule

Order: A → C → B → D → E

Time	Execute	Live Events	(M→V) Count
1	A	{A→B}	1
2	C	{A→B, C→D}	2
3	B	{C→D}	1
4	D	{}	0
5	E	{}	0

Peak M→V events: 2

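Benefit: Pipeline M operations are batched together (A, C), followed by pipeline V operations (B, D, E). This can reduce pipeline switches and expose more instruction-level parallelism, at the cost of a slightly higher peak event pressure (2 instead of 1, still well under the limit of 8). Note that this ordering illustrates a legal reorder rather than the default strategy's choice: with first-consumer priority and kMinMaxThenSumThenIndex, after A the scheduler prefers B (it releases A's event, leaving M→V pressure at 0) over C (which would raise it to 2), and so keeps the original order.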
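Both tables can be reproduced mechanically. The sketch below computes, for a fixed order, the peak number of live events per pipe pair under the broadcast model (an event is live from its producer's step until its first consumer's step). Statement indices 0 to 4 stand for A to E, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Dict, Iterable, Sequence, Tuple

Pair = Tuple[str, str]


def peak_live(order: Sequence[int], edges: Iterable[Tuple[int, int]],
              pipes: Sequence[str]) -> Dict[Pair, int]:
    """Peak live cross-pipe events per (src, dst) pair for a fixed order."""
    pos = {stmt: t for t, stmt in enumerate(order)}
    # One interval per (producer, pair): [producer step, first consumer step).
    intervals: Dict[Tuple[int, Pair], Tuple[int, int]] = {}
    for s, d in edges:
        if pipes[s] == pipes[d]:
            continue  # same-pipe edges need no event
        key = (s, (pipes[s], pipes[d]))
        start, end = intervals.get(key, (pos[s], pos[d]))
        intervals[key] = (start, min(end, pos[d]))
    peak: Dict[Pair, int] = {}
    for t in range(len(order)):
        counts: Dict[Pair, int] = {}
        for (_, pair), (start, end) in intervals.items():
            if start <= t < end:  # live after step t executes
                counts[pair] = counts.get(pair, 0) + 1
        for pair, c in counts.items():
            peak[pair] = max(peak.get(pair, 0), c)
    return peak
```

With the edges A→B, C→D, B→E, D→E it yields a peak M→V count of 1 for A → B → C → D → E and 2 for A → C → B → D → E, matching the two tables.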

Complexity

Time: O(n²) graph building + O(n × |ready| × 3) Kahn scheduling = O(n²) worst case
Space: O(n²) edges + O(pipeline pairs × n) live events

Limitations

Phase 1 limitations:
- Control flow nodes are treated as immovable black boxes (their nested bodies are not analyzed or rescheduled)
- StmtEffect takes a conservative union over branches (may create false dependencies)
- No path-sensitive analysis (every branch is treated as possibly executing)
- No loop-invariant code motion (LICM not implemented yet)
Conservative: MemRef-based analysis may be overly conservative
Hardcoded limit: kMaxEventIds = 8 is not configurable
Best-effort fallback: May not always satisfy constraints

Future Work (Phase 2+)

Path-sensitive analysis:

Analyze conditional branches to enable more aggressive reordering
Differentiate between "must execute" vs "may execute" effects

Loop-invariant code motion (LICM):

Move loop-invariant computations outside ForStmt bodies
Requires proving expressions don't change across iterations

Nested CF body analysis:

Analyze nested CF bodies for finer-grained reordering opportunities
Recursively schedule within If/For statement bodies

Debugging

Enable debug logs to track:

Segment scheduling: "scheduled segment size=X, worst_peak=Y"
Strategy recovery: "Recovered feasible schedule with strategy=Z"
Relaxed fallback: "Cannot satisfy event limit, using best-effort"

Verify:

GetStmtPipe returns correct pipeline types
Dependency graph captures RAW/WAW/WAR correctly
Live event tracking matches expectations

Hzfengsy · 2026-01-27T09:42:54Z

docs/dev/passes/OutOfOrderSchedulerPass.md

rename file to 00-out-of-order-schedule.md

Make it shorter; each doc should be around 200-300 lines of markdown

Alright, I'll make the changes.

Hzfengsy · 2026-01-27T09:44:39Z

src/ir/transform/out_of_order_scheduler_pass.cpp

+    }
+  };
+
+  auto Better = [](const CandidateScore& a, const CandidateScore& b, PickStrategy strategy) -> bool {


ask AI to reorganize code. The current version contains one large function, which is hard to read

Alright, I'll refactor that function.


…and tests

Extract large ScheduleSegment function (~260 lines) into focused helper functions for better readability and maintainability:

- GetMemRefs: Extract memory references from expressions
- BuildDependencyGraph: Build RAW/WAW/WAR dependency edges
- BuildAdjacencyLists: Create successor lists and indegree arrays
- IsBetterCandidate: Compare candidates using selection strategies
- RunKahnScheduling: Kahn topological sort with resource constraints
- FindFeasibleSchedule: Multi-strategy scheduling with fallback

ScheduleSegment now serves as a clean 40-line orchestrator showing the three main steps: extract pipe types, build dependencies, find schedule.

Also rename OutOfOrderSchedulerPass.md to 00-out-of-order-schedule.md to follow documentation naming convention.

…d broadcast event model

Add Phase 1 control flow node support that treats IfStmt/ForStmt as immovable black-box composite nodes in the dependency graph. This enables compute statements to be reordered across control flow boundaries when data dependencies allow.

Key improvements:
- Introduce StmtEffect for conservative side-effect analysis of CF nodes
- Implement broadcast event semantics (one event_id per producer-pair, not per edge) matching hardware reality and InsertSyncPass behavior
- Add first-consumer priority optimization to minimize peak event pressure
- Add CF stability chain to preserve relative order of control flow nodes
- Refactor LiveCrossPipeEvents tracking with pending_successors and incoming_producers maps

Documentation updates:
- Add comprehensive Phase 1 design documentation
- Document broadcast event model and consumer role tracking
- Add cross-CF reordering examples and event lifecycle diagrams
- Update limitations and add future work section

Test improvements:
- Add run_pass_with_ir_print helper for better test debugging
- Refactor test organization and cleanup

- Remove separate header file, consolidate into source file
- Update pass registration in passes.h and bindings
- Synchronize Python bindings and type stubs
- Update test implementation

This refactoring simplifies the pass structure by moving the class definition into the implementation file, reducing header dependencies while maintaining all functionality.

gemini-code-assist bot reviewed Jan 26, 2026


Hzfengsy reviewed Jan 27, 2026


luohuan19 force-pushed the main branch from d3d203f to a3ed65d on January 28, 2026 01:36

luohuan19 requested a review from Hzfengsy January 28, 2026 01:54

luohuan19 force-pushed the main branch from a3ed65d to 03e2d5c on January 29, 2026 02:20

luohuan19 added 5 commits January 29, 2026 22:59

docs(pass): Add OutOfOrderSchedulerPass implementation documentation …

317b17b

…and tests

luohuan19 force-pushed the main branch from 03e2d5c to b25d63a on January 30, 2026 01:11
