feat(pass): Add OutOfOrderSchedulerPass for cross-pipe dependency opt… #91
Summary of Changes
Hello @luohuan19, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new compiler pass designed to optimize the execution order of operations within the Intermediate Representation (IR). By intelligently reordering statements, the pass minimizes the concurrent usage of critical hardware resources, specifically cross-pipe event IDs. This proactive scheduling ensures that subsequent synchronization passes can operate effectively without encountering resource limitations, thereby enhancing the overall efficiency and reliability of the generated code for heterogeneous architectures.
Highlights
Code Review
This pull request introduces the OutOfOrderSchedulerPass, a new optimization pass designed to reorder statements within reorderable segments to reduce peak live cross-pipe dependencies. This is crucial for avoiding hardware event ID exhaustion, especially when running before InsertSyncPass. The implementation utilizes a Kahn-based topological scheduling algorithm with multiple heuristic strategies and a graceful fallback mechanism. The changes include adding the C++ implementation, integrating it into the Python bindings and PassManager's XPlatform strategy, and providing a comprehensive test suite. The new pass significantly enhances the IR optimization capabilities by intelligently managing resource constraints.
  if (s == "SCALAR" || s == "S") return PipeType::S;
  if (s == "FIX") return PipeType::FIX;
  if (s == "ALL") return PipeType::ALL;
  return PipeType::S;
The PipeTypeFromString function defaults to PipeType::S if the input string does not match any known pipe type. While this provides a fallback, it might silently mask issues where an unexpected or malformed pipe_type string is provided. Consider adding a LOG_WARN or INTERNAL_CHECK for unhandled cases to aid debugging and prevent silent misconfigurations.
      return PipeType::S;
    }
  }
Similar to PipeTypeFromString, the get_call_pipe lambda (and consequently GetStmtPipe) defaults to PipeType::S if call->HasKwarg("pipe_type") is true but the GetKwarg call fails or the string cannot be parsed. This could lead to incorrect scheduling decisions without explicit notification. Adding a LOG_WARN in the catch block would be beneficial for identifying such scenarios.
    # Verify order is preserved (both statements write to same variable)
    assert optimized_func is not None
    assert isinstance(optimized_func.body, ir.SeqStmts)
In test_out_of_order_scheduler_waw_dependency, the assertion assert isinstance(optimized_func.body, ir.SeqStmts) is quite general. To thoroughly verify WAW dependency preservation, it would be more robust to explicitly check that the statement defining _tile_a_v1 appears before the statement defining tile_a_v2 in the optimized_func.body.stmts list. This ensures the scheduler maintains the correct write order.
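A possible shape for that stricter check is sketched below. It is only an illustration: the attribute path `stmt.var.name` is an assumption about how the IR bindings expose an assignment target, and the stand-in statements replace `optimized_func.body.stmts` so the snippet runs on its own.

```python
from types import SimpleNamespace


def definition_index(stmts, name):
    """Return the position of the first statement that defines `name`.

    Assumes each assignment exposes its target as `stmt.var.name`; this
    attribute path is a guess about the IR bindings, not a confirmed API.
    """
    for idx, stmt in enumerate(stmts):
        var = getattr(stmt, "var", None)
        if var is not None and getattr(var, "name", None) == name:
            return idx
    raise AssertionError(f"no statement defines {name!r}")


def assert_defined_before(stmts, first, second):
    """Fail unless `first` is defined earlier in `stmts` than `second`."""
    assert definition_index(stmts, first) < definition_index(stmts, second), (
        f"{first} must be written before {second} to preserve the WAW order"
    )


# Stand-in statements; in the real test this would be optimized_func.body.stmts.
stmts = [
    SimpleNamespace(var=SimpleNamespace(name="_tile_a_v1")),
    SimpleNamespace(var=SimpleNamespace(name="tile_a_v2")),
]
assert_defined_before(stmts, "_tile_a_v1", "tile_a_v2")
```

Looking the names up by position keeps the test independent of how many unrelated statements the scheduler places between the two writes.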
    # We don't enforce strict ordering here since the pass may do best-effort reordering
    # The key is that it doesn't crash and returns a valid function
    pass
In test_out_of_order_scheduler_exceeds_event_limit, the comment mentions that the implementation should log a warning when the 8-event limit cannot be satisfied. It would be good to add an assertion here to verify that this warning is indeed logged (e.g., by capturing logs or checking for a specific log message), ensuring the fallback behavior is correctly communicated.
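One way to capture such a warning from Python is sketched below, assuming the C++ logger writes to stderr (not confirmed by this PR). Because native code bypasses `sys.stderr`, the helper redirects file descriptor 2 itself. The warning text and `fake_pass_run` are hypothetical stand-ins for the real pass invocation.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_stderr_fd():
    """Capture everything written to file descriptor 2, including native output."""
    captured = {"text": ""}
    sys.stderr.flush()
    saved_fd = os.dup(2)
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)  # fd 2 now points at the temp file
        try:
            yield captured
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the original stderr
            os.close(saved_fd)
            tmp.seek(0)
            captured["text"] = tmp.read().decode(errors="replace")


def fake_pass_run():
    # Hypothetical stand-in for running the pass; the message text is invented.
    os.write(2, b"[WARN] event limit exceeded, falling back\n")


with capture_stderr_fd() as cap:
    fake_pass_run()
assert "event limit" in cap["text"]
```

If the project runs its tests under pytest, the built-in `capfd` fixture offers the same descriptor-level capture without a custom helper.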
OutOfOrderSchedulerPass

Overview
Goal: Under dependency constraints, reorder statements to minimize peak pressure of cross-pipe synchronization events.

Core Concepts

Pipeline Types
Different computational units: M (CUBE), V (VECTOR), S (SCALAR), MTE1/2/3 (transfers), FIX, ALL.

Cross-Pipe Dependencies
When a statement on pipeline A depends on pipeline B (A ≠ B), synchronization via events is needed:

Live Events
Event is "live" from

Reorderable Statements
This pass runs on each

Phase 1: Control Flow Node Support (CF-aware Analysis)

Phase 1 Overview
Phase 1 extends the scheduler to treat control flow nodes (IfStmt, ForStmt) as immovable black-box composite nodes in the dependency graph. This enables compute statements to be reordered across control flow boundaries when data dependencies allow.
Key Innovation: Instead of cutting statement streams into isolated segments separated by CF barriers, Phase 1 analyzes dependencies at the parent statement level (SeqStmts), allowing better reordering opportunities.

Design Principle: Black-Box CF Nodes
StmtEffect: Conservative Side-Effect Summary
Each statement (including CF nodes) is analyzed for side effects:

struct StmtEffect {
  std::set<MemRefPtr> reads;             // MemRefs read by statement
  std::set<MemRefPtr> writes;            // MemRefs written by statement
  bool has_unknown_side_effect = false;  // Conservative flag
};

Analysis rules by statement type:
Conservative union for branching: When IfStmt or ForStmt can execute different code paths, we conservatively take the union of all possible effects.

Scheduling with CF Nodes
Dependency graph construction (CF-aware mode):
Ordering constraints (Stability Chain):
Candidate selection (Strategy A):
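To make the StmtEffect summary and the conservative union for branching concrete, here is a small Python model. It is a sketch, not the C++ implementation: MemRefs are modelled as plain strings, and `union_effects` and `if_effect` are hypothetical helper names.

```python
from dataclasses import dataclass, field


@dataclass
class StmtEffect:
    """Python stand-in for the C++ StmtEffect (MemRefs modelled as strings)."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    has_unknown_side_effect: bool = False


def union_effects(*effects):
    """Conservatively merge the effects of code paths that may or may not run."""
    merged = StmtEffect()
    for eff in effects:
        merged.reads |= eff.reads
        merged.writes |= eff.writes
        merged.has_unknown_side_effect |= eff.has_unknown_side_effect
    return merged


def if_effect(cond_reads, then_effect, else_effect=None):
    """Effect of a black-box IfStmt: condition reads plus every branch."""
    branches = [then_effect]
    if else_effect is not None:
        branches.append(else_effect)
    return union_effects(StmtEffect(reads=set(cond_reads)), *branches)


# `if cond: tile_b = add(tile_a, tile_a)` summarised as one composite node.
then_branch = StmtEffect(reads={"tile_a"}, writes={"tile_b"})
summary = if_effect({"cond"}, then_branch)
print(sorted(summary.reads), sorted(summary.writes))
# prints: ['cond', 'tile_a'] ['tile_b']
```

Taking the union over-approximates what any single execution touches, which is what lets the scheduler treat the whole control flow node as one dependency-graph vertex.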
Example: Cross-CF Reordering
Input:

tile_a = load(input_a)         # Depends on input_a
if cond:                       # CF node (reads cond)
    tile_b = add(tile_a, tile_a)
tile_c = load(input_c)         # Independent of If (reads different input)
result = add(tile_c, tile_c)   # Depends on tile_c

Dependency Analysis:
Phase 1 Optimized Order:

tile_a = load(input_a)         # tile_a first (needed by if body)
tile_c = load(input_c)         # tile_c can cross if (no dependency)
result = add(tile_c, tile_c)   # result follows tile_c
if cond:                       # If node preserved (CF stability chain)
    tile_b = add(tile_a, tile_a)

Benefit: tile_c load and result computation moved before if → better pipelining and cross-pipe synchronization.

MemRefCollector
Collects memory references from expressions to build dependency relationships. Analyzes reads/writes to detect:

GetStmtPipe
Extracts pipeline type of statement:
Returns the pipeline where the statement executes.

LiveCrossPipeEvents
Tracks cross-pipe event state during scheduling:
Key methods:

Event Semantics: Broadcast Model
Hardware reality: Cross-pipe synchronization uses broadcast semantics:
Implementation:
Example: If producer P on MTE2 has 3 consumers on V:
The (producer, pair) event_id slot is freed when the first of these consumers is actually scheduled. Remaining consumers still keep the dependency relationship, but do not consume an event_id slot.

Lifecycle: Consumer Role Tracking
This scheduler treats “first-consumer” as a dynamic concept:
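The broadcast model above can be illustrated with a toy tracker. This is a sketch under assumptions: the slot is taken when the producer is scheduled, and the class and method names are illustrative rather than the actual LiveCrossPipeEvents interface.

```python
from collections import defaultdict


class LiveEvents:
    """Toy model of broadcast event tracking (not the pass's real API)."""

    def __init__(self):
        # (src_pipe, dst_pipe) -> producers currently holding an event_id slot
        self.live = defaultdict(set)

    def on_producer_scheduled(self, producer, src_pipe, consumer_pipes):
        # One slot per (producer, pipe pair), however many consumers wait on it.
        for dst_pipe in set(consumer_pipes):
            if dst_pipe != src_pipe:
                self.live[(src_pipe, dst_pipe)].add(producer)

    def on_consumer_scheduled(self, producer, src_pipe, dst_pipe):
        # The first consumer frees the slot; later consumers are a no-op.
        self.live[(src_pipe, dst_pipe)].discard(producer)

    def count(self, src_pipe, dst_pipe):
        return len(self.live[(src_pipe, dst_pipe)])


events = LiveEvents()
events.on_producer_scheduled("P", "MTE2", ["V", "V", "V"])
print(events.count("MTE2", "V"))  # prints 1: three consumers share one slot
events.on_consumer_scheduled("P", "MTE2", "V")
print(events.count("MTE2", "V"))  # prints 0: freed by the first consumer
events.on_consumer_scheduled("P", "MTE2", "V")
print(events.count("MTE2", "V"))  # prints 0: remaining consumers hold no slot
```

Which consumer turns out to be "first" depends on the order the scheduler picks, which is why the tracker releases on whichever consumer arrives first rather than on a precomputed one.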

Scheduling Algorithm

Overall Flow

Building Dependency Graph
For each statement, collect read/write memory references:
Build edges:
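As a rough sketch of edge construction, the function below derives RAW, WAW, and WAR edges from per-statement read and write sets. It assumes a simple pairwise comparison and models MemRefs as strings; the real BuildDependencyGraph may prune or deduplicate edges differently.

```python
def build_dependency_edges(effects):
    """Return (earlier, later, kind) edges for statements in program order.

    effects: list of (reads, writes) set pairs, one per statement.
    """
    edges = []
    for later, (reads_l, writes_l) in enumerate(effects):
        for earlier in range(later):
            reads_e, writes_e = effects[earlier]
            if writes_e & reads_l:
                edges.append((earlier, later, "RAW"))  # read after write
            if writes_e & writes_l:
                edges.append((earlier, later, "WAW"))  # write after write
            if reads_e & writes_l:
                edges.append((earlier, later, "WAR"))  # write after read
    return edges


effects = [
    ({"input_a"}, {"tile_a"}),  # 0: tile_a = load(input_a)
    ({"tile_a"}, {"tile_b"}),   # 1: tile_b = add(tile_a, tile_a)
    ({"input_c"}, {"tile_a"}),  # 2: tile_a = load(input_c)
]
print(build_dependency_edges(effects))
# prints: [(0, 1, 'RAW'), (0, 2, 'WAW'), (1, 2, 'WAR')]
```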
Mark each edge as cross-pipe or same-pipe based on pipeline types.

Kahn + Resource Constraints
Enhanced Kahn algorithm that respects event limits:
First-Consumer Priority Optimization: To minimize peak event pressure, the scheduler prioritizes first-consumers:
Example benefit:

Candidate Selection Strategies
Selection criteria (in priority order):

Fallback Strategy
Try strategies in order:
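Pulling the scheduling steps above together, here is a deliberately simplified sketch of Kahn scheduling with an event limit. It uses a single selection rule (prefer statements that free a live slot, then program order) and a best-effort fallback when nothing fits. The function name, the rule, and the data layout are assumptions, not the pass's actual strategies.

```python
def schedule(pipes, deps, limit=8):
    """Return a dependency-preserving order of statements 0..len(pipes)-1.

    pipes: pipe name per statement, in program order.
    deps:  dict mapping a statement to the set of statements it depends on.
    limit: assumed maximum number of live events per (src, dst) pipe pair.
    """
    n = len(pipes)
    succs = [set() for _ in range(n)]
    indegree = [0] * n
    for consumer, producers in deps.items():
        for producer in producers:
            succs[producer].add(consumer)
            indegree[consumer] += 1

    live = {}  # (src_pipe, dst_pipe) -> producers currently holding a slot

    def opened_pairs(i):
        # Pairs on which scheduling i allocates a slot (one per pair: broadcast).
        return {(pipes[i], pipes[c]) for c in succs[i] if pipes[c] != pipes[i]}

    def released(i):
        # Number of live slots freed if i is scheduled now as a first consumer.
        return sum(1 for p in deps.get(i, ()) if p in live.get((pipes[p], pipes[i]), ()))

    def fits(i):
        return all(len(live.get(pair, ())) < limit for pair in opened_pairs(i))

    ready = [i for i in range(n) if indegree[i] == 0]
    order = []
    while ready:
        # Prefer candidates that respect the limit; otherwise fall back to any.
        pool = [i for i in ready if fits(i)] or ready
        pick = max(pool, key=lambda i: (released(i), -i))
        ready.remove(pick)
        order.append(pick)
        for p in deps.get(pick, ()):
            live.get((pipes[p], pipes[pick]), set()).discard(p)
        for pair in opened_pairs(pick):
            live.setdefault(pair, set()).add(pick)
        for c in succs[pick]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order


# Two M producers, each feeding one V consumer, with only one slot available.
print(schedule(["M", "M", "V", "V"], {2: {0}, 3: {1}}, limit=1))
# prints: [0, 2, 1, 3]
```

With `limit=1` the second producer is deferred until the first consumer has released the slot, which is the behaviour the resource constraint is meant to enforce.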

Invariants

Resource Constraint
Each pipeline pair
Invariant verification:

State Consistency
Internal bookkeeping stays consistent:

Topological Order
Output satisfies all dependencies (RAW/WAW/WAR). Guaranteed by Kahn algorithm: only schedules statements with indegree 0.

Example

Input Code

A = compute_on_M(...)   # Pipeline M
B = compute_on_V(A)     # Pipeline V, depends on A (cross-pipe)
C = compute_on_M(...)   # Pipeline M
D = compute_on_V(C)     # Pipeline V, depends on C (cross-pipe)
E = compute_on_V(B, D)  # Pipeline V, depends on B and D

Dependency Graph
Cross-pipe edges: A→B, C→D

Original Schedule
Order: A → B → C → D → E
Peak M→V events: 1

Optimized Schedule
Order: A → C → B → D → E
Peak M→V events: 2
Benefit: Pipeline M operations batched together (A, C), then pipeline V operations (B, D, E). Reduces pipeline switches and improves instruction-level parallelism, even though peak event pressure slightly increases.

Complexity
Limitations

Future Work (Phase 2+)
Path-sensitive analysis:
Loop-invariant code motion (LICM):
Inter-procedural analysis:

Debugging
Enable debug logs to track:
Verify:
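A schedule can also be checked by hand with a small helper like the one below. It is a sketch that assumes the broadcast model (a slot is held from the producer until its first consumer on the destination pipe), and `peak_live_events` is a hypothetical name. Run on the two schedules from the Example section, it reproduces the stated peaks of 1 and 2.

```python
def peak_live_events(order, pipes, deps, pair):
    """Peak number of live events on one (src, dst) pipe pair for a schedule.

    order: statement names in scheduled order.
    pipes: dict statement -> pipe name.
    deps:  dict consumer -> list of producers it depends on.
    """
    src, dst = pair
    # Producers on `src` that have at least one consumer on `dst`.
    producers = {
        p for c, ps in deps.items() for p in ps
        if pipes[p] == src and pipes[c] == dst
    }
    live, peak = set(), 0
    for stmt in order:
        # A scheduled consumer frees the slot of each producer it waits on.
        for p in deps.get(stmt, []):
            if pipes[p] == src and pipes[stmt] == dst:
                live.discard(p)
        if stmt in producers:
            live.add(stmt)
        peak = max(peak, len(live))
    return peak


pipes = {"A": "M", "B": "V", "C": "M", "D": "V", "E": "V"}
deps = {"B": ["A"], "D": ["C"], "E": ["B", "D"]}
print(peak_live_events("ABCDE", pipes, deps, ("M", "V")))  # prints 1
print(peak_live_events("ACBDE", pipes, deps, ("M", "V")))  # prints 2
```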
- Rename the file to 00-out-of-order-schedule.md
- Make it shorter; each doc should be around 200-300 lines of markdown
Alright, I'll make the changes.
  }
};

auto Better = [](const CandidateScore& a, const CandidateScore& b, PickStrategy strategy) -> bool {
Ask AI to reorganize the code. The current version contains one large function, which is hard to read.
Alright, I'll refactor that function.
…imization
This pass performs dependency-preserving reordering of statements within reorderable segments to reduce peak live cross-pipe dependencies (max 8 events), intended to run before InsertSyncPass to avoid hardware event_id exhaustion.
Key features:
- Kahn-based topological scheduling with resource constraints
- Multiple heuristic strategies to avoid greedy dead-ends
- Graceful fallback when strict limits cannot be satisfied
- Comprehensive Python bindings and test suite
- Integration into PassManager's XPlatform optimization strategy

Extract large ScheduleSegment function (~260 lines) into focused helper functions for better readability and maintainability:
- GetMemRefs: Extract memory references from expressions
- BuildDependencyGraph: Build RAW/WAW/WAR dependency edges
- BuildAdjacencyLists: Create successor lists and indegree arrays
- IsBetterCandidate: Compare candidates using selection strategies
- RunKahnScheduling: Kahn topological sort with resource constraints
- FindFeasibleSchedule: Multi-strategy scheduling with fallback
ScheduleSegment now serves as a clean 40-line orchestrator showing the three main steps: extract pipe types, build dependencies, find schedule.
Also rename OutOfOrderSchedulerPass.md to 00-out-of-order-schedule.md to follow documentation naming convention.

…d broadcast event model
Add Phase 1 control flow node support that treats IfStmt/ForStmt as immovable black-box composite nodes in the dependency graph. This enables compute statements to be reordered across control flow boundaries when data dependencies allow.
Key improvements:
- Introduce StmtEffect for conservative side-effect analysis of CF nodes
- Implement broadcast event semantics (one event_id per producer-pair, not per edge) matching hardware reality and InsertSyncPass behavior
- Add first-consumer priority optimization to minimize peak event pressure
- Add CF stability chain to preserve relative order of control flow nodes
- Refactor LiveCrossPipeEvents tracking with pending_successors and incoming_producers maps
Documentation updates:
- Add comprehensive Phase 1 design documentation
- Document broadcast event model and consumer role tracking
- Add cross-CF reordering examples and event lifecycle diagrams
- Update limitations and add future work section
Test improvements:
- Add run_pass_with_ir_print helper for better test debugging
- Refactor test organization and cleanup

- Remove separate header file, consolidate into source file
- Update pass registration in passes.h and bindings
- Synchronize Python bindings and type stubs
- Update test implementation
This refactoring simplifies the pass structure by moving the class definition into the implementation file, reducing header dependencies while maintaining all functionality.