
Conversation

@rappdw rappdw commented Nov 5, 2025

Summary

This PR updates the GPU implementation plans under .plans/gpu_implementation to address correctness and completeness issues identified in the initial draft. It corrects scaling assumptions, introduces formal semiring and masking abstractions, replaces the kernel sketch with a proper two-phase sparse matrix multiplication (SpGEMM) design, and adds unified memory (UVM) performance guidance.

Key Updates

  • Fixed scale arithmetic: Corrected the 1B×1B @ 0.1% density example (previously understated by ~10⁶×). The plan now frames scalability in terms of edges per node (O(n)) and provides realistic CSR memory budgets for billion-scale graphs.
  • Semiring and Mask abstractions: Introduces SemiringSpec and MaskSpec as first-class concepts to define the avos semiring and structural masks (e.g., upper-triangular).
  • Two-phase SpGEMM: Replaces the naïve nested-loop kernel with a standard symbolic → numeric design supporting deterministic (merge-based) and fast (hash-based) variants.
  • Deterministic mode: Adds a bit-exact path for validation, plus a fast path using atomics for production workloads.
  • Unified Memory guidance: Clarifies that UVM still requires cudaMemPrefetchAsync and cudaMemAdvise for performance on GH200/H100 systems.
  • Index width policy: Documents when to switch indptr to int64 for billion-scale workloads.
  • Testing and performance plans: Expands test coverage (property-based, adversarial, deterministic vs fast) and defines perf experiments including UVM on/off and cuSPARSE baselines.
  • Implementation phases: Refactors the roadmap to explicitly separate symbolic, numeric, and optimization phases.
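The corrected scale arithmetic in the first bullet can be checked directly. The figures below are illustrative only: the average degree of 10 and the 8-byte value width are assumptions for the sketch, not numbers taken from the plan.

```python
n = 1_000_000_000                        # 1B x 1B matrix

# Literal reading of "0.1% density": nnz = 0.001 * n^2 = 1e15 nonzeros.
nnz_literal = int(0.001 * n * n)

# Graph reading: nonzeros scale with edges per node, O(n).
avg_degree = 10                          # illustrative assumption
nnz_graph = n * avg_degree               # 1e10 nonzeros

# The literal density reading overstates storage by roughly 1e5-1e6x,
# which is the ~1e6 discrepancy the plan update corrects.
print(f"overstatement factor: {nnz_literal / nnz_graph:.0e}")

# CSR memory budget for the graph reading. Note the index-width policy
# bullet in action: nnz > 2^31 - 1 forces int64 indptr, while column
# indices still fit in int32 because n < 2^31.
indptr_bytes  = (n + 1) * 8              # int64 row pointers: ~8 GB
indices_bytes = nnz_graph * 4            # int32 column indices: ~40 GB
data_bytes    = nnz_graph * 8            # assumed 8-byte values: ~80 GB
total_gb = (indptr_bytes + indices_bytes + data_bytes) / 1e9
print(f"CSR budget: ~{total_gb:.0f} GB")
```

Under these assumptions the working set lands well above a single H100's 80 GB of HBM, which is why the plan leans on unified memory for the GH200 class of systems.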
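The SemiringSpec and MaskSpec names come from the plan, but their fields are not spelled out in this PR description; the sketch below shows one plausible shape, with a conventional (+, *) semiring standing in for the avos operations, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SemiringSpec:
    """Hypothetical shape of the plan's SemiringSpec; fields are illustrative."""
    add: Callable[[int, int], int]   # additive monoid (avos "summary" in the plan)
    mul: Callable[[int, int], int]   # multiplicative op (avos product in the plan)
    zero: int                        # additive identity / sparsity annihilator

@dataclass(frozen=True)
class MaskSpec:
    """Structural mask: entry (i, j) is kept only when keep(i, j) is true."""
    keep: Callable[[int, int], bool]

# Stand-in instances: ordinary arithmetic plus the upper-triangular mask
# the plan names as its main structural mask.
plus_times = SemiringSpec(add=lambda a, b: a + b,
                          mul=lambda a, b: a * b,
                          zero=0)
upper_triangular = MaskSpec(keep=lambda i, j: i <= j)
```

Making the semiring and mask first-class values like this lets one SpGEMM kernel template serve both the avos algebra and conventional arithmetic, and lets tests swap in simple operators to isolate kernel bugs from algebra bugs.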
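The symbolic → numeric split in the SpGEMM bullet can be sketched on the host in plain Python. This is an illustrative sketch, not the plan's kernel code: the function name and signature are invented, and a per-row dict stands in for the GPU hash-table accumulator of the fast variant.

```python
import numpy as np

def spgemm_two_phase(a_indptr, a_indices, a_data,
                     b_indptr, b_indices, b_data,
                     add=lambda x, y: x + y, mul=lambda x, y: x * y):
    """Host-side sketch of two-phase CSR x CSR SpGEMM (symbolic, then numeric)."""
    n_rows = len(a_indptr) - 1

    # Phase 1: symbolic -- count distinct output columns per row so the
    # output arrays can be allocated exactly, with no realloc mid-kernel.
    row_nnz = np.zeros(n_rows, dtype=np.int64)
    for i in range(n_rows):
        cols = set()
        for p in range(a_indptr[i], a_indptr[i + 1]):
            k = a_indices[p]
            cols.update(b_indices[b_indptr[k]:b_indptr[k + 1]])
        row_nnz[i] = len(cols)
    c_indptr = np.zeros(n_rows + 1, dtype=np.int64)
    np.cumsum(row_nnz, out=c_indptr[1:])

    # Phase 2: numeric -- accumulate products into the exact-size arrays.
    c_indices = np.empty(c_indptr[-1], dtype=np.int64)
    c_data = np.empty(c_indptr[-1], dtype=np.float64)
    for i in range(n_rows):
        acc = {}
        for p in range(a_indptr[i], a_indptr[i + 1]):
            k, a_val = a_indices[p], a_data[p]
            for q in range(b_indptr[k], b_indptr[k + 1]):
                j = b_indices[q]
                v = mul(a_val, b_data[q])
                acc[j] = add(acc[j], v) if j in acc else v
        # Emitting columns in sorted order makes the output layout
        # deterministic regardless of accumulation order -- the property
        # the merge-based variant guarantees on the GPU.
        start = c_indptr[i]
        for off, j in enumerate(sorted(acc)):
            c_indices[start + off] = j
            c_data[start + off] = acc[j]
    return c_indptr, c_indices, c_data
```

For example, squaring the CSR matrix [[1, 2], [0, 3]] yields [[1, 8], [0, 9]] with indptr [0, 2, 3]. On the GPU, bit-exact determinism additionally requires a fixed reduction order inside each row, which is why the plan keeps the merge-based path separate from the atomics-based fast path.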

Impact

No code changes yet; this PR revises the planning documents only. These updates make the GPU plan mathematically correct, technically credible, and aligned with state-of-the-art sparse GPU practice.

Next Steps

  • Implement deterministic symbolic/numeric kernels.
  • Add UVM prefetch/advice tuning hooks.
  • Integrate deterministic/fast mode toggling and performance CI.

- Created detailed planning documents for GPU acceleration
- Updated for DGX Spark (Grace Hopper) with unified memory
- Optimized for upper triangular matrices (50% savings)
- Target: 1B×1B matrices at 0.1% density on single H100
- Timeline: 8-12 weeks with two-tier approach

Key documents:
- 00_UPDATED_CONTEXT.md: Impact analysis of DGX/triangular/billion-scale
- IMPLEMENTATION_STRATEGY.md: Concrete implementation approach
- QUICK_START.md: Quick reference guide
- 01-06: Detailed architecture, kernels, API, testing, phases
- EXECUTIVE_SUMMARY.md: High-level recommendations

Changes from original plan:
- Unified memory simplifies architecture (30% less code)
- Triangular optimization for 50% performance gain
- Single-GPU first (Tier 1), multi-GPU later if needed (Tier 2)
- Same timeline but better outcome
…UVM tuning; deterministic mode; perf/tests updates
github-actions bot commented Nov 5, 2025

Code Coverage

| Package | Line Rate | Complexity | Health |
| --- | --- | --- | --- |
| . | 68% | 0 | |
| core | 92% | 0 | |
| reference | 98% | 0 | |
| sparse | 85% | 0 | |
| sparse.csgraph | 83% | 0 | |
| types | 100% | 0 | |
| util | 70% | 0 | |
| **Summary** | **83%** (763 / 920) | 0 | |

Minimum allowed line rate is 65%

1 similar comment

rappdw commented Nov 5, 2025

From Sonnet 4.5 (Thinking)

Verdict

LGTM - Ready to Merge 🚀

This PR transforms the GPU implementation plan from a preliminary sketch into a production-ready technical specification. The math corrections alone justify the merge, and the formal abstractions (SemiringSpec, MaskSpec) will pay dividends during implementation and testing.

No blocking issues identified. The plan is technically sound, well-documented, and properly scoped.

@rappdw rappdw merged commit a763d9f into master Nov 5, 2025
6 checks passed
@rappdw rappdw deleted the feature/gpu-plan-update branch November 5, 2025 14:29
rappdw added a commit that referenced this pull request Nov 5, 2025