Add scan (prefix sum) operations support #39
This depends on PR #37.
This commit adds support for scan (parallel prefix sum) operations to cuTile, based on the IntegerReduce branch and commit 0c9ab90.

Key changes:
- Added encode_ScanOp! to bytecode encodings for generating ScanOp bytecode
- Added encode_scan_identity_array! to reuse existing identity encoding
- Added scan intrinsic implementation using operation_identity from IntegerReduce
- Added scan() and cumsum() public APIs with proper 1-indexed to 0-indexed axis conversion
- Added comprehensive codegen tests for scan operations
- Added scankernel.jl example demonstrating CSDL scan algorithm

Features:
- Supports cumulative sum (cumsum) for float and integer types
- Supports both forward and reverse scan directions
- Reuses FloatIdentityOp and IntegerIdentityOp from IntegerReduce
- Uses operation_identity function for cleaner identity value creation
- 1-indexed axis parameter (consistent with reduce operations)
- Preserves tile shape (scan is an element-wise operation along one dimension)

Tests:
- All 142 codegen tests pass (including 6 new scan tests)
- scankernel.jl example runs successfully with CSDL algorithm

Minor comment improvements in scankernel.jl example:
- Clarify that it demonstrates device-side scan operation
- Add note that test might occasionally fail (race condition in phase 2 loop)
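For readers unfamiliar with the semantics described above, here is a minimal reference model in plain Python. It is not cuTile code and does not use the PR's API: the names `scan_1d` and `scan_2d` are hypothetical, and the tile is modeled as a list of rows. It only illustrates what a forward or reverse inclusive scan along a 1-indexed axis is expected to produce, and that the output keeps the input's shape.

```python
from itertools import accumulate
import operator


def scan_1d(values, op=operator.add, reverse=False):
    """Inclusive scan of a 1-D list; reverse=True scans from the last element."""
    seq = values[::-1] if reverse else values
    out = list(accumulate(seq, op))
    return out[::-1] if reverse else out


def scan_2d(tile, axis, op=operator.add, reverse=False):
    """Inclusive scan of a 2-D tile (list of rows) along a 1-indexed axis."""
    if axis not in (1, 2):
        raise ValueError("axis must be 1 or 2 for a 2-D tile")
    axis0 = axis - 1  # 1-indexed (user-facing) -> 0-indexed (internal)
    if axis0 == 1:
        # Scan within each row.
        return [scan_1d(row, op, reverse) for row in tile]
    # axis0 == 0: scan down each column, then transpose back to rows.
    cols = [scan_1d(list(col), op, reverse) for col in zip(*tile)]
    return [list(row) for row in zip(*cols)]


tile = [[1, 2, 3],
        [4, 5, 6]]
print(scan_2d(tile, axis=2))                # [[1, 3, 6], [4, 9, 15]]
print(scan_2d(tile, axis=1))                # [[1, 2, 3], [5, 7, 9]]
print(scan_2d(tile, axis=2, reverse=True))  # [[6, 5, 3], [15, 11, 6]]
```

A model like this can serve as a host-side oracle when checking device results, since every output has the same dimensions as its input.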
3731b87 to 368e6c5
This will fail. Still uses IdentityOp.

@maleadt Thanks.

I moved the example into the test suite, if you don't mind. I'd rather keep the

Completely agree. I updated comments too.
This commit adds support for scan (parallel prefix sum) operations to cuTile,
based on the IntegerReduce branch and commit 0c9ab90.
Key changes:
- Added scankernel.jl example demonstrating two-pass chained scan algorithm
Features:
- 1-indexed axis parameter (consistent with reduce operations)
Tests:
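As background for the two-pass scan mentioned in the commit message, the general idea can be sketched sequentially in plain Python. This is an assumption about the overall shape of such an algorithm, not a transcription of scankernel.jl; `two_pass_scan` and `tile_size` are hypothetical names. On a GPU the tiles would run concurrently, and the second pass can only use a preceding tile's total once it has been written, which is the kind of ordering a phase 2 loop has to get right.

```python
from itertools import accumulate


def two_pass_scan(values, tile_size):
    """Inclusive prefix sum computed tile by tile in two passes."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    tiles = [values[i:i + tile_size] for i in range(0, len(values), tile_size)]

    # Pass 1: scan each tile independently and record its total.
    local = [list(accumulate(t)) for t in tiles]
    totals = [s[-1] for s in local]

    # An exclusive scan of the totals gives each tile's starting offset.
    offsets = [0] + list(accumulate(totals))[:-1]

    # Pass 2: add the sum of all preceding tiles to every element.
    return [x + off for s, off in zip(local, offsets) for x in s]


data = list(range(1, 11))
print(two_pass_scan(data, tile_size=4))
# [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```

The result should match a plain sequential prefix sum for any positive tile size, which makes the sketch usable as a correctness check.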