Conversation


@tpn tpn commented Jan 20, 2026

No description provided.


copy-pr-bot bot commented Jan 20, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


@cpcloud cpcloud left a comment


Thanks for the PR!

This really needs some more explicit motivation in the PR description, as well as some real justification for all the duplicated tooling.

Contributor


Not entirely sure what the purpose of this file is beyond what's happening in the existing test_kernel_launch.py benchmarks.

Contributor


I think all of this can be done with the existing pytest-benchmark plugin.
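For concreteness, here's a rough sketch of what a launch-overhead benchmark looks like with pytest-benchmark; the kernel and test names below are made up for illustration rather than taken from this PR or from test_kernel_launch.py:

```python
import numpy as np
from numba import cuda


# Hypothetical no-op kernel, mirroring the style used in the PR's benchmarks.
@cuda.jit("void(float32[:])")
def noop_kernel(arr):
    return


def test_launch_overhead(benchmark):
    arr = cuda.to_device(np.zeros(1, dtype=np.float32))

    def launch():
        noop_kernel[1, 1](arr)
        cuda.synchronize()

    # The `benchmark` fixture from pytest-benchmark handles warmup,
    # repetition, and stats collection for the launch-and-sync loop.
    benchmark(launch)
```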

Contributor


I think all or most of this functionality can be done with the existing pytest-benchmark plugin.

Really would like to avoid duplicating functionality, especially if it's AI-generated duplication.

- `bench-launch-overhead`
- `bench`
- `benchcmp`
- `bench-against`

Contributor


While this script doesn't do a three-way comparison, it also doesn't require writing any new code to run it.

Can we try to reuse bench-against instead of reinventing a lot of what that already does?
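For what it's worth, pytest-benchmark also ships a save-and-compare workflow (`--benchmark-autosave` on one run, then `--benchmark-compare`, or the standalone `pytest-benchmark compare` command), which may already cover most of what a separate comparison script would need to do.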

Comment on lines +84 to +100
```python
def some_kernel_1():
    return

@cuda.jit("void(float32[:])")
def some_kernel_2(arr1):
    return

@cuda.jit("void(float32[:],float32[:])")
def some_kernel_3(arr1, arr2):
    return

@cuda.jit("void(float32[:],float32[:],float32[:])")
def some_kernel_4(arr1, arr2, arr3):
    return

@cuda.jit("void(float32[:],float32[:],float32[:],float32[:])")
def some_kernel_5(arr1, arr2, arr3, arr4):
```
Contributor


These are nearly identical to the existing benchmarks. Let's avoid repeating existing benchmarks and tools that run them.
