@xTimeCrystal

Changed the benchmark to report the median time instead of the minimum time, because unrealistic speeds such as 1000+ tokens/s were observed with specific torch.compile flags.
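To illustrate why the median is more robust here, below is a minimal sketch, not the benchmark code from this PR. The helper names `time_runs` and `tokens_per_second` are hypothetical, and the sample durations are synthetic values chosen only to show the effect of one outlier. On a GPU, each timed call would also need a `torch.cuda.synchronize()` before reading the clock; that is omitted so the sketch runs without PyTorch.

```python
import statistics
import time


def time_runs(fn, n_runs=10):
    """Return the wall-clock duration in seconds of each of n_runs calls to fn()."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def tokens_per_second(n_tokens, durations, reducer=statistics.median):
    """Convert per-run durations to throughput using the given reducer (median by default)."""
    return n_tokens / reducer(durations)


# Synthetic durations (seconds per token), NOT measured values:
# four plausible runs plus one spuriously short run.
sample = [0.0120, 0.0121, 0.0119, 0.0010, 0.0122]

tps_min = tokens_per_second(1, sample, reducer=min)  # dominated by the outlier
tps_median = tokens_per_second(1, sample)            # ignores the outlier

print(f"min-based:    {tps_min:.2f} tokens/s")
print(f"median-based: {tps_median:.2f} tokens/s")
```

With the minimum, a single anomalously short run sets the reported figure. The median only moves if more than half of the runs are affected.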

Tested with rwkv7-g0a-7.2b-20250829-ctx4096.pth. Gains over the baseline were observed for models with more than 1.5B parameters at bsz=1.

Baseline (BlinkDL original but with median time):

Token/s = 69.58 (forward), 69.58 (full) || Bandwidth = 964.65 GB/s || 4.032s

This PR (median time):

Token/s = 82.79 (forward), 82.79 (full) || Bandwidth = 1147.71 GB/s || 92.123s <== ~90s compile time
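As a quick sanity check on the two result lines, the following sketch computes the relative speedup and the ratio of bandwidth to throughput. The numbers are copied from the logs above. Reading that ratio as "GB read per token" is an assumption about how the benchmark derives its bandwidth figure, not something stated in this PR.

```python
# Figures copied from the two log lines above.
baseline_tps = 69.58   # tokens/s, baseline
pr_tps = 82.79         # tokens/s, this PR
baseline_bw = 964.65   # GB/s, baseline
pr_bw = 1147.71        # GB/s, this PR

# Relative throughput gain of this PR over the baseline.
speedup = pr_tps / baseline_tps

# Bandwidth divided by throughput. If bandwidth is computed as
# (bytes read per token) * (tokens/s), this ratio should be the
# same for both runs. That formula is an assumption.
gb_per_token_baseline = baseline_bw / baseline_tps
gb_per_token_pr = pr_bw / pr_tps

print(f"speedup: {speedup:.2f}x")
print(f"GB per token (baseline): {gb_per_token_baseline:.2f}")
print(f"GB per token (this PR):  {gb_per_token_pr:.2f}")
```

The two ratios agree to within rounding, which is consistent with the bandwidth figure being proportional to tokens/s. The roughly 90 s of compile time is a one-off cost, so the gain pays off only on runs long enough to amortize it.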
