add token_counting_strategy override for provider-aware token counting#843

Open
adilhafeez wants to merge 2 commits into main from adil/optional-token-counting

@adilhafeez adilhafeez commented Mar 23, 2026

add token_counting_strategy override (estimate|auto) for provider-aware token counting

Use a len/4 token estimate by default and make tiktoken opt-in via enable_token_counting. This removes roughly 80ms from each request, as the trace below shows:

.404    Brightstaff accepts connection, reuses idle conn to localhost:12001
.405    Envoy WASM filter receives request, resolves model, starts tokenization
.484    Tokenization complete (79ms to count 225 tokens — result unused, ratelimit skipped)
.484    Upstream transform — request sent to archfc.katanemo.dev:443
.557    Upstream response received (73ms round-trip, TLS session reused)
.560    Response processed, route=coding → openai/gpt-4o
.563    Response returned to client
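
For readers who want the arithmetic behind the trace spelled out, here is a small sketch that recomputes the stage durations from the timestamps above. The variable names are illustrative only and not taken from the PR; the millisecond values are copied from the trace.

```python
# Millisecond marks copied from the trace above (fractional part of the same second).
# The key names are made up for this sketch.
trace_ms = {
    "accepted": 404,           # Brightstaff accepts connection
    "tokenize_start": 405,     # WASM filter starts tokenization
    "tokenize_end": 484,       # tokenization complete, upstream request sent
    "upstream_response": 557,  # upstream response received
    "returned": 563,           # response returned to client
}

tokenization_ms = trace_ms["tokenize_end"] - trace_ms["tokenize_start"]
upstream_ms = trace_ms["upstream_response"] - trace_ms["tokenize_end"]
total_ms = trace_ms["returned"] - trace_ms["accepted"]

# Fraction of the end-to-end request time spent counting tokens.
tokenization_share = tokenization_ms / total_ms

print(f"tokenization: {tokenization_ms}ms, upstream: {upstream_ms}ms, total: {total_ms}ms")
print(f"tokenization share of request: {tokenization_share:.0%}")
```

In this one trace, tokenization takes about as long as the upstream round-trip, which is why skipping it by default is worthwhile.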

@adilhafeez adilhafeez marked this pull request as ready for review March 23, 2026 04:48
By default, use a cheap len/4 estimate for input token counting (metrics
and ratelimit). When enable_token_counting is set to true in overrides,
use tiktoken BPE for exact counts. This eliminates ~80ms of per-request
latency from tiktoken in the WASM filter while keeping metrics and
ratelimit functional.

Made-with: Cursor
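
A minimal sketch of the behaviour described in the commit message, written in Python for brevity and not taken from the actual WASM filter. The function names, the integer division, and the use of character length are all assumptions; the commit only says "len/4". The exact counter is passed in as a callable so the sketch does not depend on tiktoken being installed.

```python
from typing import Callable, Optional


def estimate_tokens(text: str) -> int:
    """Cheap estimate: roughly one token per four characters (len/4).

    Assumption: integer division over the character count. The PR does not
    say whether the filter divides bytes or characters, or how it rounds.
    """
    return len(text) // 4


def count_input_tokens(
    text: str,
    enable_token_counting: bool = False,
    exact_counter: Optional[Callable[[str], int]] = None,
) -> int:
    """Return the input token count used for metrics and ratelimit.

    Uses the exact counter only when the override is on and a counter is
    supplied; otherwise falls back to the len/4 estimate.
    """
    if enable_token_counting and exact_counter is not None:
        return exact_counter(text)
    return estimate_tokens(text)


prompt = "Write a function that reverses a linked list."

# Default path: no tokenizer is invoked.
default_count = count_input_tokens(prompt)

# Opt-in path. The whitespace split is a hypothetical stand-in for a BPE
# tokenizer such as tiktoken; real counts will differ.
exact_count = count_input_tokens(
    prompt,
    enable_token_counting=True,
    exact_counter=lambda s: len(s.split()),
)
```

The point of the design is that the default path never touches the tokenizer, so its cost stays off the request path unless the override asks for exact counts.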
@adilhafeez adilhafeez force-pushed the adil/optional-token-counting branch from a295a1b to e5f3039 Compare March 23, 2026 04:53
@adilhafeez adilhafeez changed the title from "make tiktoken token counting optional via enable_token_counting override" to "add token_counting_strategy override (estimate|auto) for provider-aware token counting" Mar 25, 2026
@adilhafeez adilhafeez changed the title from "add token_counting_strategy override (estimate|auto) for provider-aware token counting" to "add token_counting_strategy override for provider-aware token counting" Mar 25, 2026