Fix auto-compaction token counting to include system prompt and tools #6411
Conversation
Force-pushed from 0574f19 to 2c2c438
DOsinga left a comment
I don't think this is how token counting works /cc @katzdave
Some(tokens) => (tokens as usize, "session metadata"),
Some(tokens) => {
    // Session metadata only tracks message tokens, so we still need to
    // add the system_prompt + tools overhead
I don't think that is true; if you start a new session, we immediately hit 6.6K tokens. If that was just the first message and the reply, it would be way less.
Yes, exactly — that ~6.6K is the baseline overhead from system prompt + tools (and any default Goose framing). The previous compaction check only counted message tokens (or session metadata), so it could underestimate and trigger compaction too late. This change makes the check include that overhead so we decide correctly even at session start.
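A minimal sketch of the check as described in this comment, assuming a fixed ~6.6K overhead; the constant, function name, context sizes, and threshold are illustrative, not Goose's actual code:

```rust
// Hypothetical sketch of a compaction check that includes the fixed
// overhead; the constant, names, and threshold are illustrative only.
const SYSTEM_PROMPT_AND_TOOLS_OVERHEAD: usize = 6_600; // ~6.6K baseline at session start

fn needs_compaction(message_tokens: usize, context_limit: usize, threshold: f64) -> bool {
    // Project what the next request will send: system prompt + tools + messages.
    let projected = message_tokens + SYSTEM_PROMPT_AND_TOOLS_OVERHEAD;
    (projected as f64) >= (context_limit as f64) * threshold
}

fn main() {
    // With a small 8K context, the overhead alone nearly fills the window.
    assert!(needs_compaction(1_000, 8_192, 0.8));
    // With a large context, the same messages do not trigger compaction.
    assert!(!needs_compaction(1_000, 128_000, 0.8));
}
```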
Yeah, so I don't think that is true. We get the tokens that were used by the provider, and that includes the tools and the system prompt, so adding that again should not be needed.
Yeah I don't think so either. We are using the provider's token count to determine if we need to compact.
So if we fail from context exceeded from the provider, on compaction we'll remove the system prompt and the compaction prompt is much shorter than that.
We have some other defense mechanisms for large inputs too (removing old tool responses).
Force-pushed from 2c2c438 to 40c4784
Sorry for the delay, I was OOO; closing this. Auto-compact uses the counts returned from the provider.
Why this fix is correct

Original code (BEFORE):

let (current_tokens, token_source) = match session.total_tokens {

Fixed code (AFTER):

Some(tokens) => {

Why the objection is wrong: the objection says "we get the tokens that were used by the provider and that includes the tools and the system prompt". But session.total_tokens stores tokens from the last response, while check_if_compaction_needed predicts whether the next request will exceed the context. The next request = system_prompt + tools + messages.

This explains issue #5255 ("minimum message size is 8k"): the overhead alone fills small contexts, but the old code never counted it.
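The before/after logic described here can be sketched as follows; field and arm names follow the quoted snippets, while the overhead value and helper names are hypothetical:

```rust
// Sketch of the before/after logic the comment describes; field and arm
// names follow the quoted snippets, the overhead value is hypothetical.
struct Session {
    total_tokens: Option<i64>, // tokens reported for the *last* response
}

fn overhead_tokens() -> usize {
    6_600 // assumed system_prompt + tools cost
}

// BEFORE: session metadata (or a message-only estimate) used as-is.
fn current_tokens_before(session: &Session, estimated_message_tokens: usize) -> usize {
    match session.total_tokens {
        Some(tokens) => tokens as usize,
        None => estimated_message_tokens, // fallback with zero overhead
    }
}

// AFTER (per this PR): both arms add the system_prompt + tools overhead.
fn current_tokens_after(session: &Session, estimated_message_tokens: usize) -> usize {
    match session.total_tokens {
        Some(tokens) => tokens as usize + overhead_tokens(),
        None => estimated_message_tokens + overhead_tokens(),
    }
}

fn main() {
    let fresh = Session { total_tokens: None };
    assert_eq!(current_tokens_before(&fresh, 500), 500);
    assert_eq!(current_tokens_after(&fresh, 500), 7_100);

    let ongoing = Session { total_tokens: Some(2_000) };
    assert_eq!(current_tokens_before(&ongoing, 0), 2_000);
    assert_eq!(current_tokens_after(&ongoing, 0), 8_600);
}
```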
Timing issue: check_if_compaction_needed runs BEFORE the provider call, so session.total_tokens is always one message behind; it doesn't include the new user message about to be sent.

But the real killer is the None case:

.map(|msg| token_counter.count_chat_tokens("", std::slice::from_ref(msg), &[]))

When session.total_tokens is None, the old code estimated tokens with zero overhead, which is a bug regardless of what the provider returns later. The statement "Auto-compact uses the counts returned from the provider" is only partially true, and it completely ignores the fallback path.