Conversation

@HansPeterRadtke HansPeterRadtke commented Jan 9, 2026

Summary

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Related Issues

Relates to #ISSUE_ID
Discussion: LINK (if any)

Screenshots/Demos (for UX changes)

Before:

After:

@HansPeterRadtke HansPeterRadtke force-pushed the fix/compaction-counts-system-prompt-and-tools branch from 0574f19 to 2c2c438 Compare January 9, 2026 10:05
Collaborator

@DOsinga DOsinga left a comment


I don't think this is how token counting works /cc @katzdave

- Some(tokens) => (tokens as usize, "session metadata"),
+ Some(tokens) => {
+     // Session metadata only tracks message tokens, so we still need to
+     // add the system_prompt + tools overhead
Collaborator


I don't think that is true; if you start a new session, we immediately hit 6.6K tokens. If that was just the first message and the reply, it would be way less.

Author


Yes, exactly — that ~6.6K is the baseline overhead from system prompt + tools (and any default Goose framing). The previous compaction check only counted message tokens (or session metadata), so it could underestimate and trigger compaction too late. This change makes the check include that overhead so we decide correctly even at session start.

Collaborator


Yeah, so I don't think that is true. We get the tokens that were used by the provider, and that includes the tools and the system prompt, so adding that again should not be needed.

Collaborator


Yeah I don't think so either. We are using the provider's token count to determine if we need to compact.

So if we fail with a context-exceeded error from the provider, on compaction we'll remove the system prompt, and the compaction prompt is much shorter than that.

We have some other defense mechanisms too for large inputs (removing old tool responses).

@HansPeterRadtke HansPeterRadtke force-pushed the fix/compaction-counts-system-prompt-and-tools branch from 2c2c438 to 40c4784 Compare January 9, 2026 15:49
@katzdave
Collaborator

Sorry for the delay, was OOO; closing this. Auto-compact uses the counts returned from the provider.

@katzdave katzdave closed this Jan 20, 2026
@HansPeterRadtke
Author

Why this fix is correct

Original code (BEFORE):

let (current_tokens, token_source) = match session.total_tokens {
    Some(tokens) => (tokens as usize, "session metadata"),
    None => {
        let token_counts: Vec<_> = messages
            .iter()
            .filter(|m| m.is_agent_visible())
            .map(|msg| token_counter.count_chat_tokens("", std::slice::from_ref(msg), &[]))
            .collect();
        (token_counts.iter().sum(), "estimated")
    }
};

Problems:

  1. Some(tokens) - uses session tokens directly, no overhead added
  2. None - calls count_chat_tokens("", ..., &[]) with empty system_prompt and empty tools array

Fixed code (AFTER):

Some(tokens) => {
    let overhead = token_counter.count_chat_tokens(system_prompt, &[], tools);
    (tokens as usize + overhead, "session metadata + overhead")
}
None => {
    let total_tokens =
        token_counter.count_chat_tokens(system_prompt, &agent_visible_messages, tools);
    (total_tokens, "estimated with full context")
}

Why the objection is wrong:

"we get the tokens that were used by the provider and that includes the tools and the system prompt"

session.total_tokens stores tokens from the last response. But check_if_compaction_needed predicts if the next request will exceed context.

Next request = system_prompt + tools + messages

Example:

  • session.total_tokens = 5000
  • system_prompt + tools = 6600
  • Old code sees: 5000 tokens → no compaction
  • Reality: next request = 11600 tokens

This explains issue #5255 ("minimum message size is 8k") - the overhead alone fills small contexts but the old code never counted it.
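The arithmetic above can be sketched as a minimal check. The numbers (5000, 6600, an ~8K window) come from the discussion; the 80% trigger threshold is an illustrative assumption, not the actual Goose value:

```rust
// Hypothetical compaction check; the 80% trigger is an assumption.
fn needs_compaction(current_tokens: usize, context_limit: usize) -> bool {
    current_tokens > context_limit * 8 / 10
}

fn main() {
    let session_total_tokens = 5_000; // provider count from the last response
    let overhead = 6_600;             // system_prompt + tools baseline

    // Old check: only the session count is considered.
    let old_estimate = session_total_tokens;
    // New check: overhead is added, matching what the next request sends.
    let new_estimate = session_total_tokens + overhead;

    let context_limit = 8_192; // small context window, as in issue #5255
    println!("old: {} -> compact? {}", old_estimate, needs_compaction(old_estimate, context_limit));
    println!("new: {} -> compact? {}", new_estimate, needs_compaction(new_estimate, context_limit));
}
```

With these numbers the old estimate (5000) stays under the trigger while the real next request (11600) is already past the window.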

@HansPeterRadtke
Copy link
Author

Timing issue: check_if_compaction_needed runs BEFORE the provider call:

  1. User sends new message
  2. check_if_compaction_needed() runs ← uses session.total_tokens from LAST request
  3. Provider call happens
  4. session.total_tokens updated with new count

So session.total_tokens is always ONE message behind - it doesn't include the new user message about to be sent.
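The one-message-behind ordering can be sketched like this. The struct and token numbers are illustrative, not the real Goose types:

```rust
// Illustrative stand-in for the session state.
struct Session {
    total_tokens: Option<usize>,
}

// What check_if_compaction_needed() sees at step 2: the stale count only.
fn tokens_seen_by_check(session: &Session) -> usize {
    session.total_tokens.unwrap_or(0)
}

fn main() {
    // Count left over from the last request/response cycle.
    let mut session = Session { total_tokens: Some(5_000) };

    // Step 1: user sends a new message (say ~400 tokens) ...
    let new_message_tokens = 400;

    // Step 2: ... but the check still sees only the old total.
    assert_eq!(tokens_seen_by_check(&session), 5_000);

    // Steps 3-4: provider call happens, then the count is updated.
    session.total_tokens = Some(tokens_seen_by_check(&session) + new_message_tokens);
    assert_eq!(session.total_tokens, Some(5_400));
}
```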

But the real killer is the None case:

.map(|msg| token_counter.count_chat_tokens("", std::slice::from_ref(msg), &[]))
//         first argument "" = empty system_prompt, last argument &[] = empty tools

When is session.total_tokens None?

  • New session (first message)
  • Session restored from disk
  • Any state where provider hasn't responded yet

In these cases, the old code estimated tokens with zero overhead - that's clearly a bug regardless of what the provider returns later.
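The zero-overhead fallback can be demonstrated with a stub counter that charges one token per whitespace-separated word. The counter, prompt, and tool strings here are stand-ins, not the real Goose token_counter; only the shape of the two calls matters:

```rust
// Stub counter: one token per word across prompt, messages, and tools.
fn count_chat_tokens(system_prompt: &str, messages: &[&str], tools: &[&str]) -> usize {
    let words = |s: &str| s.split_whitespace().count();
    words(system_prompt)
        + messages.iter().map(|m| words(m)).sum::<usize>()
        + tools.iter().map(|t| words(t)).sum::<usize>()
}

fn main() {
    let system_prompt = "You are a helpful assistant with many detailed instructions";
    let tools = ["read_file tool schema", "write_file tool schema"];
    let messages = ["hi"];

    // Old fallback: empty system prompt, empty tools -> only the message counts.
    let old_estimate: usize = messages
        .iter()
        .map(|m| count_chat_tokens("", std::slice::from_ref(m), &[]))
        .sum();

    // Fixed fallback: count the full next request.
    let new_estimate = count_chat_tokens(system_prompt, &messages, &tools);

    println!("old = {old_estimate}, new = {new_estimate}");
    assert!(old_estimate < new_estimate);
}
```

Even with this toy counter the old path sees only the single message token, while the fixed path counts the whole request the provider will actually receive.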

The statement "Auto-compact uses the counts returned from the provider" is only partially true, and completely ignores the fallback path.
