feat: auto-detect 1M context window and scale guard thresholds#20

Open
ynaamane wants to merge 1 commit into Ruya-AI:main from ynaamane:feat/1m-context-window-support

Conversation


@ynaamane ynaamane commented Mar 13, 2026

Summary

Claude Code now appends [1m] to model IDs when using extended 1M context (e.g. claude-opus-4-6[1m]). Cozempic v1.0.0 hardcodes all models to 200K in MODEL_CONTEXT_WINDOWS, causing:

  • Premature guard pruning: hard threshold fires at 150K tokens (75% of 200K) instead of 750K (75% of 1M)
  • Incorrect diagnostics: cozempic current --diagnose shows 50% context used when it's actually 10%
  • Aggressive overflow recovery: danger threshold of 90MB is too low for 1M sessions that can legitimately reach 200-400MB

Changes

  • tokens.py: Add [1m]-suffixed entries to MODEL_CONTEXT_WINDOWS (1M context). Update detect_context_window() with bracket-aware prefix matching for versioned model IDs (e.g. claude-opus-4-6-20260301[1m])
  • init.py: Update SessionStart hook template to extract context_window.context_window_size from Claude Code stdin JSON and pass --context-window dynamically to guard daemon
  • guard.py: Always detect context window upfront (used for display + overflow scaling). Display detected context window in guard banner. Scale overflow danger_threshold_mb and danger_threshold_tokens based on context window
  • tests/test_model_detection.py: Add 15 new tests covering 1M exact match, versioned prefix match, 200K/1M non-confusion, env override priority, threshold scaling, and context % calculations
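
The bracket-aware prefix matching described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual Cozempic code: the table entries and the two-step match (exact lookup, then base-prefix match with the `[1m]` suffix compared separately) are inferred from the PR description.

```python
# Sketch of bracket-aware context-window detection. Table entries and
# constant names are assumptions based on the PR description.
MODEL_CONTEXT_WINDOWS = {
    "claude-opus-4-6": 200_000,
    "claude-opus-4-6[1m]": 1_000_000,
}
DEFAULT_CONTEXT_WINDOW = 200_000

def detect_context_window(model_id: str) -> int:
    # 1. Exact match wins.
    if model_id in MODEL_CONTEXT_WINDOWS:
        return MODEL_CONTEXT_WINDOWS[model_id]
    # 2. Split off a trailing bracket suffix like "[1m]" so a versioned
    #    ID such as "claude-opus-4-6-20260301[1m]" can prefix-match the
    #    suffixed table entry "claude-opus-4-6[1m]".
    base, suffix = model_id, ""
    if model_id.endswith("]") and "[" in model_id:
        base, _, rest = model_id.partition("[")
        suffix = "[" + rest
    for known, window in MODEL_CONTEXT_WINDOWS.items():
        known_base, _, known_rest = known.partition("[")
        known_suffix = "[" + known_rest if known_rest else ""
        # Suffixes must match exactly, so 200K and 1M entries never
        # get confused; only the base is prefix-matched.
        if suffix == known_suffix and base.startswith(known_base):
            return window
    return DEFAULT_CONTEXT_WINDOW
```

Comparing the suffix exactly (rather than prefix-matching the whole string) is what keeps a plain versioned ID from accidentally matching a `[1m]` entry, and vice versa.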

Backward compatibility

  • Zero breaking changes: 200K models keep their exact same thresholds and behavior
  • COZEMPIC_CONTEXT_WINDOW env var override still takes highest priority
  • Old hooks (without --context-window) fall back to model-based detection from session JSONL
  • Users with existing hooks can re-run cozempic init to get the updated template
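
The resulting priority order (env var override, then the hook-supplied flag, then model-based detection, then the conservative default) might look like this. The function name and parameters here are illustrative, not the actual Cozempic API; model-based detection is simplified to a suffix check.

```python
import os

def resolve_context_window(cli_value=None, model_id=None) -> int:
    # Illustrative sketch of the fallback order described above.
    env = os.environ.get("COZEMPIC_CONTEXT_WINDOW")
    if env:                       # 1. env var override takes highest priority
        return int(env)
    if cli_value is not None:     # 2. --context-window passed by the hook
        return int(cli_value)
    if model_id is not None:      # 3. model-based detection (simplified here)
        return 1_000_000 if model_id.endswith("[1m]") else 200_000
    return 200_000                # 4. conservative 200K default
```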

Before/After

| Metric | Before (200K assumed) | After (1M detected) |
| --- | --- | --- |
| Hard token threshold | 150,000 | 750,000 |
| Soft token threshold | 90,000 | 450,000 |
| Context % for 100K tokens | 50% | 10% |
| Overflow danger (MB) | 90 | 90 (scaled to ~1.8x hard MB threshold) |
| Overflow danger (tokens) | not set | 900,000 (90% of context) |
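
The token-threshold ratios implied by the table can be sketched as below. These ratios (75% hard, 45% soft, 90% danger) are inferred from the Before/After numbers; the actual guard.py scaling logic may differ, and MB scaling is omitted.

```python
def scaled_thresholds(context_window: int) -> dict:
    # Ratios inferred from the Before/After table:
    #   hard = 75%, soft = 45%, overflow danger = 90% of the window.
    return {
        "hard_tokens": int(context_window * 0.75),
        "soft_tokens": int(context_window * 0.45),
        "danger_tokens": int(context_window * 0.90),
    }
```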

Test plan

  • All 89 existing tests pass
  • 15 new tests pass (1M detection, threshold scaling, edge cases)
  • Manual: cozempic current --diagnose with 1M model shows correct context window
  • Manual: Guard banner shows Context: 1.0M on 1M sessions
  • Manual: Hook extraction test: echo '{"session_id":"test","context_window":{"context_window_size":1000000}}' | bash -c '<hook_command>'
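
For reference, the field the hook reads from stdin can be extracted like this in Python. This is a sketch of the same extraction the bash hook template performs; the key names come from the example payload above, and the function name is hypothetical.

```python
import json

def extract_context_window(raw: str):
    # Pull context_window.context_window_size from the SessionStart
    # JSON payload; returns None when the field is absent, which is
    # how old hooks fall back to model-based detection.
    payload = json.loads(raw)
    return payload.get("context_window", {}).get("context_window_size")
```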

Claude Code appends [1m] to model IDs when using extended 1M context.
Cozempic was hardcoding all models to 200K, causing premature pruning
(guard firing at 150K tokens instead of 750K) and incorrect context %
in diagnostics.

Changes:
- Add [1m]-suffixed entries to MODEL_CONTEXT_WINDOWS (1M for all models)
- Update detect_context_window() with bracket-aware prefix matching
- Update SessionStart hook to extract context_window_size from Claude Code
  stdin JSON and pass --context-window dynamically to guard daemon
- Display detected context window in guard banner
- Scale overflow danger thresholds based on detected context window
- Add 15 new tests for 1M detection, threshold scaling, and edge cases

Backward compatible: 200K models keep exact same behavior. Env var
COZEMPIC_CONTEXT_WINDOW override still takes priority.
