Skip to content

NodeNestor/claude-rolling-context

Repository files navigation

Rolling Context for Claude Code

MIT License Python 3.7+ Zero Dependencies

A transparent proxy that gives Claude Code rolling context compression — old messages get automatically summarized while recent messages stay fully verbatim. You never hit the context wall, and you never lose important details.

Zero config. Uses your existing Claude Code auth. No API key needed. Just install and forget.

Claude Code's built-in /compact replaces your entire conversation with a lossy summary. After a few compactions, you're summarizing a summary of a summary. This plugin only compresses old messages — recent context stays untouched.

/compact vs Rolling Context

/compact (built-in) Rolling Context
What gets compressed Everything Only old messages
Recent context Summarized (lossy) Kept verbatim
When it runs Manual or at threshold Automatic, background
Latency impact Blocks until done Zero — async
After multiple compressions Summary of summary of summary Fresh rolling merge each time
Original transcript Replaced Preserved (JSONL unchanged)

How It Works

Claude Code  ──►  Rolling Context Proxy (:5588)  ──►  Anthropic API
                         │
                         ├─ context < 100K tokens? pass through unchanged
                         │
                         └─ context > 100K tokens?
                              1. summarize old messages with Haiku (background, async)
                              2. keep ~40K tokens of recent messages verbatim
                              3. inject compressed context on next request
                              4. never blocks, never adds latency

Instead of replacing everything, this plugin:

  1. Keeps recent messages untouched — recent context stays verbatim
  2. Only compresses when needed — triggers at 100K (real API token count), compresses old messages, grows naturally until next trigger
  3. Merges summaries — each compression cycle merges with the previous summary, building a rolling timeline
  4. Never blocks — compression runs in the background, applied on the next request
  5. Full transcripts preserved — Claude Code still saves everything to JSONL in ~/.claude/projects/

Install

Option 1: Claude Code Plugin (recommended)

Run these two commands inside Claude Code:

/plugin marketplace add https://github.com/NodeNestor/nestor-plugins
/plugin install rolling-context

Restart your terminal and start a new Claude Code session. On the first start, the plugin configures ANTHROPIC_BASE_URL and starts the proxy. Since the env var only takes effect on the next terminal, restart your terminal once more — after that, everything works automatically. No pip install needed — pure Python stdlib.

Option 2: Manual install

Linux / macOS:

git clone https://github.com/NodeNestor/claude-rolling-context.git ~/claude-rolling-context
cd ~/claude-rolling-context
bash install.sh

Windows (PowerShell):

git clone https://github.com/NodeNestor/claude-rolling-context.git $HOME\claude-rolling-context
cd $HOME\claude-rolling-context
powershell -ExecutionPolicy Bypass -File install.ps1

The installer configures ANTHROPIC_BASE_URL and registers the plugin. Restart your terminal and you're done. Requires Python 3.7+ (no pip install needed — pure stdlib).

How Compression Works

When the message array exceeds the trigger threshold:

BEFORE (hit 100K trigger):
  [msg1] [msg2] [msg3] ... [msg60] [msg61] ... [msg100]
  |<——————————————— ~105K tokens ——————————————>|

AFTER (compressed):
  [rolling summary] [ack] [msg61] ... [msg100]
  |<— ~5K summary —>|    |<—— verbatim ————————>|

NEXT CYCLE (grows back to 100K, triggers again):
  [rolling summary] [ack] [msg61] ... [msg140]
  |<——————————————— ~105K tokens ——————————————>|
  → new summary merges old summary + msg61-msg100
  → keeps msg101-msg140 verbatim

The summary preserves a structured record of everything that happened:

  • Active Goal — what the user is currently asking for, constraints, do/don't rules
  • Previous Goals — completed or shifted-away-from goals (kept brief)
  • Timeline — chronological numbered steps: every file change, decision, error, and user instruction
  • Current State — what's done, in progress, and next
  • Key Details — file paths, configs, decisions that must survive compression

Goals evolve naturally across rolling compressions — the latest request stays prominent while completed goals move to the previous section. User instructions are never lost.

Architecture

The proxy is fully stateless — no sessions, no databases, no tracking. It works by hashing message content:

  1. When a response comes back from the API with a high token count, the proxy compresses the messages and stores the result keyed by content hashes
  2. On the next request, it hashes the incoming messages and checks for a matching compression
  3. If found, it swaps in the compressed version transparently

This means:

  • Multiple conversations work automatically — each conversation has unique content, unique hashes, no collision
  • Subagents and branches just work — the proxy doesn't care about sessions, only content
  • No state to corrupt — restart the proxy anytime, worst case is one extra compression cycle
  • Claude Code sees nothing different — the proxy is invisible, JSONL transcripts are unmodified

Configuration

All settings via environment variables (all optional — defaults work great):

Variable Default Description
ROLLING_CONTEXT_TRIGGER 100000 Compress when context exceeds this many tokens
ROLLING_CONTEXT_TARGET 40000 Keep this many tokens of recent messages after compression
ROLLING_CONTEXT_MODEL claude-haiku-4-5-20251001 Model used for summarization
ROLLING_CONTEXT_PORT 5588 Proxy listen port
ROLLING_CONTEXT_UPSTREAM https://api.anthropic.com Upstream API URL (chain to another proxy!)
ROLLING_CONTEXT_SUMMARIZER_URL https://api.anthropic.com Custom endpoint for summarization (e.g. local vLLM)
ROLLING_CONTEXT_SUMMARIZER_KEY (uses Claude Code auth) API key for custom summarizer endpoint

Proxy Chaining

Already using another proxy (model router, API gateway, etc.)? Rolling Context auto-detects this and chains through it:

Claude Code  ──►  Rolling Context (:5588)  ──►  Your Proxy  ──►  Anthropic API

If ANTHROPIC_BASE_URL is already set when you install, the plugin automatically saves it as ROLLING_CONTEXT_UPSTREAM and inserts itself in front. No manual config needed.

You can also set it explicitly:

export ROLLING_CONTEXT_UPSTREAM=http://localhost:8080  # your existing proxy

Health Check

curl http://127.0.0.1:5588/health

Returns compression stats: how many compressions, tokens saved, etc.

Debug

curl http://127.0.0.1:5588/debug/compressions

Returns the stored compression entries with their full summary content — useful for verifying what the rolling summary captured and whether user goals/instructions survived compression.

Uninstall

Run the uninstall script — it handles both manual and marketplace installs, stops the proxy, cleans env vars, and removes all plugin registrations.

Linux / macOS:

cd ~/claude-rolling-context && bash uninstall.sh

Windows (PowerShell):

cd $HOME\claude-rolling-context; powershell -ExecutionPolicy Bypass -File uninstall.ps1

If you installed via marketplace and already deleted the repo, you can run it from the cache:

cd $HOME\.claude\plugins\cache\rolling-context-marketplace\rolling-context\1.0.0
powershell -ExecutionPolicy Bypass -File uninstall.ps1

License

MIT

About

Rolling context compression for Claude Code — never hit the context wall. Auto-compresses old messages while keeping recent context verbatim. Zero config, zero latency. Works as a Claude Code plugin.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors