Rolling Context for Claude Code

A transparent proxy that gives Claude Code rolling context compression — old messages get automatically summarized while recent messages stay fully verbatim. You never hit the context wall, and you never lose important details.

Zero config. Uses your existing Claude Code auth. No API key needed. Just install and forget.

Claude Code's built-in /compact replaces your entire conversation with a lossy summary. After a few compactions, you're summarizing a summary of a summary. This plugin only compresses old messages — recent context stays untouched.

`/compact` vs Rolling Context

	`/compact` (built-in)	Rolling Context
What gets compressed	Everything	Only old messages
Recent context	Summarized (lossy)	Kept verbatim
When it runs	Manual or at threshold	Automatic, background
Latency impact	Blocks until done	Zero — async
After multiple compressions	Summary of summary of summary	Fresh rolling merge each time
Original transcript	Replaced	Preserved (JSONL unchanged)

How It Works

Claude Code  ──►  Rolling Context Proxy (:5588)  ──►  Anthropic API
                         │
                         ├─ context < 100K tokens? pass through unchanged
                         │
                         └─ context > 100K tokens?
                              1. summarize old messages with Haiku (background, async)
                              2. keep ~40K tokens of recent messages verbatim
                              3. inject compressed context on next request
                              4. never blocks, never adds latency

Instead of replacing everything, this plugin:

Keeps recent messages untouched — recent context stays verbatim
Only compresses when needed — triggers at 100K (real API token count), compresses old messages, grows naturally until next trigger
Merges summaries — each compression cycle merges with the previous summary, building a rolling timeline
Never blocks — compression runs in the background, applied on the next request
Full transcripts preserved — Claude Code still saves everything to JSONL in ~/.claude/projects/

Install

Option 1: Claude Code Plugin (recommended)

Run these two commands inside Claude Code:

/plugin marketplace add https://github.com/NodeNestor/nestor-plugins
/plugin install rolling-context

Restart your terminal and start a new Claude Code session. On the first start, the plugin configures ANTHROPIC_BASE_URL and starts the proxy. Since the env var only takes effect on the next terminal, restart your terminal once more — after that, everything works automatically. No pip install needed — pure Python stdlib.

Option 2: Manual install

Linux / macOS:

git clone https://github.com/NodeNestor/claude-rolling-context.git ~/claude-rolling-context
cd ~/claude-rolling-context
bash install.sh

Windows (PowerShell):

git clone https://github.com/NodeNestor/claude-rolling-context.git $HOME\claude-rolling-context
cd $HOME\claude-rolling-context
powershell -ExecutionPolicy Bypass -File install.ps1

The installer configures ANTHROPIC_BASE_URL and registers the plugin. Restart your terminal and you're done. Requires Python 3.7+ (no pip install needed — pure stdlib).

How Compression Works

When the message array exceeds the trigger threshold:

BEFORE (hit 100K trigger):
  [msg1] [msg2] [msg3] ... [msg60] [msg61] ... [msg100]
  |<——————————————— ~105K tokens ——————————————>|

AFTER (compressed):
  [rolling summary] [ack] [msg61] ... [msg100]
  |<— ~5K summary —>|    |<—— verbatim ————————>|

NEXT CYCLE (grows back to 100K, triggers again):
  [rolling summary] [ack] [msg61] ... [msg140]
  |<——————————————— ~105K tokens ——————————————>|
  → new summary merges old summary + msg61-msg100
  → keeps msg101-msg140 verbatim

The summary preserves a structured record of everything that happened:

Active Goal — what the user is currently asking for, constraints, do/don't rules
Previous Goals — completed or shifted-away-from goals (kept brief)
Timeline — chronological numbered steps: every file change, decision, error, and user instruction
Current State — what's done, in progress, and next
Key Details — file paths, configs, decisions that must survive compression

Goals evolve naturally across rolling compressions — the latest request stays prominent while completed goals move to the previous section. User instructions are never lost.

Architecture

The proxy is fully stateless — no sessions, no databases, no tracking. It works by hashing message content:

When a response comes back from the API with a high token count, the proxy compresses the messages and stores the result keyed by content hashes
On the next request, it hashes the incoming messages and checks for a matching compression
If found, it swaps in the compressed version transparently

This means:

Multiple conversations work automatically — each conversation has unique content, unique hashes, no collision
Subagents and branches just work — the proxy doesn't care about sessions, only content
No state to corrupt — restart the proxy anytime, worst case is one extra compression cycle
Claude Code sees nothing different — the proxy is invisible, JSONL transcripts are unmodified

Configuration

All settings via environment variables (all optional — defaults work great):

Variable	Default	Description
`ROLLING_CONTEXT_TRIGGER`	`100000`	Compress when context exceeds this many tokens
`ROLLING_CONTEXT_TARGET`	`40000`	Keep this many tokens of recent messages after compression
`ROLLING_CONTEXT_MODEL`	`claude-haiku-4-5-20251001`	Model used for summarization
`ROLLING_CONTEXT_PORT`	`5588`	Proxy listen port
`ROLLING_CONTEXT_UPSTREAM`	`https://api.anthropic.com`	Upstream API URL (chain to another proxy!)
`ROLLING_CONTEXT_SUMMARIZER_URL`	`https://api.anthropic.com`	Custom endpoint for summarization (e.g. local vLLM)
`ROLLING_CONTEXT_SUMMARIZER_KEY`	(uses Claude Code auth)	API key for custom summarizer endpoint

Proxy Chaining

Already using another proxy (model router, API gateway, etc.)? Rolling Context auto-detects this and chains through it:

Claude Code  ──►  Rolling Context (:5588)  ──►  Your Proxy  ──►  Anthropic API

If ANTHROPIC_BASE_URL is already set when you install, the plugin automatically saves it as ROLLING_CONTEXT_UPSTREAM and inserts itself in front. No manual config needed.

You can also set it explicitly:

export ROLLING_CONTEXT_UPSTREAM=http://localhost:8080  # your existing proxy

Health Check

curl http://127.0.0.1:5588/health

Returns compression stats: how many compressions, tokens saved, etc.

Debug

curl http://127.0.0.1:5588/debug/compressions

Returns the stored compression entries with their full summary content — useful for verifying what the rolling summary captured and whether user goals/instructions survived compression.

Uninstall

Run the uninstall script — it handles both manual and marketplace installs, stops the proxy, cleans env vars, and removes all plugin registrations.

Linux / macOS:

cd ~/claude-rolling-context && bash uninstall.sh

Windows (PowerShell):

cd $HOME\claude-rolling-context; powershell -ExecutionPolicy Bypass -File uninstall.ps1

If you installed via marketplace and already deleted the repo, you can run it from the cache:

cd $HOME\.claude\plugins\cache\rolling-context-marketplace\rolling-context\1.0.0
powershell -ExecutionPolicy Bypass -File uninstall.ps1

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 79 Commits
.claude-plugin		.claude-plugin
hooks		hooks
proxy		proxy
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.e2e.yml		docker-compose.e2e.yml
install.ps1		install.ps1
install.sh		install.sh
uninstall.ps1		uninstall.ps1
uninstall.sh		uninstall.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Rolling Context for Claude Code

`/compact` vs Rolling Context

How It Works

Install

Option 1: Claude Code Plugin (recommended)

Option 2: Manual install

How Compression Works

Architecture

Configuration

Proxy Chaining

Health Check

Debug

Uninstall

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Rolling Context for Claude Code

/compact vs Rolling Context

How It Works

Install

Option 1: Claude Code Plugin (recommended)

Option 2: Manual install

How Compression Works

Architecture

Configuration

Proxy Chaining

Health Check

Debug

Uninstall

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`/compact` vs Rolling Context

Packages