Commit 8fbbcc3

fixes readme
1 parent 0785ffc commit 8fbbcc3

8 files changed · 123 additions & 82 deletions

File tree:

- agent/context/README.md
- agent/context/checkLimit.ts
- agent/context/compact.ts
- agent/context/compress.ts
- agent/context/index.ts
- agent/context/package.json
- agent/context/summarize.ts
- agent/context/usage.ts

agent/context/README.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -37,9 +37,9 @@ Both [Anthropic](https://docs.anthropic.com/en/docs/build-with-claude/compaction
 
 - **Keep summaries intact** — every summarization pass discards detail. We never re-summarize existing summaries; they're preserved as-is and new summaries are added alongside them ([LangChain `moving_summary_buffer`](https://langchain-doc.readthedocs.io/en/latest/modules/memory/types/summary_buffer.html)).
 - **Keep recent turns verbatim** — the most recent exchanges carry the highest signal. Summarize the older prefix, never the tail ([Anthropic](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents), [LangChain summary-buffer](https://langchain-doc.readthedocs.io/en/latest/modules/memory/types/summary_buffer.html)). Anthropic's Claude Code uses the same shape: compressed context + the N most recently accessed items ([source](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)).
-- **Use cost as a secondary signal** — large cache write costs indicate a bloated context even if token counts look fine. We use cost thresholds alongside token thresholds ([Anthropic prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)).
-- **Compact before hitting the hard limit** — model recall drops well before the hard context window. Compact proactively at a soft threshold, not at the edge ([OpenAI cookbook](https://cookbook.openai.com/examples/context_summarization_with_realtime_api), [Anthropic post on compaction](https://docs.anthropic.com/en/docs/build-with-claude/compaction), [Anthropic post on context engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)).
+- **Compact before hitting the hard limit** — model recall drops well before the hard context window. We compact at a configurable soft token limit (defaults to 60% of the context window), matching the industry-standard approach used by [Claude Code](https://docs.anthropic.com/en/docs/build-with-claude/compaction), [OpenAI](https://developers.openai.com/api/docs/guides/context-management), and [LangChain](https://langchain-doc.readthedocs.io/en/latest/modules/memory/types/summary_buffer.html).
 - **Don't summarize old tool call results** — raw tool output is useful when fresh but redundant once acted on. Clearing old results is the lightest-touch compaction step ([Anthropic](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)).
+- **Compress tool results in-place** — whether or not we compact the messages, we always run tool-result compression, which truncates tool results older than the last 3 user turns.
 
 ## What you should know
 
@@ -49,7 +49,7 @@ Both [Anthropic](https://docs.anthropic.com/en/docs/build-with-claude/compaction
 
 When you call `compact()`:
 
-1. **`checkLimit()`** — checks thresholds (cache write cost, cache write tokens, soft token limit). First match fires a `reached: true`. This mirrors Anthropic's token-threshold trigger and OpenAI's `compact_threshold`.
+1. **`checkLimit()`** — checks whether context tokens exceed the soft token limit (defaults to 60% of the context window). Returns `reached: true` when the limit is crossed.
 2. **`split()`** — finds the boundary between "prefix to summarize" and "tail to preserve." We try to keep 5 recent user turns, then 3, then 1, each checked against a token budget (40% of the context window). Falls back to the largest token-bounded suffix that fits. Token-bounded retention follows the LangChain `max_token_limit` pattern.
 3. **`summarize()`** — sends the messages selected for compaction to the LLM with a summarization prompt. Custom instructions replace the default prompt (when `instructions.strategy` is set to `replace`), matching Anthropic's `instructions` parameter behavior.
 4. **Reassemble** — we combine the preserved messages from `split()` with the new summary into a single message array.
@@ -58,7 +58,7 @@ When you call `compact()`:
 
 ```ts
 import type { Model, Api } from '@mariozechner/pi-ai';
-import { checkLimit, split, summarize } from '@kvendrik/compact';
+import { checkLimit, split, summarize, compressToolResults } from '@kvendrik/compact';
 
 const model: {model: Model<Api>, key: string} = {...};
 const messages: AgentMessage[] = [];
@@ -79,5 +79,5 @@ if (limit.reached) {
   return [...compacted, ...preserve];
 }
 
-return messages;
+return compressToolResults(messages);
 ```
````
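The in-place compression the README now describes is easiest to see on a concrete transcript. A minimal sketch — the message literals are simplified stand-ins for the `AgentMessage` shapes from `@mariozechner/pi-agent-core` (hence the cast), showing only the `role` and `content` fields that `compressToolResults` reads:

```ts
import type { AgentMessage } from '@mariozechner/pi-agent-core';
import { compressToolResults } from '@kvendrik/compact';

// The tool result sits before the 3rd-from-last user turn, so it is
// eligible for truncation; everything from that turn onward stays verbatim.
const messages = [
  { role: 'user', content: 'Run the test suite.' },
  { role: 'toolResult', content: [{ type: 'text', text: 'x'.repeat(5000) }] },
  { role: 'user', content: 'Why did the split test fail?' },
  { role: 'user', content: 'Fix it.' },
  { role: 'user', content: 'Re-run the tests.' },
] as unknown as AgentMessage[];

const compressed = compressToolResults(messages);
// The toolResult's text block is now 200 chars plus a '\n[truncated]' marker.
```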

agent/context/checkLimit.ts

Lines changed: 2 additions & 39 deletions
```diff
@@ -1,29 +1,16 @@
 import type { Model, Api } from '@mariozechner/pi-ai';
 import type { AgentMessage } from '@mariozechner/pi-agent-core';
-import { getLatestAssistantUsage, usage } from './usage';
+import { usage } from './usage';
 
 export interface Limits {
   /** Percentage of the context window to compact at (default is 60%). Model recall degrades well
    * before the hard limit, so we compact proactively.
    * @default 60 */
   softTokenLimit?: number;
-  /** High cache-write cost (in USD) signals a bloated context even when token
-   * counts look fine. Assumes Sonnet-class pricing (~$3/MTok input);
-   * adjust if the default model tier changes significantly.
-   * @default 0.05 */
-  cacheWriteCostLimit?: number;
-  /** If cache-write tokens exceed this share of the context window,
-   * trigger compaction. Catches uncached context growth that the
-   * token-based soft limit might miss on the first turn after a
-   * cache bust.
-   * @default 40 */
-  cacheWriteTokenLimit?: number;
 }
 
-export const DEFAULT_LIMITS: Required<Limits> = {
+const DEFAULT_LIMITS: Required<Limits> = {
   softTokenLimit: 60,
-  cacheWriteCostLimit: 0.05,
-  cacheWriteTokenLimit: 40,
 };
 
 interface CheckOptions {
@@ -44,31 +31,7 @@ export function checkLimit(
   const currentUsage = usage(messages);
   const contextWindow = model.contextWindow;
   const softLimit = Math.floor(contextWindow * (limits.softTokenLimit / 100));
-
-  const latestTurnTokenLimit = Math.floor(
-    contextWindow * (limits.cacheWriteTokenLimit / 100),
-  );
-
   const currentTokens = currentUsage.tokens.used;
-  const latestUsage = getLatestAssistantUsage(messages);
-  const latestTurnCacheWriteCost =
-    currentUsage.cost.cacheWrite === 0
-      ? undefined
-      : currentUsage.cost.cacheWrite;
-  const latestTurnCacheWriteTokens = latestUsage?.cacheWrite ?? 0;
-
-  if (
-    typeof latestTurnCacheWriteCost === 'number' &&
-    latestTurnCacheWriteCost >= limits.cacheWriteCostLimit
-  ) {
-    const reason = `Cache write cost $${latestTurnCacheWriteCost.toFixed(4)} exceeded $${limits.cacheWriteCostLimit.toFixed(2)} threshold`;
-    return { reached: true, reason };
-  }
-
-  if (latestTurnCacheWriteTokens >= latestTurnTokenLimit) {
-    const reason = `Cache write tokens ${latestTurnCacheWriteTokens} exceeded ${latestTurnTokenLimit} token threshold`;
-    return { reached: true, reason };
-  }
 
   if (currentTokens <= softLimit) {
     const reason = `Context is within budget (${currentTokens}/${softLimit} tokens, ${Math.round((currentTokens / softLimit) * 100)}%)`;
```
agent/context/compact.ts

Lines changed: 11 additions & 9 deletions
```diff
@@ -1,8 +1,9 @@
 import type { Model, Api } from '@mariozechner/pi-ai';
 import type { AgentMessage } from '@mariozechner/pi-agent-core';
-import { checkLimit, type Limits, DEFAULT_LIMITS } from './checkLimit';
+import { checkLimit, type Limits } from './checkLimit';
 import { split } from './split';
 import { summarize, type Instructions } from './summarize';
+import { compressToolResults } from './compress';
 
 export interface CompactResult {
   messages: AgentMessage[];
@@ -20,7 +21,7 @@ interface CompactOptions {
 
 export async function compact(
   messages: AgentMessage[],
-  { signal, model, instructions, force, limits }: CompactOptions,
+  { signal, model, instructions, force, limits }: CompactOptions
 ): Promise<CompactResult> {
   const effectiveSignal = signal ?? new AbortController().signal;
 
@@ -34,22 +35,23 @@ export async function compact(
 
   const { reached, reason } = checkLimit(messages, {
     model: model.model,
-    limits: {
-      ...DEFAULT_LIMITS,
-      ...limits,
-    },
+    limits,
   });
 
   if (!reached) {
-    return { messages, didCompact: false, reason };
+    return {
+      messages: compressToolResults(messages),
+      didCompact: false,
+      reason,
+    };
   }
 
   return doCompact(reason);
 
   async function doCompact(trigger: string): Promise<CompactResult> {
     const { compact: messagesToCompact, preserve } = split(
       messages,
-      model.model,
+      model.model
     );
 
     if (messagesToCompact === null) {
@@ -63,7 +65,7 @@ export async function compact(
     });
 
     return {
-      messages: [...compactedMessages, ...preserve],
+      messages: [...compactedMessages, ...compressToolResults(preserve)],
       didCompact: true,
       reason: trigger,
     };
```
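The caller-visible effect of these changes: both return paths of `compact()` now pass through `compressToolResults`. A sketch against the `CompactOptions` shape above (placeholder values via `declare`):

```ts
import type { Model, Api } from '@mariozechner/pi-ai';
import type { AgentMessage } from '@mariozechner/pi-agent-core';
import { compact } from '@kvendrik/compact';

declare const model: { model: Model<Api>; key: string };
declare const history: AgentMessage[];

const result = await compact(history, { model });

// didCompact === false → history with old tool results truncated in place.
// didCompact === true  → [summary, ...compressed preserved tail].
console.log(result.didCompact ? `Compacted: ${result.reason}` : result.reason);
```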

agent/context/compress.ts

Lines changed: 93 additions & 0 deletions
```diff
@@ -0,0 +1,93 @@
+import type { AgentMessage } from '@mariozechner/pi-agent-core';
+import type {
+  TextContent,
+  ImageContent,
+  ToolResultMessage,
+} from '@mariozechner/pi-ai';
+
+/** Number of recent user turns whose tool results are preserved verbatim. */
+const PRESERVE_RECENT_TURNS = 3;
+
+/** Max characters kept per text block in a compressed tool result. */
+const MAX_COMPRESSED_CHARS = 200;
+
+const TRUNCATION_MARKER = '\n[truncated]';
+
+export function compressToolResults(
+  messages: AgentMessage[],
+  {
+    preserveRecentTurns = PRESERVE_RECENT_TURNS,
+    maxCompressedChars = MAX_COMPRESSED_CHARS,
+  }: { preserveRecentTurns?: number; maxCompressedChars?: number } = {}
+): AgentMessage[] {
+  const boundary = findPreserveBoundary(messages);
+
+  if (boundary === 0) {
+    return messages;
+  }
+
+  const result: AgentMessage[] = [];
+
+  for (let idx = 0; idx < messages.length; idx++) {
+    const msg = messages[idx];
+    if (idx < boundary && isToolResult(msg)) {
+      result.push(compressedToolResult(msg));
+    } else {
+      result.push(msg);
+    }
+  }
+
+  return result;
+
+  /** Index of the Nth-from-last user message. Messages before this index are
+   * eligible for compression. Returns 0 when the conversation is too short. */
+  function findPreserveBoundary(messages: AgentMessage[]): number {
+    let userTurnsSeen = 0;
+
+    for (let idx = messages.length - 1; idx >= 0; idx--) {
+      if (messages[idx].role === 'user') {
+        userTurnsSeen++;
+        if (userTurnsSeen === preserveRecentTurns) {
+          return idx;
+        }
+      }
+    }
+
+    return 0;
+  }
+
+  function compressedToolResult(msg: ToolResultMessage): AgentMessage {
+    const compressed: (TextContent | ImageContent)[] = [];
+    let hadImage = false;
+
+    for (const block of msg.content) {
+      if (block.type === 'image') {
+        hadImage = true;
+        continue;
+      }
+
+      if (block.text.length <= maxCompressedChars) {
+        compressed.push(block);
+      } else {
+        compressed.push({
+          type: 'text',
+          text: block.text.slice(0, maxCompressedChars) + TRUNCATION_MARKER,
+        });
+      }
+    }
+
+    if (hadImage) {
+      compressed.push({ type: 'text', text: '[image omitted]' });
+    }
+
+    if (compressed.length === 0) {
+      compressed.push({ type: 'text', text: '[result omitted]' });
+    }
+
+    return { ...msg, content: compressed };
+  }
+}
+
+function isToolResult(msg: AgentMessage): msg is ToolResultMessage {
+  return msg.role === 'toolResult';
+}
```
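Both constants are per-call overrides via the options bag; a brief usage sketch:

```ts
import type { AgentMessage } from '@mariozechner/pi-agent-core';
import { compressToolResults } from '@kvendrik/compact';

declare const messages: AgentMessage[];

// Keep tool results verbatim for the last 5 user turns, and allow up to
// 1000 characters per text block before the '\n[truncated]' marker is added.
const trimmed = compressToolResults(messages, {
  preserveRecentTurns: 5,
  maxCompressedChars: 1000,
});
```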

agent/context/index.ts

Lines changed: 1 addition & 0 deletions
```diff
@@ -3,3 +3,4 @@ export { summarize } from './summarize';
 export { compact, type CompactResult } from './compact';
 export { split } from './split';
 export { checkLimit } from './checkLimit';
+export { compressToolResults } from './compress';
```

agent/context/package.json

Lines changed: 3 additions & 3 deletions
```diff
@@ -2,11 +2,11 @@
   "name": "@kvendrik/compact",
   "version": "0.1.0",
   "description": "A context compaction toolkit for pi-ai",
-  "module": "src/index.ts",
+  "module": "./index.ts",
   "type": "module",
   "license": "MIT",
   "files": [
-    "src",
+    "*.ts",
     "README.md"
   ],
   "repository": {
@@ -22,7 +22,7 @@
   ],
   "scripts": {
     "test": "bun test",
-    "lint": "eslint src/",
+    "lint": "eslint .",
     "typecheck": "tsc --noEmit",
     "test:all": "bun test && bun lint && bun typecheck"
   },
```

agent/context/summarize.ts

Lines changed: 8 additions & 14 deletions
```diff
@@ -1,5 +1,6 @@
 import { completeSimple, type Model, type Api } from '@mariozechner/pi-ai';
 import type { AgentMessage } from '@mariozechner/pi-agent-core';
+import { compressToolResults } from './compress';
 
 const SUMMARIZE_SYSTEM = `You are a summarizer. Given a conversation history, produce a concise summary that preserves key facts, decisions, topics, and context needed to continue the conversation. Output only the summary, no preamble.`;
 
@@ -61,24 +62,15 @@ export async function summarize(
   return [summaryMessage];
 }
 
-function messagesToTranscript(messages: AgentMessage[]): string {
+function messagesToTranscript(allMessages: AgentMessage[]): string {
   const lines: string[] = [];
-  for (const m of messages) {
-    const msg = m as {
-      role: string;
-      content?: string | { type?: string; text?: string }[];
-    };
-
-    if (msg.role === 'tool') {
-      lines.push('tool: [tool output omitted]');
-      continue;
-    }
 
+  const messages = compressToolResults(allMessages);
+
+  for (const msg of messages) {
     const content = msg.content;
-    if (content === undefined) {
-      continue;
-    }
     let text = '';
+
     if (typeof content === 'string') {
       text = content;
     } else if (Array.isArray(content)) {
@@ -93,10 +85,12 @@ function messagesToTranscript(messages: AgentMessage[]): string {
         .map((b) => (b as { text: string }).text)
         .join('');
     }
+
     if (text.trim() !== '') {
       lines.push(`${msg.role}: ${text.trim()}`);
     }
   }
+
   return lines.join('\n\n');
 }
 
```
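The practical difference for the summarizer's input: tool messages used to collapse to `tool: [tool output omitted]`, while a truncated excerpt now survives into the transcript. A hypothetical example of the resulting text:

```ts
// Hypothetical output of messagesToTranscript() after this change — roles and
// failure text are invented; entries are joined with blank lines.
const transcript = `user: run the test suite

toolResult: FAIL split.test.ts — expected boundary at turn 5, got 3
[truncated]

assistant: One test failed; the split boundary looks off by two turns.`;
```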
agent/context/usage.ts

Lines changed: 0 additions & 12 deletions
```diff
@@ -45,18 +45,6 @@ function hasMeaningfulUsage(msgUsage: UsageWithCost): boolean {
   );
 }
 
-export function getLatestAssistantUsage(
-  messages: AgentMessage[],
-): UsageWithCost | null {
-  for (let i = messages.length - 1; i >= 0; i--) {
-    const message = messages[i];
-    if (hasUsage(message)) {
-      return message.usage;
-    }
-  }
-
-  return null;
-}
 
 function getLatestAssistantMessageWithUsage(
   messages: AgentMessage[],
```
