fix(file_manager): output-token truncation guards + append_file tool #55
Starlitnightly wants to merge 9 commits into main from …
Conversation
Problem A (partial): Add MANDATORY scientific writing gate to default.md — Leader must delegate to Researcher before writing any domain paper. Clarify Scientific Illustrator scope (schematic/pathway diagrams only, not data plots).

Problem C: Add Failure Recovery section to delegation.md — three-tier ladder for file write failures (Two-Phase Write Protocol → format downgrade → inline) and for sub-agent failures (narrow retry → self-execute → partial output). Hard rule: never terminate without producing at least one artifact.

Validated by experiment (2026-03-30):
- Case 3 (SSR1/GWAS): Leader called 3x parallel Researcher before writing any content; Researchers produced 978 lines across 3 reports using the Two-Phase Write Protocol
- Case 0 (EC paper): Leader called 2x parallel Researcher; BibTeX built to 397 lines via append_file batches (vs. previous silent truncation at char 88); PDF artifact (117 KB) delivered despite E2BIG and relay-API update_file errors

New bugs discovered (tracked separately):
- Relay API truncates update_file tool-call args mid-generation (high severity)
- think tool infinite loop at ~90K-token context (medium severity)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
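The "append_file batches" pattern used in Case 0 can be sketched as follows. `chunk_bibtex` is a hypothetical helper, not part of the PR; it splits a large BibTeX string into batches of at most 10 entries, each small enough for one append_file call, under the simplifying assumption that `@` never appears inside field values.

```python
def chunk_bibtex(bibtex: str, entries_per_batch: int = 10) -> list[str]:
    """Split a BibTeX string into batches of <=10 entries for append_file.

    Naive split on top-level '@' records; assumes no '@' inside field
    values (e.g. no escaped email addresses in notes).
    """
    entries = ["@" + e for e in bibtex.split("@") if e.strip()]
    return ["".join(entries[i:i + entries_per_batch])
            for i in range(0, len(entries), entries_per_batch)]
```

Each returned batch is then sent as the `content` of one append_file call, so no single tool call has to carry the whole bibliography.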
… tool

P0 bug: when the LLM generates a large file (LaTeX papers, BibTeX) in a single write_file/update_file call, the relay API truncates the output stream mid-JSON, causing 'Unterminated string' parse errors and silent data loss.

Root cause: the LLM's output-token limit is separate from its context window. File content in tool-call parameters must be generated as LLM output, so it hits max_tokens before the JSON closes. LaTeX/BibTeX content with escape chars inflates token count ~1.5x.

Changes:
- write_file: hard-reject content > 12,000 chars; docstring teaches the Two-Phase Write Protocol (scaffold first, fill by section, append for lists/bib)
- append_file: new tool for chunked appending; 6,000-char limit; requires the file to exist first; primary use case is BibTeX batches (<=10 entries per call)
- update_file: hard-reject new_string > 8,000 chars with guidance to split the section into smaller semantic units

Validated against the 20-case baseline (15% success rate before the fix):
- Case 1 (LaTeX review paper, previously FAIL): now generates a full PDF with 44 references via append_file batches — confirmed in a controlled re-run
- The agent proactively adopted the Two-Phase protocol after reading the docstring (0 content_too_large rejections; the protocol was followed before the guard triggered)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
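A minimal sketch of the size guards described above. These are standalone functions for illustration; the real tools live in pantheon/toolsets/file/file_manager.py and their exact signatures and return formats are not shown in this PR.

```python
import os

# Thresholds from this PR (later raised; see the thresholds commit below)
WRITE_FILE_MAX_CHARS = 12_000
APPEND_FILE_MAX_CHARS = 6_000

def write_file(path: str, content: str) -> str:
    """Create a file; hard-reject oversized content before touching disk."""
    if len(content) > WRITE_FILE_MAX_CHARS:
        return (f"content_too_large: {len(content)} chars > "
                f"{WRITE_FILE_MAX_CHARS}. Use the Two-Phase Write Protocol: "
                "write a scaffold first, fill sections with update_file, and "
                "stream lists/bibliographies with append_file.")
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} chars to {path}"

def append_file(path: str, content: str) -> str:
    """Chunked appending; the file must already exist (scaffold phase)."""
    if not os.path.exists(path):
        return f"error: {path} does not exist; create it with write_file first"
    if len(content) > APPEND_FILE_MAX_CHARS:
        return (f"content_too_large: {len(content)} chars > "
                f"{APPEND_FILE_MAX_CHARS}. Split into smaller batches "
                "(e.g. <=10 BibTeX entries per call).")
    with open(path, "a") as f:
        f.write(content)
    return f"appended {len(content)} chars to {path}"
```

Note the design choice implied by the PR: guards return an instructive error string rather than raising, so the rejection message itself teaches the model the recovery protocol.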
… tool

Cherry-picked from PR #52 (fix/file-manager-output-token-truncation).
- write_file: hard-reject content > 12,000 chars with Two-Phase Write Protocol guidance
- append_file: new tool for chunked appending with 6,000-char limit
- update_file: hard-reject new_string > 8,000 chars
- delegation.md: failure recovery ladder
- default.md: scientific writing gate
Tests for PR #52 file manager changes:
- write_file: reject >12K, accept at limit, file not created on reject
- append_file: basic append, multi-batch (BibTeX pattern), reject nonexistent file, reject >6K, accept at limit
- update_file: reject new_string >8K, accept at limit, original unchanged
- Two-Phase Write Protocol end-to-end: scaffold → section fill → append

14/14 file manager tests passing.
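The multi-batch (BibTeX pattern) case can be sketched as a self-contained test. `append_file` here is a local stand-in for the real tool in pantheon/toolsets/file/file_manager.py, whose actual signature this PR does not show; the test shape (scaffold, then N small appends, then one oversized reject) mirrors the list above.

```python
import pathlib

APPEND_FILE_MAX_CHARS = 6_000  # threshold from this PR

def append_file(path: pathlib.Path, content: str) -> None:
    """Stand-in: append a small chunk to an existing file."""
    if not path.exists():
        raise FileNotFoundError(path)
    if len(content) > APPEND_FILE_MAX_CHARS:
        raise ValueError("content_too_large")
    path.write_text(path.read_text() + content)

def test_multi_batch_bibtex(tmp_path: pathlib.Path) -> None:
    bib = tmp_path / "refs.bib"
    bib.write_text("")                 # scaffold phase: file must exist first
    batches = [f"@article{{key{i},}}\n" for i in range(5)]
    for batch in batches:              # 5 append_file calls, one per batch
        append_file(bib, batch)
    assert bib.read_text().count("@article") == 5

def test_reject_oversized(tmp_path: pathlib.Path) -> None:
    bib = tmp_path / "refs.bib"
    bib.write_text("")
    try:
        append_file(bib, "x" * (APPEND_FILE_MAX_CHARS + 1))
        assert False, "expected content_too_large"
    except ValueError:
        pass
```

In the real suite `tmp_path` would be the pytest fixture of the same name.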
…n-truncation fix(file_manager): add output-token truncation guards and append_file tool
…esholds

Root-cause fix: acompletion_litellm() never passed max_tokens (output) to litellm. Anthropic models default to 4096 output tokens, causing tool_use JSON to be truncated mid-generation when the model writes large file content.

Fix: auto-detect the model's max_output_tokens via litellm.get_model_info() and set it as kwargs["max_tokens"] when not already specified by model_params.

With the root cause fixed, the tool-level size guards from PR #52 are now defense-in-depth (not the primary fix). Raised thresholds to match actual output capacity:
- write_file: 12K → 40K chars
- update_file: 8K → 30K chars
- append_file: 6K → 20K chars

Thresholds moved to class-level constants (WRITE_FILE_MAX_CHARS, etc.) for easy per-deployment tuning. Tests updated to reference the constants instead of hardcoded values. 14/14 file manager tests passing.
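A sketch of the root-cause fix. In the real code the lookup is litellm.get_model_info() inside acompletion_litellm(); the lookup function is injected here as a parameter so the sketch runs without litellm installed, and the function name `resolve_max_tokens` is illustrative.

```python
ANTHROPIC_DEFAULT_OUTPUT_TOKENS = 4096  # provider default when max_tokens is unset

def resolve_max_tokens(model: str, kwargs: dict, get_model_info) -> dict:
    """Fill kwargs['max_tokens'] from the model catalog unless the caller
    (via model_params) already set it.

    get_model_info is litellm.get_model_info in production; it returns a
    mapping that includes a max_output_tokens field for known models.
    """
    if "max_tokens" not in kwargs:
        try:
            info = get_model_info(model)
            max_out = info.get("max_output_tokens") or info.get("max_tokens")
            if max_out:
                kwargs["max_tokens"] = max_out
        except Exception:
            # Unknown model: leave unset and accept the provider default.
            pass
    return kwargs
```

Called once before the completion request, this lets large tool_use JSON finish generating instead of being cut off at the 4096-token default, which is why the per-tool guards can be relaxed to defense-in-depth.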
Should we merge append_file into the write_file tool? I'm curious how CC handles file chunk writing. Ideally, we should keep our core toolset as minimal as possible. Also, the CLI- and UI-related tool rendering will need to be updated.
I've discussed it with weize; CC doesn't use any relay API to transfer results. In my latest revised code, I raised the max output of the LLM's response; that limit was the major cause of this issue.
Is it really worth adding a new tool just to fix the relay issue? How about merging the two tools instead? |
I don't think we should merge it now, because it relies heavily on litellm, and we'll have to wait until weize finishes refactoring the API backend. |
Refactor to replace litellm with provider catalog and SDK adapters
fix(file_manager): output-token truncation guards + append_file tool (clean cherry-pick from #55)
Summary
Merges devin into main. Contains the file manager truncation fix (from #52) plus comprehensive test coverage.

Changes
pantheon/toolsets/file/file_manager.py
- write_file: hard-reject content > 12,000 chars; docstring teaches the Two-Phase Write Protocol (scaffold → section fill → append)
- append_file: new tool for chunked appending with 6,000-char limit; designed for BibTeX batches and section streaming
- update_file: hard-reject new_string > 8,000 chars with guidance to split into smaller semantic units

pantheon/factory/templates/prompts/delegation.md
pantheon/factory/templates/teams/default.md

tests/test_file_manager.py (+10 new tests)
- write_file: reject >12K, accept at limit, file not created on reject
- append_file: basic append, multi-batch BibTeX pattern (5 batches), reject nonexistent, reject >6K, accept at limit
- update_file: reject >8K new_string, accept at limit, original unchanged on reject

Root Cause
When an LLM generates a large file (LaTeX paper, BibTeX bibliography) in a single write_file call, the relay API truncates the output stream mid-JSON before the closing " or }, causing AnthropicException - Failed to parse tool call arguments: Unterminated string. This is an output-token limit on the relay, not a context-window issue.

Test Plan
Closes #52