fix(file_manager): output-token truncation guards + append_file tool #55
Starlitnightly wants to merge 9 commits into main from …
Conversation
Problem A (partial): Add MANDATORY scientific writing gate to default.md — Leader must delegate to Researcher before writing any domain paper. Clarify Scientific Illustrator scope (schematic/pathway diagrams only, not data plots).

Problem C: Add Failure Recovery section to delegation.md — three-tier ladder for file write failures (Two-Phase Write Protocol → format downgrade → inline) and for sub-agent failures (narrow retry → self-execute → partial output). Hard rule: never terminate without producing at least one artifact.

Validated by experiment (2026-03-30):
- Case 3 (SSR1/GWAS): Leader called 3x parallel Researcher before writing any content; Researchers produced 978 lines across 3 reports using the Two-Phase Write Protocol
- Case 0 (EC paper): Leader called 2x parallel Researcher; BibTeX built to 397 lines via append_file batches (vs. previous silent truncation at char 88); PDF artifact (117 KB) delivered despite E2BIG and relay-API update_file errors

New bugs discovered (tracked separately):
- Relay API truncates update_file tool-call args mid-generation (high severity)
- think tool infinite loop at ~90K-token context (medium severity)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
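The "append_file batches" pattern used in Case 0 can be sketched as follows. `chunk_bibtex` is a hypothetical helper, not part of the PR; it splits a large BibTeX string into batches of at most 10 entries, each small enough for one append_file call, under the simplifying assumption that `@` never appears inside field values.

```python
def chunk_bibtex(bibtex: str, entries_per_batch: int = 10) -> list[str]:
    """Split a BibTeX string into batches of <=10 entries for append_file.

    Naive split on top-level '@' records; assumes no '@' inside field
    values (e.g. no escaped email addresses in notes).
    """
    entries = ["@" + e for e in bibtex.split("@") if e.strip()]
    return ["".join(entries[i:i + entries_per_batch])
            for i in range(0, len(entries), entries_per_batch)]
```

Each returned batch is then sent as the `content` of one append_file call, so no single tool call has to carry the whole bibliography.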
… tool

P0 bug: when the LLM generates a large file (LaTeX papers, BibTeX) in a single write_file/update_file call, the relay API truncates the output stream mid-JSON, causing 'Unterminated string' parse errors and silent data loss.

Root cause: the LLM's output-token limit is separate from its context window. File content in tool-call parameters must be generated as LLM output, so it hits max_tokens before the JSON closes. LaTeX/BibTeX content with escape chars inflates token count ~1.5x.

Changes:
- write_file: hard-reject content > 12,000 chars; docstring teaches the Two-Phase Write Protocol (scaffold first, fill by section, append for lists/bib)
- append_file: new tool for chunked appending; 6,000-char limit; requires the file to exist first; primary use case is BibTeX batches (<=10 entries per call)
- update_file: hard-reject new_string > 8,000 chars with guidance to split the section into smaller semantic units

Validated against the 20-case baseline (15% success rate before the fix):
- Case 1 (LaTeX review paper, previously FAIL): now generates a full PDF with 44 references via append_file batches — confirmed in a controlled re-run
- The agent proactively adopted the Two-Phase protocol after reading the docstring (0 content_too_large rejections; the protocol was followed before the guard triggered)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
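A minimal sketch of the size guards described above. These are standalone functions for illustration; the real tools live in pantheon/toolsets/file/file_manager.py and their exact signatures and return formats are not shown in this PR.

```python
import os

# Thresholds from this PR (later raised; see the thresholds commit below)
WRITE_FILE_MAX_CHARS = 12_000
APPEND_FILE_MAX_CHARS = 6_000

def write_file(path: str, content: str) -> str:
    """Create a file; hard-reject oversized content before touching disk."""
    if len(content) > WRITE_FILE_MAX_CHARS:
        return (f"content_too_large: {len(content)} chars > "
                f"{WRITE_FILE_MAX_CHARS}. Use the Two-Phase Write Protocol: "
                "write a scaffold first, fill sections with update_file, and "
                "stream lists/bibliographies with append_file.")
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} chars to {path}"

def append_file(path: str, content: str) -> str:
    """Chunked appending; the file must already exist (scaffold phase)."""
    if not os.path.exists(path):
        return f"error: {path} does not exist; create it with write_file first"
    if len(content) > APPEND_FILE_MAX_CHARS:
        return (f"content_too_large: {len(content)} chars > "
                f"{APPEND_FILE_MAX_CHARS}. Split into smaller batches "
                "(e.g. <=10 BibTeX entries per call).")
    with open(path, "a") as f:
        f.write(content)
    return f"appended {len(content)} chars to {path}"
```

Note the design choice implied by the PR: guards return an instructive error string rather than raising, so the rejection message itself teaches the model the recovery protocol.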
… tool

Cherry-picked from PR #52 (fix/file-manager-output-token-truncation).
- write_file: hard-reject content > 12,000 chars with Two-Phase Write Protocol guidance
- append_file: new tool for chunked appending with 6,000-char limit
- update_file: hard-reject new_string > 8,000 chars
- delegation.md: failure recovery ladder
- default.md: scientific writing gate
Tests for PR #52 file manager changes:
- write_file: reject >12K, accept at limit, file not created on reject
- append_file: basic append, multi-batch (BibTeX pattern), reject nonexistent file, reject >6K, accept at limit
- update_file: reject new_string >8K, accept at limit, original unchanged
- Two-Phase Write Protocol end-to-end: scaffold → section fill → append

14/14 file manager tests passing.
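The multi-batch (BibTeX pattern) case can be sketched as a self-contained test. `append_file` here is a local stand-in for the real tool in pantheon/toolsets/file/file_manager.py, whose actual signature this PR does not show; the test shape (scaffold, then N small appends, then one oversized reject) mirrors the list above.

```python
import pathlib

APPEND_FILE_MAX_CHARS = 6_000  # threshold from this PR

def append_file(path: pathlib.Path, content: str) -> None:
    """Stand-in: append a small chunk to an existing file."""
    if not path.exists():
        raise FileNotFoundError(path)
    if len(content) > APPEND_FILE_MAX_CHARS:
        raise ValueError("content_too_large")
    path.write_text(path.read_text() + content)

def test_multi_batch_bibtex(tmp_path: pathlib.Path) -> None:
    bib = tmp_path / "refs.bib"
    bib.write_text("")                 # scaffold phase: file must exist first
    batches = [f"@article{{key{i},}}\n" for i in range(5)]
    for batch in batches:              # 5 append_file calls, one per batch
        append_file(bib, batch)
    assert bib.read_text().count("@article") == 5

def test_reject_oversized(tmp_path: pathlib.Path) -> None:
    bib = tmp_path / "refs.bib"
    bib.write_text("")
    try:
        append_file(bib, "x" * (APPEND_FILE_MAX_CHARS + 1))
        assert False, "expected content_too_large"
    except ValueError:
        pass
```

In the real suite `tmp_path` would be the pytest fixture of the same name.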
…n-truncation fix(file_manager): add output-token truncation guards and append_file tool
…esholds

Root-cause fix: acompletion_litellm() never passed max_tokens (output) to litellm. Anthropic models default to 4096 output tokens, causing tool_use JSON to be truncated mid-generation when the model writes large file content.

Fix: auto-detect the model's max_output_tokens via litellm.get_model_info() and set it as kwargs["max_tokens"] when not already specified by model_params.

With the root cause fixed, the tool-level size guards from PR #52 are now defense-in-depth (not the primary fix). Raised thresholds to match actual output capacity:
- write_file: 12K → 40K chars
- update_file: 8K → 30K chars
- append_file: 6K → 20K chars

Thresholds moved to class-level constants (WRITE_FILE_MAX_CHARS, etc.) for easy per-deployment tuning. Tests updated to reference the constants instead of hardcoded values. 14/14 file manager tests passing.
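A sketch of the root-cause fix. In the real code the lookup is litellm.get_model_info() inside acompletion_litellm(); the lookup function is injected here as a parameter so the sketch runs without litellm installed, and the function name `resolve_max_tokens` is illustrative.

```python
ANTHROPIC_DEFAULT_OUTPUT_TOKENS = 4096  # provider default when max_tokens is unset

def resolve_max_tokens(model: str, kwargs: dict, get_model_info) -> dict:
    """Fill kwargs['max_tokens'] from the model catalog unless the caller
    (via model_params) already set it.

    get_model_info is litellm.get_model_info in production; it returns a
    mapping that includes a max_output_tokens field for known models.
    """
    if "max_tokens" not in kwargs:
        try:
            info = get_model_info(model)
            max_out = info.get("max_output_tokens") or info.get("max_tokens")
            if max_out:
                kwargs["max_tokens"] = max_out
        except Exception:
            # Unknown model: leave unset and accept the provider default.
            pass
    return kwargs
```

Called once before the completion request, this lets large tool_use JSON finish generating instead of being cut off at the 4096-token default, which is why the per-tool guards can be relaxed to defense-in-depth.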
Should we merge append_file into the write_file tool? I'm curious how CC handles file chunk writing. Ideally, we should keep our core toolset as minimal as possible. Also, the CLI- and UI-related tool rendering will need to be updated.
I've discussed it with weize; CC doesn't use any relay API to transfer results. In my latest revised code, I raised the max output of the LLM's response; that limit was the major cause of this issue.
Is it really worth adding a new tool just to fix the relay issue? How about merging the two tools instead? |
I don't think we should merge it now, because it relies heavily on litellm, and we'll have to wait until weize finishes refactoring the API backend. |
Refactor to replace litellm with provider catalog and SDK adapters
fix(file_manager): output-token truncation guards + append_file tool (clean cherry-pick from #55)
Summary
Merges devin into main. Contains the file manager truncation fix (from #52) plus comprehensive test coverage.

Changes
pantheon/toolsets/file/file_manager.py
- write_file: hard-reject content > 12,000 chars; docstring teaches the Two-Phase Write Protocol (scaffold → section fill → append)
- append_file: new tool for chunked appending with 6,000-char limit; designed for BibTeX batches and section streaming
- update_file: hard-reject new_string > 8,000 chars with guidance to split into smaller semantic units

pantheon/factory/templates/prompts/delegation.md
pantheon/factory/templates/teams/default.md

tests/test_file_manager.py (+10 new tests)
- write_file: reject >12K, accept at limit, file not created on reject
- append_file: basic append, multi-batch BibTeX pattern (5 batches), reject nonexistent, reject >6K, accept at limit
- update_file: reject >8K new_string, accept at limit, original unchanged on reject

Root Cause
When an LLM generates a large file (LaTeX paper, BibTeX bibliography) in a single write_file call, the relay API truncates the output stream mid-JSON before the closing " or }, causing AnthropicException - Failed to parse tool call arguments: Unterminated string. This is an output-token limit on the relay, not a context-window issue.

Test Plan
Closes #52