
fix(file_manager): output-token truncation guards + append_file tool#55

Closed
Starlitnightly wants to merge 9 commits into main from dev

Conversation

@Starlitnightly
Collaborator

Summary

Merges dev into main. Contains the file manager truncation fix (from #52) plus comprehensive test coverage.

Changes

pantheon/toolsets/file/file_manager.py

  • write_file: hard-reject content > 12,000 chars; docstring teaches Two-Phase Write Protocol (scaffold → section fill → append)
  • append_file: new tool for chunked appending with 6,000-char limit; designed for BibTeX batches and section streaming
  • update_file: hard-reject new_string > 8,000 chars with guidance to split into smaller semantic units

pantheon/factory/templates/prompts/delegation.md

  • Failure recovery ladder: Two-Phase Write Protocol → format downgrade → inline output
  • Sub-agent failure recovery: retry narrower → self-execute → partial output
  • Hard rule: never terminate without producing at least one artifact

pantheon/factory/templates/teams/default.md

  • Scientific writing gate: MUST delegate research before writing reports/papers
  • Illustrator scope clarification: schematic diagrams vs data-driven charts

tests/test_file_manager.py (+10 new tests)

  • write_file: reject >12K, accept at limit, file not created on reject
  • append_file: basic append, multi-batch BibTeX pattern (5 batches), reject nonexistent, reject >6K, accept at limit
  • update_file: reject >8K new_string, accept at limit, original unchanged on reject
  • End-to-end Two-Phase Write Protocol: scaffold → section fill → bibliography append

Root Cause

When an LLM generates a large file (LaTeX paper, BibTeX bibliography) in a single write_file call, the relay API truncates the output stream mid-JSON before the closing " or }, causing AnthropicException - Failed to parse tool call arguments: Unterminated string. This is an output-token limit on the relay, not a context-window issue.
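The failure mode is easy to reproduce in isolation: cutting a tool-call JSON payload off mid-string fails to parse with exactly this class of error. The payload below is hypothetical:

```python
import json

# Hypothetical tool-call arguments as the model would emit them in full.
full_args = '{"path": "paper.tex", "content": "\\\\documentclass{article} ..."}'

# Simulate the relay cutting the output stream mid-string, before the
# closing quote and brace.
truncated = full_args[:40]

try:
    json.loads(truncated)
except json.JSONDecodeError as exc:
    # Typically reports "Unterminated string starting at ..."
    print("parse failed:", exc.msg)
```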

Test Plan

  • 14/14 file manager tests passing (4 existing + 10 new)
  • All existing tests unaffected (content sizes well under new limits)

Closes #52

hazelian0619 and others added 6 commits March 30, 2026 23:10
Problem A (partial): Add MANDATORY scientific writing gate to default.md —
Leader must delegate to Researcher before writing any domain paper. Clarify
Scientific Illustrator scope (schematic/pathway diagrams only, not data plots).

Problem C: Add Failure Recovery section to delegation.md — three-tier ladder
for file write failures (Two-Phase Write Protocol → format downgrade → inline)
and sub-agent failures (narrow retry → self-execute → partial output). Hard
rule: never terminate without producing at least one artifact.

Validated by experiment (2026-03-30):
- Case 3 (SSR1/GWAS): Leader called 3x parallel Researcher before any content;
  Researchers produced 978 lines across 3 reports using Two-Phase Write Protocol
- Case 0 (EC paper): Leader called 2x parallel Researcher; BibTeX built to 397
  lines via append_file batches (vs. previous silent truncation at char 88);
  PDF artifact (117KB) delivered despite E2BIG and relay-API update_file errors

New bugs discovered (tracked separately):
- Relay API truncates update_file tool call args mid-generation (high severity)
- think tool infinite loop at ~90K token context (medium severity)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tool

P0 bug: when LLM generates large files (LaTeX papers, BibTeX) in a single
write_file/update_file call, the relay API truncates the output stream mid-JSON,
causing 'Unterminated string' parse errors and silent data loss.

Root cause: LLM output token limit is separate from context window. File content
in tool call parameters must be generated as LLM output, hitting max_tokens before
the JSON closes. LaTeX/BibTeX content with escape chars inflates token count ~1.5x.

Changes:
- write_file: hard reject content > 12,000 chars; docstring teaches Two-Phase
  Write Protocol (scaffold first, fill by section, append for lists/bib)
- append_file: new tool for chunked appending; 6,000 char limit; requires file
  to exist first; primary use case is BibTeX batches (<=10 entries per call)
- update_file: hard reject new_string > 8,000 chars with guidance to split
  section into smaller semantic units

Validated against 20-case baseline (15% success rate before fix):
- Case 1 (LaTeX review paper, previously FAIL): now generates full PDF with
  44 references via append_file batches — confirmed in controlled re-run
- Agent proactively adopted Two-Phase protocol after reading docstring (0
  content_too_large rejections; protocol was followed before guard triggered)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tool

Cherry-picked from PR #52 (fix/file-manager-output-token-truncation).

- write_file: hard-reject content > 12,000 chars with Two-Phase Write Protocol guidance
- append_file: new tool for chunked appending with 6,000-char limit
- update_file: hard-reject new_string > 8,000 chars
- delegation.md: failure recovery ladder
- default.md: scientific writing gate
Tests for PR #52 file manager changes:
- write_file: reject >12K, accept at limit, file not created on reject
- append_file: basic append, multi-batch (BibTeX pattern), reject
  nonexistent file, reject >6K, accept at limit
- update_file: reject new_string >8K, accept at limit, original unchanged
- Two-Phase Write Protocol end-to-end: scaffold → section fill → append

14/14 file manager tests passing.
…n-truncation

fix(file_manager): add output-token truncation guards and append_file tool
…esholds

Root cause fix: acompletion_litellm() never passed max_tokens (output)
to litellm. Anthropic models default to 4096 output tokens, causing
tool_use JSON to be truncated mid-generation when the model writes
large file content.

Fix: auto-detect model's max_output_tokens via litellm.get_model_info()
and set it as kwargs["max_tokens"] when not already specified by
model_params.

With the root cause fixed, the tool-level size guards from PR #52 are
now defense-in-depth (not the primary fix). Raised thresholds to match
actual output capacity:
- write_file: 12K → 40K chars
- update_file: 8K → 30K chars
- append_file: 6K → 20K chars

Thresholds moved to class-level constants (WRITE_FILE_MAX_CHARS, etc.)
for easy per-deployment tuning. Tests updated to reference constants
instead of hardcoded values.

14/14 file manager tests passing.
@zqbake
Collaborator

zqbake commented Apr 1, 2026

Should we merge append_file into the write_file tool? I'm curious how CC handles chunked file writing. Ideally, we should keep our core toolset as minimal as possible. The CLI- and UI-related tool rendering would also need to be updated.

@Starlitnightly
Collaborator Author

> Should we merge append_file into the write_file tool? I'm curious how CC handles chunked file writing. Ideally, we should keep our core toolset as minimal as possible. The CLI- and UI-related tool rendering would also need to be updated.

I discussed this with Weize; CC doesn't use any relay API to transfer results. In my latest revision, I adjusted the max output tokens of the LLM's response, and that limit was the major cause of this issue.

@zqbake
Collaborator

zqbake commented Apr 1, 2026

> Should we merge append_file into the write_file tool? I'm curious how CC handles chunked file writing. Ideally, we should keep our core toolset as minimal as possible. The CLI- and UI-related tool rendering would also need to be updated.

> I discussed this with Weize; CC doesn't use any relay API to transfer results. In my latest revision, I adjusted the max output tokens of the LLM's response, and that limit was the major cause of this issue.

Is it really worth adding a new tool just to fix the relay issue? How about merging the two tools instead?

@Starlitnightly
Collaborator Author

> Should we merge append_file into the write_file tool? I'm curious how CC handles chunked file writing. Ideally, we should keep our core toolset as minimal as possible. The CLI- and UI-related tool rendering would also need to be updated.

> I discussed this with Weize; CC doesn't use any relay API to transfer results. In my latest revision, I adjusted the max output tokens of the LLM's response, and that limit was the major cause of this issue.

> Is it really worth adding a new tool just to fix the relay issue? How about merging the two tools instead?

I don't think we should merge it now, because it relies heavily on litellm, and we'll have to wait until Weize finishes refactoring the API backend.

This reverts commit df28566, reversing
changes made to 7920a72.
Refactor to replace litellm with provider catalog and SDK adapters
zqbake added a commit that referenced this pull request Apr 4, 2026
fix(file_manager): output-token truncation guards + append_file tool (clean cherry-pick from #55)
