Case Study: PDF too large error in Claude Code CLI #1091
Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: #1082
This case study documents issue #1082, where the AI solver encountered a "PDF too large" error when attempting to read a 4.4MB PDF file using Claude Code CLI.

Key findings:
- The error is a known, persistent bug in Claude Code CLI (issues #13518, #11527, #9789, #15054)
- After the error occurs, the session enters an infinite error loop
- Multiple workarounds exist: pdftotext extraction, PDF splitting, /rewind
- Root cause: PDF base64 encoding exceeds 25,000 token limit

Includes:
- Detailed timeline reconstruction from original log
- Root cause analysis with token limits research
- Workarounds documentation with code examples
- References to related GitHub issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This reverts commit 5d2822e.
🤖 Solution Draft Log (Truncated)
This log file contains the complete execution trace of the AI solution draft process.
Solution draft log (6039KB, truncated)
The working session has now ended; feel free to review and add any feedback on the solution draft.
Please continue the investigation and collect all the data you can for our research. What is the root cause of the 25,000-token limitation of the read tool for PDFs? Or is it not the read tool? Is it a limitation of the API, or does the model architecture prevent us from using bigger files? It is related to all files read by I need all the details and the deep root causes. Also check the code of OpenCode CLI, Gemini CLI, and Qwen CLI.
🤖 AI Work Session Started
Starting automated work session at 2026-01-10T22:31:02.243Z
The PR has been converted to draft mode while work is in progress.
This comment marks the beginning of an AI work session. Please wait for the working session to finish, then provide your feedback.
- Fix Prettier formatting in existing markdown files
- Add comprehensive DEEP-ROOT-CAUSE-ANALYSIS.md with:
  - Three layers of limits analysis (CLI, API, Model)
  - Environment variables to override defaults
  - Comparison with OpenCode, Gemini CLI, and Qwen-Agent
  - Root cause hierarchy diagram
  - Proposed solutions for users and Anthropic
  - References to related GitHub issues
- Update ANALYSIS.md to reference new documentation

Key findings:
- 25,000 token limit is CLI-level, not API/model
- Can be overridden via MAX_MCP_OUTPUT_TOKENS env var
- API actually supports 32MB/100 pages
- Model context window is 200K tokens

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Solution Draft Log
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost estimation:
The working session has now ended; feel free to review and add any feedback on the solution draft.
Summary
This PR adds a comprehensive case study documenting the "PDF too large" error in Claude Code CLI, as requested in issue #1082.
Issue Reference
Fixes #1082
What's Included
ANALYSIS.md: Detailed case study with:
DEEP-ROOT-CAUSE-ANALYSIS.md: Deep technical investigation with:
workarounds.md: Practical workarounds including:
log-excerpts.md: Key excerpts from the original log showing:
Key Findings
Root Cause: CLI Implementation Choice, Not API/Model Limitation
The 25,000 token limit is hardcoded in Claude Code CLI, not an API or model limitation:
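A rough size estimate helps show why the 4.4MB PDF from issue #1082 cannot fit under a 25,000-token cap if its base64 form is what gets counted, as the commit message in this PR states. The sketch below is illustrative only: the ratio of about 4 characters per token is an assumed rule of thumb, the PDF size is taken as 4.4 MiB, and how the CLI actually counts tokens is not described in this PR.

```python
import math

# Figures taken from this PR: a 4.4MB PDF and a 25,000-token CLI limit.
PDF_BYTES = int(4.4 * 1024 * 1024)   # assumption: "MB" read as MiB
TOKEN_LIMIT = 25_000

# Assumption: ~4 characters per token is a rough rule of thumb,
# not the tokenizer's real behavior on base64 text.
CHARS_PER_TOKEN = 4


def base64_length(num_bytes: int) -> int:
    """Character count of the padded base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)


def estimated_tokens(num_bytes: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Very rough token estimate for the base64 text of a file."""
    return math.ceil(base64_length(num_bytes) / chars_per_token)


if __name__ == "__main__":
    tokens = estimated_tokens(PDF_BYTES)
    print(f"base64 characters: {base64_length(PDF_BYTES):,}")
    print(f"estimated tokens:  {tokens:,} (limit {TOKEN_LIMIT:,})")
    print(f"over the limit:    {tokens > TOKEN_LIMIT}")
```

Base64 turns every 3 bytes into 4 characters, so the encoded payload is about a third larger than the file. Under these assumptions the estimate exceeds the limit by well over an order of magnitude, so small changes to the assumed ratio do not alter the conclusion.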
Environment Variables to Override
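As a minimal sketch of the override described in this PR, the helper below builds an environment with `MAX_MCP_OUTPUT_TOKENS` set before launching the CLI. The variable name comes from the PR's key findings. The helper name and the value `100000` are hypothetical, and whether the variable changes PDF reads in a given CLI version is this PR's claim, not something the sketch verifies.

```python
import os
from typing import Mapping, Optional


def env_with_token_override(
    max_tokens: int, base_env: Optional[Mapping[str, str]] = None
) -> dict:
    """Return a copy of the environment with MAX_MCP_OUTPUT_TOKENS set.

    The variable name is taken from this PR; the helper itself is
    hypothetical and only prepares the environment for a child process.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    env = dict(os.environ if base_env is None else base_env)
    env["MAX_MCP_OUTPUT_TOKENS"] = str(max_tokens)
    return env


if __name__ == "__main__":
    env = env_with_token_override(100_000)  # hypothetical value
    print(env["MAX_MCP_OUTPUT_TOKENS"])
    # To launch the CLI with this environment (assuming `claude` is on PATH):
    #   import subprocess
    #   subprocess.run(["claude"], env=env)
```

Copying the environment instead of mutating `os.environ` keeps the override scoped to the child process.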
Known Bugs
Comparison with Other CLI Tools
max_input_tokens, RAG-based approach for large docs
Workarounds Available
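The `pdftotext` workaround named in this PR can be sketched as a small wrapper that converts the PDF to plain text, so that the text file is read in place of the PDF. This assumes the `pdftotext` binary from poppler-utils is installed. The function names and the file name `paper.pdf` are hypothetical, and `-layout` is an optional flag chosen here to preserve column layout.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def pdftotext_command(pdf_path: str, txt_path: Optional[str] = None) -> list:
    """Build the pdftotext command line; output defaults to <name>.txt."""
    pdf = Path(pdf_path)
    out = Path(txt_path) if txt_path else pdf.with_suffix(".txt")
    return ["pdftotext", "-layout", str(pdf), str(out)]


def extract_text(pdf_path: str, txt_path: Optional[str] = None) -> Path:
    """Run pdftotext and return the path of the generated text file."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; it ships with poppler-utils")
    cmd = pdftotext_command(pdf_path, txt_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])


if __name__ == "__main__":
    # Hypothetical input file; point the AI tool at the .txt output.
    print(" ".join(pdftotext_command("paper.pdf")))
```

The extracted text is usually far smaller than the base64-encoded PDF, at the cost of losing images and exact formatting.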
pdftotext before reading
/rewind command to recover from errors
Testing
This PR was created automatically by the AI issue solver