fix: complete Telegram image pipeline for Claude Code agents #561
Open
f-liva wants to merge 5 commits into RightNow-AI:main from
Conversation
…outing and ClaudeCode driver

Root cause 1: The vision model swap (kernel.rs) changed the model to qwen-vl-plus but left the provider as claude-code, routing images to the wrong driver.

Root cause 2: ClaudeCodeDriver.build_prompt() called text_content(), which silently dropped all ContentBlock::Image and ContentBlock::ImageUrl blocks.

Fix:
- kernel.rs: vision model swap now also updates the provider
- claude_code.rs: full image support via temp files passed with --files flag (handles base64, data URIs, and HTTP URLs)
- All other drivers: ensure ImageUrl content blocks are handled
- compactor.rs: handle ImageUrl in conversation compaction
- bridge.rs: improved image dispatch reliability

Closes RightNow-AI#528

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
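Root cause 2 can be illustrated with a small sketch. The variant names ContentBlock::Image and ContentBlock::ImageUrl come from the commit message above; the fields, the simplified text_content() and the split_content() helper are assumptions made for illustration, not the actual openfang types.

```rust
/// Simplified stand-in for the runtime's content block type.
/// Variant names follow the commit message; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    Image { media_type: String, data: String },
    ImageUrl(String),
}

/// Text-only flattening: every non-text block is silently dropped.
/// This mirrors the behaviour described as root cause 2.
pub fn text_content(blocks: &[ContentBlock]) -> String {
    blocks
        .iter()
        .filter_map(|b| match b {
            ContentBlock::Text(t) => Some(t.as_str()),
            _ => None,
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Hypothetical alternative: return the text and keep the image blocks
/// so the caller can still hand them to the driver.
pub fn split_content(blocks: &[ContentBlock]) -> (String, Vec<&ContentBlock>) {
    let mut text = Vec::new();
    let mut images = Vec::new();
    for block in blocks {
        match block {
            ContentBlock::Text(t) => text.push(t.as_str()),
            other => images.push(other),
        }
    }
    (text.join("\n"), images)
}
```

With a message holding one text block and two image blocks, text_content() returns only the text, while split_content() returns the same text plus both image blocks for further handling.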
Check the model catalog's supports_vision flag before swapping. Models like claude-opus-4-6 handle images natively, so there is no need to swap to a separate vision model (which may use a different, unconfigured provider). Also warn when images arrive but no vision fallback is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When vision_model is explicitly set in config, always use it (forced override). Only fall back to the current agent model when no vision_model is configured and the model supports vision natively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The CLI option is --file (singular), not --files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The --file flag requires a session token for file downloads (Files API). Instead, embed @/tmp/image.jpg directly in the prompt text, which tells Claude Code CLI to read the local file natively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
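A minimal sketch of the @path approach from the last commit, using only the Rust standard library. The function names, the id and ext parameters, and the one-reference-per-line layout are assumptions for illustration; only the openfang-img- file prefix and the @path form itself come from this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes already-decoded image bytes into `dir` and returns the file path.
/// The `openfang-img-` prefix follows the PR; `id` and `ext` are hypothetical.
pub fn write_temp_image(dir: &Path, id: &str, ext: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let path = dir.join(format!("openfang-img-{id}.{ext}"));
    fs::write(&path, bytes)?;
    Ok(path)
}

/// Appends one `@path` reference per image to the prompt text.
pub fn build_prompt_with_images(text: &str, images: &[PathBuf]) -> String {
    let mut prompt = String::from(text);
    for path in images {
        prompt.push('\n');
        prompt.push('@');
        prompt.push_str(&path.to_string_lossy());
    }
    prompt
}

/// Best-effort cleanup, intended to run after the CLI process has exited.
pub fn cleanup_images(images: &[PathBuf]) {
    for path in images {
        // A file that is already gone is not treated as an error.
        let _ = fs::remove_file(path);
    }
}
```

A caller would write each image, build the prompt, run the CLI, and then call cleanup_images() regardless of the exit status so temp files do not accumulate.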
Summary
Fixes the end-to-end Telegram image pipeline so photos sent via Telegram reach Claude Code agents correctly and are processed with vision capabilities.
Three bugs were found and fixed:
1. Vision model selection logic (`kernel.rs`): When an image arrived, the kernel unconditionally swapped to a `vision_model` from config, even when the agent's current model (e.g. Claude Opus) already supports vision natively. This caused auth errors when the fallback vision model had misconfigured credentials. Now uses priority-based selection:
   - `vision_model` from config (forced override)
   - otherwise the agent's current model, when the model catalog marks it as supporting vision
2. Wrong CLI flag (`claude_code.rs`): The driver used `--files` (plural), which doesn't exist in Claude Code CLI. Changed to `--file` (singular).
3. `--file` requires Files API (`claude_code.rs`): Even with the correct flag, `--file` expects Anthropic Files API file IDs and requires `CLAUDE_CODE_SESSION_ACCESS_TOKEN`. Replaced with `@path` syntax, Claude Code's native way to reference local files inline in the prompt. Images are now written to temp files and referenced as `@/tmp/openfang-img-xxx.jpg` in the prompt text.

Changes
- `crates/openfang-kernel/src/kernel.rs`: Priority-based vision model selection with model catalog lookup
- `crates/openfang-runtime/src/drivers/claude_code.rs`: Rewrote image handling: temp file extraction, `@path` prompt injection, cleanup after CLI exits, refactored `build_args()` for testability
- Tests for `build_args()` covering skip-permissions, streaming, and model flags

Closes #528
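The priority-based selection in `kernel.rs` might look roughly like the sketch below. All type and function names here are hypothetical; the point it illustrates is that provider and model are chosen together as one value, so a swap cannot leave the old provider in place, and that a configured `vision_model` always wins.

```rust
/// Hypothetical pairing of provider and model, kept together so that
/// swapping the model can never leave a stale provider behind.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelRef {
    pub provider: String,
    pub model: String,
}

#[derive(Debug, PartialEq)]
pub enum VisionChoice {
    /// `vision_model` is set in config: forced override.
    Configured(ModelRef),
    /// No override, and the current model handles images natively.
    Native(ModelRef),
    /// No vision-capable option: keep the current model; the caller should warn.
    Unavailable(ModelRef),
}

/// Picks the model for a message that carries images.
/// `current_supports_vision` stands in for the catalog's supports_vision lookup.
pub fn select_vision_model(
    configured: Option<&ModelRef>,
    current: &ModelRef,
    current_supports_vision: bool,
) -> VisionChoice {
    if let Some(model) = configured {
        return VisionChoice::Configured(model.clone());
    }
    if current_supports_vision {
        return VisionChoice::Native(current.clone());
    }
    VisionChoice::Unavailable(current.clone())
}
```

Returning an enum rather than a bare model name lets the caller log a warning in the `Unavailable` case instead of silently sending images to a model that cannot read them.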
Test plan
🤖 Generated with Claude Code