
fix: complete Telegram image pipeline for Claude Code agents #561

Open
f-liva wants to merge 5 commits into RightNow-AI:main from f-liva:fix/telegram-image-complete

f-liva commented Mar 12, 2026

Summary

Fixes the end-to-end Telegram image pipeline so photos sent via Telegram reach Claude Code agents correctly and are processed with vision capabilities.

Three bugs were found and fixed:

  1. Vision model selection logic (kernel.rs): When an image arrived, the kernel unconditionally swapped to a vision_model from config — even when the agent's current model (e.g. Claude Opus) already supports vision natively. This caused auth errors when the fallback vision model had misconfigured credentials. Now uses priority-based selection:

    • Priority 1: Explicit vision_model from config (forced override)
    • Priority 2: Current agent model if it supports vision (no swap)
    • Priority 3: Warning — no vision capability available
  2. Wrong CLI flag (claude_code.rs): The driver used --files (plural) which doesn't exist in Claude Code CLI. Changed to --file (singular).

  3. --file requires Files API (claude_code.rs): Even with the correct flag, --file expects Anthropic Files API file IDs and requires CLAUDE_CODE_SESSION_ACCESS_TOKEN. Replaced with @path syntax — Claude Code's native way to reference local files inline in the prompt. Images are now written to temp files and referenced as @/tmp/openfang-img-xxx.jpg in the prompt text.
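The priority order from item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in kernel.rs: `VisionChoice`, `select_vision_model`, and the `current_supports_vision` parameter (standing in for the model catalog lookup) are hypothetical names introduced here.

```rust
/// Outcome of choosing which model handles a message that contains images.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, PartialEq)]
enum VisionChoice {
    /// Priority 1: config names a vision model explicitly, so it always wins.
    Override(String),
    /// Priority 2: the agent's current model supports vision, so no swap happens.
    KeepCurrent(String),
    /// Priority 3: nothing can handle the image; the caller should log a warning.
    Unavailable,
}

/// `current_supports_vision` stands in for the model catalog lookup,
/// whose real API is not shown in this PR description.
fn select_vision_model(
    configured: Option<&str>,
    current_model: &str,
    current_supports_vision: bool,
) -> VisionChoice {
    if let Some(model) = configured {
        return VisionChoice::Override(model.to_string());
    }
    if current_supports_vision {
        return VisionChoice::KeepCurrent(current_model.to_string());
    }
    VisionChoice::Unavailable
}
```

With no `vision_model` configured and a vision-capable agent model, the sketch returns `KeepCurrent`, so the provider and credentials stay untouched.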

Changes

  • crates/openfang-kernel/src/kernel.rs — Priority-based vision model selection with model catalog lookup
  • crates/openfang-runtime/src/drivers/claude_code.rs — Rewrote image handling: temp file extraction, @path prompt injection, cleanup after CLI exits, refactored build_args() for testability
  • Added unit tests for build_args() covering skip-permissions, streaming, and model flags
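As a rough sketch of the image handling described for claude_code.rs, the temp-file extraction, @path prompt injection, and cleanup steps might look like the code below. The function names, the file naming, and the placement of each reference on its own line after the prompt text are assumptions for illustration; the driver's actual code may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write decoded image bytes into `dir` and return the resulting path.
/// The caller chooses `file_name`; the driver's real naming scheme is not shown here.
fn write_temp_image(dir: &Path, file_name: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let path = dir.join(file_name);
    fs::write(&path, bytes)?;
    Ok(path)
}

/// Append one `@path` reference per image after the prompt text.
/// Putting each reference on its own line is an assumption of this sketch.
fn build_prompt_with_images(text: &str, images: &[PathBuf]) -> String {
    let mut prompt = text.to_string();
    for path in images {
        prompt.push('\n');
        prompt.push('@');
        prompt.push_str(&path.to_string_lossy());
    }
    prompt
}

/// Best-effort cleanup, intended to run once the CLI process has exited.
fn cleanup_temp_images(images: &[PathBuf]) {
    for path in images {
        // Ignore errors: the file may already be gone.
        let _ = fs::remove_file(path);
    }
}
```

Deleting the files only after the CLI exits matters because the CLI reads them lazily when it resolves the @path references.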

Closes #528

Test plan

  • Send photo via Telegram to Claude Code agent (Opus) → agent sees and describes the image
  • No auth errors, no session token requirements
  • Temp files cleaned up after processing
  • Unit tests pass for build_args and build_prompt
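For reviewers who want a feel for what the build_args() tests check, here is a hedged sketch. `CliOptions` and this `build_args` signature are hypothetical stand-ins, not the driver's real types; only the CLI flags themselves (`-p`, `--dangerously-skip-permissions`, `--output-format`, `--verbose`, `--model`) are existing Claude Code options.

```rust
/// Hypothetical options struct; the real driver's fields are not shown in this PR.
struct CliOptions<'a> {
    prompt: &'a str,
    model: Option<&'a str>,
    skip_permissions: bool,
    streaming: bool,
}

/// Build the argument vector for a non-interactive `claude` invocation.
fn build_args(opts: &CliOptions<'_>) -> Vec<String> {
    let mut args = vec!["-p".to_string(), opts.prompt.to_string()];
    if opts.skip_permissions {
        args.push("--dangerously-skip-permissions".to_string());
    }
    // Streaming uses newline-delimited JSON events; otherwise a single JSON result.
    let format = if opts.streaming { "stream-json" } else { "json" };
    args.push("--output-format".to_string());
    args.push(format.to_string());
    if opts.streaming {
        // stream-json output in print mode is paired with --verbose.
        args.push("--verbose".to_string());
    }
    if let Some(model) = opts.model {
        args.push("--model".to_string());
        args.push(model.to_string());
    }
    args
}
```

Because the function returns a plain `Vec<String>` and spawns nothing, each flag can be asserted on directly without invoking the CLI.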

🤖 Generated with Claude Code

f-liva and others added 5 commits March 12, 2026 15:10
…outing and ClaudeCode driver

Root cause 1: Vision model swap (kernel.rs) changed the model to qwen-vl-plus
but left the provider as claude-code, routing images to the wrong driver.

Root cause 2: ClaudeCodeDriver.build_prompt() called text_content() which
silently dropped all ContentBlock::Image and ContentBlock::ImageUrl blocks.

Fix:
- kernel.rs: vision model swap now also updates the provider
- claude_code.rs: full image support via temp files passed with --files flag
  (handles base64, data URIs, and HTTP URLs)
- All other drivers: ensure ImageUrl content blocks are handled
- compactor.rs: handle ImageUrl in conversation compaction
- bridge.rs: improved image dispatch reliability

Closes RightNow-AI#528

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Check model catalog's supports_vision flag before swapping. Models like
claude-opus-4-6 handle images natively — no need to swap to a separate
vision model (which may use a different, unconfigured provider).

Also warn when images arrive but no vision fallback is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When vision_model is explicitly set in config, always use it (forced
override). Only fall back to the current agent model when no
vision_model is configured and the model supports vision natively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The CLI option is --file (singular), not --files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The --file flag requires a session token for file downloads (Files API).
Instead, embed @/tmp/image.jpg directly in the prompt text, which tells
Claude Code CLI to read the local file natively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

bug(channels): Telegram image media_type is application/octet-stream — breaks vision
