feat(whatsapp-gateway): add media message support by f-liva · Pull Request #606 · RightNow-AI/openfang

f-liva · 2026-03-14T20:19:07Z

Summary

Closes #605

Detect media messages (images, video, audio, documents, stickers) in incoming WhatsApp messages
Download media via Baileys' downloadMediaMessage()
Upload to OpenFang's existing /api/agents/{id}/upload endpoint
Forward the file_id in the attachments array so the LLM receives the content
Graceful fallback: if media download/upload fails, text/caption is still forwarded
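The flow in the summary can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the gateway's actual code: `download` and `upload` are hypothetical injected callables standing in for Baileys' downloadMediaMessage() and the /api/agents/{id}/upload call, and the shape of the `attachments` entries is an assumption.

```python
from typing import Callable, Optional

# Media keys as they appear in the WhatsApp message object that Baileys exposes.
MEDIA_KEYS = ("imageMessage", "videoMessage", "audioMessage",
              "documentMessage", "stickerMessage")


def detect_media(message: dict) -> Optional[str]:
    """Return the first media key present in the message, or None if text-only."""
    for key in MEDIA_KEYS:
        if key in message:
            return key
    return None


def extract_text(message: dict) -> str:
    """Return plain text, extended text, or the caption of a media message."""
    if message.get("conversation"):
        return message["conversation"]
    extended = message.get("extendedTextMessage") or {}
    if extended.get("text"):
        return extended["text"]
    media_key = detect_media(message)
    if media_key:
        return (message[media_key] or {}).get("caption") or ""
    return ""


def build_payload(message: dict,
                  download: Callable[[dict], bytes],
                  upload: Callable[[bytes], str]) -> dict:
    """Build the forwarded payload; media failures never block the text."""
    payload = {"message": extract_text(message), "attachments": []}
    if detect_media(message) is None:
        return payload  # text-only: behavior unchanged
    try:
        file_id = upload(download(message))
        # Assumed attachment shape: one object per uploaded file.
        payload["attachments"].append({"file_id": file_id})
    except Exception:
        # Graceful fallback: keep the text/caption, drop the attachment.
        pass
    return payload
```

Injecting the two callables keeps the fallback path easy to exercise: a failing `download` still yields a payload that carries the caption.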

Test plan

Send image without caption → agent receives and describes the image
Send image with caption → agent receives both image and caption text
Send voice note → agent receives audio attachment
Send document (PDF) → agent receives document attachment
Send text-only message → behavior unchanged

🤖 Generated with Claude Code

- Add toolchain: Node.js 22, Claude Code CLI, Python 3, uv, Go, gh, ffmpeg
- Add gosu + non-root user (openfang) with passwordless sudo
- Entrypoint drops root privileges via gosu for Claude Code compatibility
- Add GitHub Actions workflow to auto-sync upstream releases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace opaque dark-background logo with transparent PNG and add CSS invert filter for light theme so the snake is visible in both dark and light modes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extend the CSS invert filter to also cover .message-avatar img, so the agent logo in chat messages is visible in light mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously, Claude Code had key_required=false so detect_auth() always set NotRequired ("No Key Needed"), regardless of whether the CLI was installed or authenticated. Now it checks:
- CLI installed + authenticated → Configured
- CLI installed, not authenticated → Missing (Not Set)
- CLI not installed → NotRequired (No Key Needed)
This also fixes the Runtime/Overview page showing Claude Code as "not configured" when it is actually ready.
Fixes RightNow-AI#376
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
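The three-way check described above can be written as a small decision function. This is a sketch in Python rather than the project's Rust; the enum and function names are illustrative, and how the CLI's installed and authenticated state is probed is left out.

```python
from enum import Enum


class AuthStatus(Enum):
    CONFIGURED = "Configured"
    MISSING = "Not Set"
    NOT_REQUIRED = "No Key Needed"


def detect_claude_code_auth(cli_installed: bool, authenticated: bool) -> AuthStatus:
    """Map the CLI state to the status described in the commit message."""
    if not cli_installed:
        return AuthStatus.NOT_REQUIRED
    return AuthStatus.CONFIGURED if authenticated else AuthStatus.MISSING
```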

GITHUB_TOKEN cannot push changes to .github/workflows/ files. Use a PAT with workflows permission instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Install pip3, playwright, and Chromium with system dependencies for browser automation and PDF generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The rebase creates new commit hashes, and --force-with-lease fails when the remote was updated between fetches. Since this is an intentional rebase sync, --force is appropriate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

OpenFang's Browser Hand checks for 'python' not 'python3'. Debian only installs the python3 binary, so add a symlink. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The ResourceQuota default set max_cost_per_hour_usd to 1.0 while daily and monthly were 0.0 (unlimited). This caused agents without explicit quota configuration to hit a hidden $1/hour cap. Also fixes apply_budget_defaults() which compared against the old hardcoded default value of 1.0. Fixes RightNow-AI#416 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
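To illustrate why a non-zero default acts as a hidden cap, here is a minimal sketch in Python. It assumes the fix sets the hourly default to 0.0 like the daily and monthly defaults, and that 0.0 means "unlimited"; only max_cost_per_hour_usd is named in the commit, so the other field names and the `over_budget` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResourceQuota:
    # Assumed post-fix defaults: 0.0 means "no limit" for every window.
    max_cost_per_hour_usd: float = 0.0
    max_cost_per_day_usd: float = 0.0    # hypothetical field name
    max_cost_per_month_usd: float = 0.0  # hypothetical field name


def over_budget(limit_usd: float, spent_usd: float) -> bool:
    """A limit of 0.0 is treated as unlimited; otherwise compare spend to it."""
    return limit_usd > 0.0 and spent_usd >= limit_usd
```

With the old default of 1.0, `over_budget(1.0, 1.0)` is true for an agent that never configured a quota, which matches the reported $1/hour cap.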

Previously the workflow only rebuilt when a new upstream tag was detected, so custom commits on main were silently ignored. Now pushes to main trigger a full build+push to Docker Hub. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When upstream changes conflict with custom fork commits, the rebase now automatically skips the conflicting commits instead of failing. Skipped commits are typically duplicates already applied upstream. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Browser Hand no longer requires Playwright — it now needs Chromium directly. Replace pip3 playwright install with apt chromium package. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Upstream v0.3.46 removed the flag, causing all agents using Claude Code to be completely paralyzed — every command requires interactive terminal approval that cannot be given via dashboard or Telegram. Without this flag, Claude Code as a provider is unusable in any non-interactive context (web UI, Telegram, API, scheduled tasks). Refs: RightNow-AI#515, RightNow-AI#325 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tability
Add qwen-code as a new subprocess-based LLM provider, mirroring the claude-code driver pattern. Qwen Code CLI (qwen) uses --yolo for non-interactive mode and supports json/stream-json output formats.
New files:
- drivers/qwen_code.rs: full driver with complete/stream, env filtering
Changes:
- drivers/mod.rs: register qwen-code provider (defaults, create_driver)
- model_catalog.rs: add provider info, 3 models (qwen3-coder, qwen-coder-plus, qwq-32b), aliases, auth detection
- claude_code.rs: extract build_args() for testability, fix duplicate --dangerously-skip-permissions flag that was always added regardless of skip_permissions setting
Tests: 31 new/updated (18 qwen-code + 13 claude-code) covering build_args with/without permission flags, streaming, model selection, prompt building, JSON parsing, and catalog integration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Install @qwen-code/qwen-code alongside Claude Code so the qwen-code provider is available in the container. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…outing and ClaudeCode driver
Root cause 1: Vision model swap (kernel.rs) changed the model to qwen-vl-plus but left the provider as claude-code, routing images to the wrong driver.
Root cause 2: ClaudeCodeDriver.build_prompt() called text_content() which silently dropped all ContentBlock::Image and ContentBlock::ImageUrl blocks.
Fix:
- kernel.rs: vision model swap now also updates the provider
- claude_code.rs: full image support via temp files passed with --files flag (handles base64, data URIs, and HTTP URLs)
- All other drivers: ensure ImageUrl content blocks are handled
- compactor.rs: handle ImageUrl in conversation compaction
- bridge.rs: improved image dispatch reliability
Closes RightNow-AI#528
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Notify on build success/failure via Telegram Bot API using TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID secrets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Check model catalog's supports_vision flag before swapping. Models like claude-opus-4-6 handle images natively — no need to swap to a separate vision model (which may use a different, unconfigured provider). Also warn when images arrive but no vision fallback is available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When vision_model is explicitly set in config, always use it (forced override). Only fall back to the current agent model when no vision_model is configured and the model supports vision natively. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
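The selection rule from this commit and the two before it (explicit vision_model wins, otherwise keep the agent's own model if it supports vision, otherwise warn) might look like the sketch below. It is written in Python for brevity; the tuple representation and the `supports_vision` lookup table are assumptions, not the kernel's actual types.

```python
from typing import Dict, Optional, Tuple

# (provider, model) travel together so a swap can never leave a stale provider.
Target = Tuple[str, str]


def pick_target_for_images(current: Target,
                           vision_override: Optional[Target],
                           supports_vision: Dict[str, bool]) -> Optional[Target]:
    """Return the (provider, model) that should receive a message with images."""
    if vision_override is not None:
        return vision_override  # explicitly configured: forced override
    if supports_vision.get(current[1], False):
        return current  # the agent's own model handles images natively
    return None  # no vision path available; the caller should log a warning
```

Returning the provider and model as one value reflects the first root cause fixed earlier in this PR, where the model was swapped but the provider was not.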

The CLI option is --file (singular), not --files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The --file flag requires a session token for file downloads (Files API). Instead, embed @/tmp/image.jpg directly in the prompt text, which tells Claude Code CLI to read the local file natively. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
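A rough sketch of the approach, in Python rather than the driver's Rust: each image is written to a temp file and referenced in the prompt with an @ prefix. The function name is hypothetical, the .jpg suffix assumes JPEG input, and placing the references before the text is an arbitrary choice.

```python
import base64
import os
import tempfile
from typing import List, Tuple


def build_prompt_with_images(text: str, images_b64: List[str]) -> Tuple[str, List[str]]:
    """Write base64 images to temp files and reference them as @path in the prompt.

    Returns the prompt and the temp file paths so the caller can delete them
    once the CLI process has finished.
    """
    paths = []
    for encoded in images_b64:
        fd, path = tempfile.mkstemp(suffix=".jpg")  # suffix assumes JPEG data
        with os.fdopen(fd, "wb") as handle:
            handle.write(base64.b64decode(encoded))
        paths.append(path)
    refs = " ".join(f"@{p}" for p in paths)
    prompt = f"{refs}\n{text}" if refs else text
    return prompt, paths
```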

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When multiple Claude Code subscriptions are configured via the `profiles` field in DriverConfig, the new MultiProfileDriver automatically rotates between them on rate-limit errors. Cooldown timestamps are derived from the Anthropic OAuth usage API (`/api/oauth/usage`) so profiles are re-enabled at exactly the right time. Config example: profiles = ["~/.claude", "~/.claude-profiles/account-2"] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
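The rotation idea can be sketched independently of the driver. The Python class below is illustrative only (it is not the MultiProfileDriver itself): the reset timestamp is passed in by the caller, whereas the commit derives it from the OAuth usage API.

```python
from typing import Dict, List, Optional


class ProfileRotation:
    """Pick the first profile that is not cooling down after a rate limit."""

    def __init__(self, profiles: List[str]) -> None:
        self.profiles = list(profiles)
        self.cooldown_until: Dict[str, float] = {}

    def mark_rate_limited(self, profile: str, reset_at: float) -> None:
        """Record when the profile may be used again (epoch seconds)."""
        self.cooldown_until[profile] = reset_at

    def pick(self, now: float) -> Optional[str]:
        """Return a usable profile in configured order, or None if all are cooling down."""
        for profile in self.profiles:
            if self.cooldown_until.get(profile, 0.0) <= now:
                return profile
        return None
```

Keeping the configured order means the first profile is preferred again as soon as its cooldown expires.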

WhatsApp gateway sends sender metadata (phone number, display name) but the API was silently discarding it because MessageRequest had no metadata field. This caused agents to treat all WhatsApp users as their owner.
Changes:
- Add optional metadata field to MessageRequest (types.rs)
- Parse metadata into SenderContext and propagate through kernel (routes.rs, kernel.rs)
- Inject sender identity into agent system prompt (prompt_builder.rs)
- Add SenderContext struct to message types (message.rs)
- Activate is_allowed() filter with open-by-default behavior (whatsapp.rs)
- Add allowed_users check in API routes (open mode when list is empty)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
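The open-by-default filter reduces to one rule: an empty allow list admits everyone. A minimal Python sketch, assuming senders are compared as plain strings (the real is_allowed() may normalize phone numbers first):

```python
from typing import Iterable


def is_allowed(sender: str, allowed_users: Iterable[str]) -> bool:
    """Open mode when the list is empty; otherwise the sender must be listed."""
    allowed = set(allowed_users)
    return not allowed or sender in allowed
```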

…te DM
When a message arrives from a WhatsApp group (@g.us), the gateway now replies in the same group instead of sending a private DM to the sender.
Changes:
- Detect group messages via @g.us JID suffix
- Extract real sender from msg.key.participant for groups
- Reply to remoteJid (group) instead of always to sender JID
- Add group metadata (group_jid, group_name, is_group) to forwarded context
- Resolve agent name to UUID dynamically via /api/agents endpoint
- sendMessage now accepts full JIDs (including @g.us groups)
- Default agent changed from 'assistant' to 'ambrogio'
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
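The routing decision described above can be sketched as a pure function over the message key. This Python version is illustrative; the returned dictionary layout is an assumption, while remoteJid and participant are the key fields named in the commit.

```python
def resolve_routing(key: dict) -> dict:
    """Decide who sent the message and where the reply should go."""
    remote_jid = key["remoteJid"]
    is_group = remote_jid.endswith("@g.us")
    # In groups, remoteJid is the group and participant is the real sender.
    sender = key.get("participant") if is_group else remote_jid
    return {
        "reply_to": remote_jid,  # reply in the same chat, group or direct
        "sender": sender,
        "is_group": is_group,
        "group_jid": remote_jid if is_group else None,
    }
```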

…io, documents) Previously the gateway silently dropped all media-only messages (images without captions, voice notes, documents, stickers). Now it downloads media via Baileys' downloadMediaMessage(), uploads to OpenFang's /upload endpoint, and forwards the file_id in the attachments array so the LLM can see the content. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Federico Liva and others added 30 commits March 13, 2026 10:04

ci: add sync-build workflow

1f85bd4

ci: use main branch instead of custom

4e274aa

feat: add gogcli to image

bd2ef99

chore: sync to upstream v0.3.22

4a1214b

feat: install brew + gogcli as non-root, add PATH for npm-global

e550aad

chore: sync to upstream v0.3.23

8189b99

chore: sync to upstream v0.3.24

9b64dbc

docs: add Docker Hub README with auto-sync via GitHub Actions

a4c0bba

feat: add jq to image

b5d2bcc

docs: fix gogcli description — Google Workspace CLI, not GOG.com

bf31aa5

fix: logo visibility in light theme

3a5003b

fix: invert logo in message avatar for light theme

0b73375

fix: use PAT_TOKEN for workflow push permission

c7c8458

feat: add Playwright + Chromium to Docker image

6460e0d

chore: sync to upstream v0.3.26

4eaa5a4

fix: add python -> python3 symlink for Browser Hand detection

0510605

chore: sync to upstream v0.3.27

8570594

chore: sync to upstream v0.3.34

189bd07

fix: replace Playwright with chromium package in Docker image

2f7cc39

chore: sync to upstream v0.3.47

f5328e0

feat: add Qwen Code CLI to Docker image

ddb4a73

fix: correct Qwen Code credentials path from ~/.qwen-code to ~/.qwen

2b3784b

f-liva and others added 12 commits March 13, 2026 10:20

feat: add Telegram notifications to CI workflow

eb351f1

fix: use --file instead of --files for Claude Code CLI

6d67a0c

feat: add Telegram notification on build start

3c9ad58

chore: sync to upstream v0.4.0

e860faf

f-liva wants to merge 42 commits into RightNow-AI:main from f-liva:feat/whatsapp-media-support
