feat(whatsapp-gateway): add media message support #606

Open
f-liva wants to merge 42 commits into RightNow-AI:main from f-liva:feat/whatsapp-media-support

f-liva commented Mar 14, 2026

Summary

Closes #605

  • Detect media messages (images, video, audio, documents, stickers) in incoming WhatsApp messages
  • Download media via Baileys' downloadMediaMessage()
  • Upload to OpenFang's existing /api/agents/{id}/upload endpoint
  • Forward the file_id in the attachments array so the LLM receives the content
  • Graceful fallback: if media download/upload fails, text/caption is still forwarded
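The control flow in the summary above can be sketched as follows. This is only an illustration: the gateway itself is Node.js and uses Baileys, while Rust is used here to match the rest of the repository. `Incoming`, `Forwarded`, `forward`, and the `download_and_upload` callback are hypothetical names, and dropping a media-only message whose upload fails is an assumption, not something the PR states.

```rust
/// Media kinds named in the PR summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaKind {
    Image,
    Video,
    Audio,
    Document,
    Sticker,
}

/// Hypothetical, simplified view of an incoming WhatsApp message.
struct Incoming<'a> {
    /// Text body or media caption, if any.
    text: Option<&'a str>,
    /// Detected media kind, if the message carries media.
    media: Option<MediaKind>,
}

/// Hypothetical payload forwarded to the agent.
#[derive(Debug, PartialEq, Eq)]
struct Forwarded {
    text: String,
    /// `file_id` values returned by the upload endpoint.
    attachments: Vec<String>,
}

/// `download_and_upload` stands in for "download via Baileys, then upload
/// to the agent's upload endpoint"; it returns the `file_id` on success.
fn forward<F>(msg: &Incoming<'_>, download_and_upload: F) -> Option<Forwarded>
where
    F: Fn(MediaKind) -> Result<String, String>,
{
    let mut attachments = Vec::new();
    if let Some(kind) = msg.media {
        // Graceful fallback: a failed download or upload skips the
        // attachment, but any text or caption is still forwarded.
        if let Ok(file_id) = download_and_upload(kind) {
            attachments.push(file_id);
        }
    }
    let text = msg.text.unwrap_or("").to_string();
    // Assumption: with neither text nor attachment there is nothing to send.
    if text.is_empty() && attachments.is_empty() {
        return None;
    }
    Some(Forwarded { text, attachments })
}
```

Passing the transfer step in as a closure keeps the fallback rule testable without a network connection.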

Test plan

  • Send image without caption → agent receives and describes the image
  • Send image with caption → agent receives both image and caption text
  • Send voice note → agent receives audio attachment
  • Send document (PDF) → agent receives document attachment
  • Send text-only message → behavior unchanged

🤖 Generated with Claude Code

Federico Liva and others added 30 commits March 13, 2026 10:04
- Add toolchain: Node.js 22, Claude Code CLI, Python 3, uv, Go, gh, ffmpeg
- Add gosu + non-root user (openfang) with passwordless sudo
- Entrypoint drops root privileges via gosu for Claude Code compatibility
- Add GitHub Actions workflow to auto-sync upstream releases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace opaque dark-background logo with transparent PNG and add
CSS invert filter for light theme so the snake is visible in both
dark and light modes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend the CSS invert filter to also cover .message-avatar img,
so the agent logo in chat messages is visible in light mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, Claude Code had key_required=false so detect_auth() always
set NotRequired ("No Key Needed"), regardless of whether the CLI was
installed or authenticated. Now it checks:
- CLI installed + authenticated → Configured
- CLI installed, not authenticated → Missing (Not Set)
- CLI not installed → NotRequired (No Key Needed)

This also fixes the Runtime/Overview page showing Claude Code as "not
configured" when it is actually ready.

Fixes RightNow-AI#376

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
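The three-way check described in the commit above maps naturally onto a single `match`. A minimal sketch, assuming the two inputs are already known as booleans; the enum and function names are illustrative and the real `detect_auth()` signature is not shown in this PR.

```rust
/// Auth states named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthStatus {
    Configured,
    Missing,
    NotRequired,
}

/// Hypothetical helper: derive the status from two facts about the CLI.
fn detect_cli_auth(cli_installed: bool, cli_authenticated: bool) -> AuthStatus {
    match (cli_installed, cli_authenticated) {
        (true, true) => AuthStatus::Configured,
        (true, false) => AuthStatus::Missing,
        // Not installed: reported as "No Key Needed", as before the fix.
        (false, _) => AuthStatus::NotRequired,
    }
}
```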
GITHUB_TOKEN cannot push changes to .github/workflows/ files.
Use a PAT with workflows permission instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Install pip3, playwright, and Chromium with system dependencies
for browser automation and PDF generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The rebase creates new commit hashes, and --force-with-lease fails
when the remote was updated between fetches. Since this is an
intentional rebase sync, --force is appropriate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenFang's Browser Hand checks for 'python' not 'python3'.
Debian only installs the python3 binary, so add a symlink.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ResourceQuota default set max_cost_per_hour_usd to 1.0 while daily
and monthly were 0.0 (unlimited). This caused agents without explicit
quota configuration to hit a hidden $1/hour cap.

Also fixes apply_budget_defaults() which compared against the old
hardcoded default value of 1.0.

Fixes RightNow-AI#416

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
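The quota fix above can be illustrated with a small sketch in which `0.0` means "unlimited" for every window. Only `max_cost_per_hour_usd` is named in the commit; the other field names and both helper functions are assumptions made for this example, not the project's actual API.

```rust
#[derive(Debug, Clone, PartialEq)]
struct ResourceQuota {
    max_cost_per_hour_usd: f64,
    // Field names below are assumed for illustration.
    max_cost_per_day_usd: f64,
    max_cost_per_month_usd: f64,
}

impl Default for ResourceQuota {
    fn default() -> Self {
        // After the fix all three windows default to 0.0 (unlimited).
        ResourceQuota {
            max_cost_per_hour_usd: 0.0,
            max_cost_per_day_usd: 0.0,
            max_cost_per_month_usd: 0.0,
        }
    }
}

/// Hypothetical check: a limit of 0.0 never blocks spending.
fn within_hourly_budget(quota: &ResourceQuota, spent_this_hour_usd: f64) -> bool {
    quota.max_cost_per_hour_usd == 0.0 || spent_this_hour_usd <= quota.max_cost_per_hour_usd
}

/// Hypothetical helper: apply a global hourly limit only where the agent
/// left the field at its default. Comparing against `Default` instead of a
/// literal keeps this in sync if the default changes again.
fn apply_hourly_default(quota: &mut ResourceQuota, global_hourly_usd: f64) {
    if quota.max_cost_per_hour_usd == ResourceQuota::default().max_cost_per_hour_usd {
        quota.max_cost_per_hour_usd = global_hourly_usd;
    }
}
```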
Previously the workflow only rebuilt when a new upstream tag was
detected, so custom commits on main were silently ignored.
Now pushes to main trigger a full build+push to Docker Hub.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When upstream changes conflict with custom fork commits, the rebase
now automatically skips the conflicting commits instead of failing.
Skipped commits are typically duplicates already applied upstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Browser Hand no longer requires Playwright — it now needs Chromium
directly. Replace pip3 playwright install with apt chromium package.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upstream v0.3.46 removed the flag, causing all agents using Claude Code
to be completely paralyzed — every command requires interactive terminal
approval that cannot be given via dashboard or Telegram.

Without this flag, Claude Code as a provider is unusable in any
non-interactive context (web UI, Telegram, API, scheduled tasks).

Refs: RightNow-AI#515, RightNow-AI#325

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tability

Add qwen-code as a new subprocess-based LLM provider, mirroring the
claude-code driver pattern. Qwen Code CLI (qwen) uses --yolo for
non-interactive mode and supports json/stream-json output formats.

New files:
- drivers/qwen_code.rs: full driver with complete/stream, env filtering

Changes:
- drivers/mod.rs: register qwen-code provider (defaults, create_driver)
- model_catalog.rs: add provider info, 3 models (qwen3-coder,
  qwen-coder-plus, qwq-32b), aliases, auth detection
- claude_code.rs: extract build_args() for testability, fix duplicate
  --dangerously-skip-permissions flag that was always added regardless
  of skip_permissions setting

Tests: 31 new/updated (18 qwen-code + 13 claude-code) covering
build_args with/without permission flags, streaming, model selection,
prompt building, JSON parsing, and catalog integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
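Extracting argument construction into a pure function is what makes the duplicate-flag bug testable. A minimal sketch, assuming a simplified signature; the real `build_args()` parameters and full argument list are not shown in this PR, and `--print` and `--model` are included only as plausible placeholders.

```rust
/// Build the CLI argument list without spawning a process.
/// Simplified, assumed signature for illustration.
fn build_args(model: &str, skip_permissions: bool) -> Vec<String> {
    let mut args = vec![
        "--print".to_string(),
        "--model".to_string(),
        model.to_string(),
    ];
    // The flag is added exactly once, and only when requested.
    if skip_permissions {
        args.push("--dangerously-skip-permissions".to_string());
    }
    args
}
```

Because the function returns a plain `Vec<String>`, tests can count occurrences of the flag instead of inspecting a running subprocess.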
Install @qwen-code/qwen-code alongside Claude Code so the qwen-code
provider is available in the container.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
f-liva and others added 12 commits March 13, 2026 10:20
…outing and ClaudeCode driver

Root cause 1: Vision model swap (kernel.rs) changed the model to qwen-vl-plus
but left the provider as claude-code, routing images to the wrong driver.

Root cause 2: ClaudeCodeDriver.build_prompt() called text_content() which
silently dropped all ContentBlock::Image and ContentBlock::ImageUrl blocks.

Fix:
- kernel.rs: vision model swap now also updates the provider
- claude_code.rs: full image support via temp files passed with --files flag
  (handles base64, data URIs, and HTTP URLs)
- All other drivers: ensure ImageUrl content blocks are handled
- compactor.rs: handle ImageUrl in conversation compaction
- bridge.rs: improved image dispatch reliability

Closes RightNow-AI#528

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
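Root cause 2 above is easy to reproduce in isolation: a text-only accessor discards every non-text block. The sketch below uses an assumed, simplified `ContentBlock` shape (the variant payloads are not shown in this PR) and a hypothetical `image_blocks` helper to show what a prompt builder has to collect in addition to the text.

```rust
/// Simplified stand-in for the project's content block type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ContentBlock {
    Text(String),
    /// Inline image; payload fields are assumed for illustration.
    Image { media_type: String, data_base64: String },
    ImageUrl(String),
}

/// Text-only view: every image block is silently dropped here,
/// which is the behavior the commit identifies as the bug.
fn text_content(blocks: &[ContentBlock]) -> String {
    blocks
        .iter()
        .filter_map(|b| match b {
            ContentBlock::Text(t) => Some(t.as_str()),
            _ => None,
        })
        .collect::<Vec<&str>>()
        .join("\n")
}

/// Hypothetical helper: the blocks a driver must handle separately.
fn image_blocks(blocks: &[ContentBlock]) -> Vec<&ContentBlock> {
    blocks
        .iter()
        .filter(|b| matches!(b, ContentBlock::Image { .. } | ContentBlock::ImageUrl(_)))
        .collect()
}
```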
Notify on build success/failure via Telegram Bot API using
TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check model catalog's supports_vision flag before swapping. Models like
claude-opus-4-6 handle images natively — no need to swap to a separate
vision model (which may use a different, unconfigured provider).

Also warn when images arrive but no vision fallback is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When vision_model is explicitly set in config, always use it (forced
override). Only fall back to the current agent model when no
vision_model is configured and the model supports vision natively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
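Taken together, the vision-related commits above describe a priority order for choosing the model that receives images. A sketch under stated assumptions: `ModelRef` and `pick_model_for_images` are hypothetical names, and the provider string in the usage below is a placeholder.

```rust
/// Provider and model travel together so a swap cannot leave them mismatched.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ModelRef {
    provider: String,
    model: String,
}

/// Returns the model that should handle a message containing images,
/// or `None` when no vision-capable option exists (the caller would warn).
fn pick_model_for_images(
    current: &ModelRef,
    current_supports_vision: bool,
    configured_vision_model: Option<&ModelRef>,
) -> Option<ModelRef> {
    // 1. An explicitly configured vision model is a forced override.
    if let Some(vision) = configured_vision_model {
        return Some(vision.clone());
    }
    // 2. Otherwise keep the agent's model if it handles images natively.
    if current_supports_vision {
        return Some(current.clone());
    }
    // 3. No fallback available.
    None
}
```

Returning the provider alongside the model addresses the first root cause from the earlier commit, where only the model name was swapped.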
The CLI option is --file (singular), not --files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The --file flag requires a session token for file downloads (Files API).
Instead, embed @/tmp/image.jpg directly in the prompt text, which tells
Claude Code CLI to read the local file natively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
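The `@path` approach from the commit above amounts to simple string assembly. A minimal sketch; the function name and the choice of a single space as separator are assumptions.

```rust
/// Append an `@<path>` reference for each temp image file to the prompt text.
fn prompt_with_images(text: &str, image_paths: &[&str]) -> String {
    let mut prompt = String::from(text);
    for path in image_paths {
        // Separator choice is an assumption for this sketch.
        if !prompt.is_empty() {
            prompt.push(' ');
        }
        prompt.push('@');
        prompt.push_str(path);
    }
    prompt
}
```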
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When multiple Claude Code subscriptions are configured via the `profiles`
field in DriverConfig, the new MultiProfileDriver automatically rotates
between them on rate-limit errors.  Cooldown timestamps are derived from
the Anthropic OAuth usage API (`/api/oauth/usage`) so profiles are
re-enabled at exactly the right time.

Config example:
  profiles = ["~/.claude", "~/.claude-profiles/account-2"]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
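The rotation described above can be sketched as a small state machine. This is not the actual `MultiProfileDriver`: the type and method names are hypothetical, timestamps are plain Unix seconds, and the reset time is passed in directly rather than fetched from the usage API.

```rust
#[derive(Debug)]
struct Profile {
    path: String,
    /// Unix time (seconds) at which the rate limit resets, if limited.
    cooldown_until: Option<u64>,
}

#[derive(Debug)]
struct Rotation {
    profiles: Vec<Profile>,
    current: usize,
}

impl Rotation {
    fn new(paths: &[&str]) -> Self {
        let profiles = paths
            .iter()
            .map(|p| Profile { path: p.to_string(), cooldown_until: None })
            .collect();
        Rotation { profiles, current: 0 }
    }

    /// Record that the current profile hit a rate limit until `resets_at`.
    fn mark_rate_limited(&mut self, resets_at: u64) {
        if let Some(p) = self.profiles.get_mut(self.current) {
            p.cooldown_until = Some(resets_at);
        }
    }

    /// Pick the first usable profile, starting from the current one.
    /// Returns `None` when every profile is still cooling down.
    fn pick(&mut self, now: u64) -> Option<&str> {
        let n = self.profiles.len();
        for offset in 0..n {
            let idx = (self.current + offset) % n;
            let available = match self.profiles[idx].cooldown_until {
                Some(until) => now >= until,
                None => true,
            };
            if available {
                self.profiles[idx].cooldown_until = None;
                self.current = idx;
                return Some(self.profiles[idx].path.as_str());
            }
        }
        None
    }
}
```

Storing an absolute reset time, rather than a fixed back-off, is what lets a profile be re-enabled as soon as its limit lifts.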
WhatsApp gateway sends sender metadata (phone number, display name) but
the API was silently discarding it because MessageRequest had no metadata
field. This caused agents to treat all WhatsApp users as their owner.

Changes:
- Add optional metadata field to MessageRequest (types.rs)
- Parse metadata into SenderContext and propagate through kernel (routes.rs, kernel.rs)
- Inject sender identity into agent system prompt (prompt_builder.rs)
- Add SenderContext struct to message types (message.rs)
- Activate is_allowed() filter with open-by-default behavior (whatsapp.rs)
- Add allowed_users check in API routes (open mode when list is empty)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
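The open-by-default filter and the sender context from the commit above can be sketched like this. The `SenderContext` fields follow the metadata the commit mentions (phone number, display name), but the exact field names, the `is_allowed` signature, and the prompt wording are assumptions.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct SenderContext {
    phone: String,
    display_name: Option<String>,
}

/// Open mode: an empty allow-list admits everyone.
fn is_allowed(allowed_users: &[String], sender: &SenderContext) -> bool {
    allowed_users.is_empty() || allowed_users.iter().any(|u| u == &sender.phone)
}

/// Hypothetical line injected into the system prompt so the agent
/// can tell the sender apart from its owner.
fn sender_prompt_line(sender: &SenderContext) -> String {
    match &sender.display_name {
        Some(name) => format!("Current sender: {} ({})", name, sender.phone),
        None => format!("Current sender: {}", sender.phone),
    }
}
```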
…te DM

When a message arrives from a WhatsApp group (@g.us), the gateway now
replies in the same group instead of sending a private DM to the sender.

Changes:
- Detect group messages via @g.us JID suffix
- Extract real sender from msg.key.participant for groups
- Reply to remoteJid (group) instead of always to sender JID
- Add group metadata (group_jid, group_name, is_group) to forwarded context
- Resolve agent name to UUID dynamically via /api/agents endpoint
- sendMessage now accepts full JIDs (including @g.us groups)
- Default agent changed from 'assistant' to 'ambrogio'

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
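The group routing rule above reduces to a suffix check on the JID. The gateway itself is Node.js, so this Rust sketch only models the decision; `Routing` and `route` are hypothetical names, and falling back to the remote JID when a group message has no participant is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
struct Routing {
    /// Where the reply is sent (the group for group messages).
    reply_to: String,
    /// Who actually wrote the message.
    sender: String,
    is_group: bool,
}

/// `remote_jid` corresponds to the message key's remote JID and
/// `participant` to the key's participant field described in the commit.
fn route(remote_jid: &str, participant: Option<&str>) -> Routing {
    let is_group = remote_jid.ends_with("@g.us");
    let sender = if is_group {
        // Assumed fallback when the participant is missing.
        participant.unwrap_or(remote_jid)
    } else {
        remote_jid
    };
    Routing {
        reply_to: remote_jid.to_string(),
        sender: sender.to_string(),
        is_group,
    }
}
```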
…io, documents)

Previously the gateway silently dropped all media-only messages (images without
captions, voice notes, documents, stickers). Now it downloads media via Baileys'
downloadMediaMessage(), uploads to OpenFang's /upload endpoint, and forwards the
file_id in the attachments array so the LLM can see the content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

WhatsApp gateway drops media-only messages (images, voice, documents)

2 participants