feat(image-upload): send images to agent #554

Open
vinhnxv wants to merge 1 commit into slopus:main from vinhnxv:feat/send-images-to-agent

@vinhnxv commented Feb 7, 2026

Summary

Add the ability to send images (screenshots, UI mockups, photos) from the Happy mobile and desktop apps to AI coding agents, so users can use Claude Code's vision capabilities directly from the app.

Core Feature: Image Upload Pipeline

Flow:

  1. User picks images from gallery (native) or file picker/clipboard paste (web)
  2. App resizes to max 1024px, converts to JPEG at quality 0.7 (~100-300KB)
  3. App uploads via existing RPC writeFile to CLI machine's OS temp dir ($TMPDIR/happy/uploads/{sessionId}/)
  4. App sends text message with [image: /path/to/file.jpg] references
  5. Agent reads the file using its Read tool to analyze the image
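
Step 2 can be illustrated with a small dimension calculation. This is a minimal sketch, not the PR's code: it assumes "max 1024px" applies to the longest side and that the aspect ratio is preserved, and the name fitWithin is hypothetical.

```typescript
// Hypothetical helper: compute target dimensions for the resize step.
// Assumption: the 1024px cap applies to the longest side of the image.
interface Size {
  width: number;
  height: number;
}

function fitWithin(width: number, height: number, maxSide: number = 1024): Size {
  const longest = Math.max(width, height);
  // Images already within the cap are left at their original size.
  if (longest <= maxSide) {
    return { width, height };
  }
  // Scale both sides by the same factor to preserve the aspect ratio.
  const scale = maxSide / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

The resulting size would then be handed to the platform resizer (expo-image-manipulator on native, Canvas API on web) together with the JPEG quality of 0.7.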

Why this approach: Zero server/protocol changes required. Reuses existing writeFile RPC (already encrypted), and Claude Code's Read tool already supports image files natively.
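
Step 4 of the flow could look roughly like the sketch below. The function name buildImageMessage and the one-reference-per-line layout are assumptions for illustration; the PR only states that the message carries [image: /path/to/file.jpg] references.

```typescript
// Hypothetical helper: append one "[image: <path>]" reference per uploaded file.
// Assumption: references follow the user's text, one per line.
function buildImageMessage(text: string, imagePaths: string[]): string {
  const refs = imagePaths.map((p) => `[image: ${p}]`);
  const trimmed = text.trim();
  // An image-only message consists of the references alone.
  const lines = trimmed.length > 0 ? [trimmed, ...refs] : refs;
  return lines.join("\n");
}
```

Because the message is plain text, it travels through the existing message path unchanged, which is what keeps the server and protocol untouched.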

What's New

| Area | Change | Files |
|---|---|---|
| Image picking (native) | Gallery picker via expo-image-picker, resize via expo-image-manipulator | imageUpload.ts |
| Image picking (web) | File picker + clipboard paste, resize via Canvas API | imageUpload.web.ts, MultiTextInput.web.tsx |
| Shared upload logic | Base64 validation, RPC upload, upload dir caching, path sanitization | imageUpload.shared.ts |
| Image upload hook | useImageUpload: state management for pending images, pick/paste handlers | useImageUpload.ts |
| UI: image button | Action bar button with count badge, disabled at max (5 images) | AgentInput.tsx |
| UI: image chips | Horizontal scrollable strip above input with remove button per image | AgentInput.tsx |
| Agent integration | System prompt instructs agent to read [image: path] references | systemPrompt.ts |
| CLI: upload dir RPC | New getUploadDir RPC returns OS temp dir path | registerCommonHandlers.ts |
| CLI: path security | validatePath() extended with additionalAllowedDirs for temp upload dir | pathSecurity.ts |
| CLI: payload stripping | Strip base64 image data from tool results before socket transport | apiSession.ts |
| i18n | 6 new keys across all 11 supported languages | translations/*.ts |
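
The path security row can be pictured with the following sketch. It is not the PR's validatePath(): the function name isPathAllowed and its signature are hypothetical, and the check is purely lexical (it does not resolve symlinks), using only Node's path module.

```typescript
import * as path from "path";

// Hypothetical sketch of an allow-list check with extra allowed directories.
// Assumption: a path is allowed if it resolves inside the working directory
// or inside any of the additional allowed directories.
function isPathAllowed(
  target: string,
  workingDir: string,
  additionalAllowedDirs: string[] = []
): boolean {
  // Resolve relative targets against the working directory and collapse ".." segments.
  const resolved = path.resolve(workingDir, target);
  return [workingDir, ...additionalAllowedDirs].some((dir) => {
    const rel = path.relative(path.resolve(dir), resolved);
    // A relative path that climbs out of the root ("..", "../x") or is absolute
    // (different drive on Windows) means the target lies outside this root.
    const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
}
```

For example, with a per-session upload directory in the allow-list, a file inside it passes while a `..` hop into a sibling session's directory is rejected.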

Bug Fixes & Improvements

  • Fix crypto.getRandomValues crash on iOS/Android — Hermes doesn't have Web Crypto API; replaced with getRandomBytes from expo-crypto
  • Fix base64 encoding stack overflow — Chunked conversion in encodeBase64() for large buffers (>65KB) that previously crashed on web
  • Fix stale tool states — Reducer Phase 6 force-completes tools stuck in 'running' when agent has already responded with text
  • Fix ToolHeader overflow — Layout uses flexShrink instead of flexGrow to prevent text clipping on long file paths
  • Fix RPC timeout — Add 30s timeout to socket RPC calls with descriptive error messages (previously hung indefinitely)
  • Fix session init race — Create realtime session before SDK metadata extraction
  • Type safety — Extract ModelMode into const array with runtime validator isValidModelMode(), remove as any casts
  • Tool result parsing — Support image content blocks in tool results (not just text)
  • Cleanup — Remove leftover console.log debug statements from MultiTextInput
  • Tauri — Bump to ~2.9
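
The base64 stack overflow fix can be illustrated as follows. This is a sketch of the general chunking technique, not the PR's encodeBase64(): the function name, the 32KB chunk size, and the use of btoa (available in browsers and recent Node versions) are assumptions.

```typescript
// Hypothetical sketch: convert bytes to a binary string in fixed-size chunks.
// Passing a very large array to String.fromCharCode in one call can exceed the
// engine's argument limit and overflow the stack; chunking avoids that.
const CHUNK_SIZE = 0x8000; // 32KB per call, an assumed safe size

function encodeBase64Chunked(bytes: Uint8Array): string {
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    const chunk = bytes.subarray(i, i + CHUNK_SIZE);
    // apply() spreads the chunk as arguments; the cast satisfies the type checker.
    parts.push(String.fromCharCode.apply(null, chunk as unknown as number[]));
  }
  // Joining once at the end avoids repeated string concatenation in the loop.
  return btoa(parts.join(""));
}
```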

Technical Notes

  • Payload safety: 520KB base64 cap ensures final encrypted payload stays under Socket.io's 1MB limit (double base64 encoding: 520KB → ~693KB on wire)
  • Platform split: imageUpload.ts (native) / imageUpload.web.ts (web) follows the existing MultiTextInput pattern. Shared logic extracted to imageUpload.shared.ts
  • No new dependencies: Both expo-image-picker and expo-image-manipulator were already in package.json but unused
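
The payload safety figures can be checked with a few lines of arithmetic. This sketch assumes KB means 1024 bytes and ignores any overhead added by encryption itself; the constant and function names are hypothetical.

```typescript
// Base64 turns every 3 input bytes into 4 output characters (with padding).
const SOCKET_LIMIT_BYTES = 1024 * 1024; // 1MB limit from the note above
const BASE64_CAP_BYTES = 520 * 1024; // 520KB cap on the base64 image data

function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// True if data of this size, base64-encoded once more for transport,
// stays under the socket limit (encryption overhead not modelled).
function fitsSocketLimit(base64ImageBytes: number): boolean {
  return base64EncodedLength(base64ImageBytes) < SOCKET_LIMIT_BYTES;
}
```

At the cap, base64EncodedLength(BASE64_CAP_BYTES) is 709,976 bytes, about 693KB, which matches the figure in the note and leaves roughly 330KB of headroom under 1MB.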

Test plan

  • Pick single/multiple images on iOS simulator and verify upload + send
  • Pick images on web/desktop via file picker
  • Paste image from clipboard on web/desktop
  • Verify max 5 image limit enforced (button disables, visual feedback)
  • Remove an image chip before sending
  • Send message with only images (no text) — shows "Sent an image" / "Sent N images"
  • Send message with text + images
  • Verify /compact and /clear commands ignore pending images
  • Verify agent reads uploaded image files via Read tool
  • Verify large images are resized and stay under 520KB base64 limit
  • Verify path traversal attacks are blocked (pathSecurity.test.ts)
  • Verify ToolHeader doesn't clip long file paths
  • Verify stale 'running' tools are resolved after agent responds
  • yarn typecheck passes

@vinhnxv force-pushed the feat/send-images-to-agent branch 2 times, most recently from 808a46d to 4efba38 on February 7, 2026 20:06

P1 fixes:
- Prevent symlink traversal in pathSecurity via realpathSync
- Serialize pick/paste with AsyncLock to prevent race conditions
- Replace queueMicrotask with 300ms setTimeout for double-tap guard

P2 fixes:
- Clean up session upload temp dir on shutdown
- Use shared encodeBase64 to fix O(n²) string concatenation in web
- Scope upload dir validation per session (cross-session isolation)
- Preserve non-timeout RPC errors instead of swallowing as "timed out"
- Type RPC result instead of any
- Fix isSessionMode type guard parameter type
- Tighten tool result content type to z.enum(['text', 'image'])
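
The RPC timeout behaviour, including the fix that keeps non-timeout errors intact, might be sketched like this. The names withTimeout and RpcTimeoutError are hypothetical and the actual socket call is represented by any promise; only the 30s default comes from the PR description.

```typescript
// Hypothetical error type so callers can tell a timeout from other failures.
class RpcTimeoutError extends Error {
  constructor(method: string, ms: number) {
    super(`RPC "${method}" timed out after ${ms}ms`);
    this.name = "RpcTimeoutError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, RpcTimeoutError.prototype);
  }
}

// Rejects with RpcTimeoutError if the call does not settle in time.
// Errors raised by the call itself are passed through unchanged.
function withTimeout<T>(call: Promise<T>, method: string, ms: number = 30000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new RpcTimeoutError(method, ms)), ms);
    call.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error); // preserve the original, non-timeout error
      }
    );
  });
}
```

A caller would wrap the socket request, for example withTimeout(sendRpc("writeFile", payload), "writeFile"), where sendRpc stands in for whatever function issues the request.
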
@vinhnxv force-pushed the feat/send-images-to-agent branch from 4efba38 to 230f495 on February 7, 2026 21:10