feat: add inline image response mode for broader MCP client compatibility #27

@dwmkerr

Summary

Shellwright currently returns download URLs for screenshots/recordings, requiring agents to have filesystem access (Bash + file read). This limits compatibility to agents like Claude Code.

Adding an optional inline mode would enable Shellwright to work with pure MCP clients like Claude Desktop and ChatGPT.

Current Behavior

{
  "filename": "screenshot.png",
  "download_url": "http://localhost:7498/files/.../screenshot.png",
  "hint": "Use curl -o <filename> <download_url> to save the file"
}

Requires: the agent must run curl and then read the saved file, so this only works with filesystem-capable agents.

Proposed Behavior

Add inline parameter to shell_screenshot:

inline: z.boolean().optional().describe("Return base64 image in response instead of download URL (default: false)")

When inline: true:

{
  "content": [
    { "type": "image", "data": "<base64>", "mimeType": "image/png" }
  ]
}
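A minimal sketch of how the two response shapes above might be selected at the end of the screenshot handler. The types and the `buildScreenshotResult` helper are hypothetical (they are not taken from the Shellwright codebase), and wrapping the URL-mode JSON in a single text item is an assumption:

```typescript
// Hypothetical shapes; field names mirror the JSON examples in this issue.
type ImageContent = { type: "image"; data: string; mimeType: string };
type TextContent = { type: "text"; text: string };
type ToolResult = { content: Array<ImageContent | TextContent> };

interface Screenshot {
  filename: string;
  downloadUrl: string;
  pngBase64: string; // PNG bytes, already base64-encoded
}

// Build the tool result for a screenshot. `inline` defaults to false,
// so existing callers keep receiving the download URL.
function buildScreenshotResult(shot: Screenshot, inline = false): ToolResult {
  if (inline) {
    return {
      content: [{ type: "image", data: shot.pngBase64, mimeType: "image/png" }],
    };
  }
  // Assumption: URL mode is serialized as one text content item.
  const payload = {
    filename: shot.filename,
    download_url: shot.downloadUrl,
    hint: "Use curl -o <filename> <download_url> to save the file",
  };
  return { content: [{ type: "text", text: JSON.stringify(payload) }] };
}
```

Keeping the branch in one helper would let shell_screenshot and any recording tools share the same behavior.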

Implementation Options

  1. Per-call parameter - inline: true on screenshot calls
  2. Server config - --inline-images flag
  3. Both - config sets default, parameter overrides
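Option 3 could be sketched roughly as follows. `ServerConfig`, `parseConfig`, and `resolveInline` are hypothetical names used for illustration only; the real argument parsing in Shellwright may look quite different:

```typescript
// Hypothetical server config; `inlineImages` would be set by --inline-images.
interface ServerConfig {
  inlineImages: boolean;
}

// Derive the server-wide default from the command-line arguments.
function parseConfig(argv: string[]): ServerConfig {
  return { inlineImages: argv.indexOf("--inline-images") !== -1 };
}

// The per-call `inline` parameter wins when it is provided;
// otherwise fall back to the server default.
function resolveInline(config: ServerConfig, inlineParam?: boolean): boolean {
  return inlineParam ?? config.inlineImages;
}
```

With this shape, a client that never passes inline gets whatever the operator configured, while an explicit inline: false still forces URL mode on a server started with --inline-images.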

Testing Plan

Before (current - should fail)

Test with Claude Desktop (MCP-only, no filesystem):

  1. Configure Shellwright as MCP server in Claude Desktop
  2. Ask: "Start a bash session, run ls, and show me a screenshot"
  3. Expected: Claude receives URL but cannot fetch/display it
  4. Actual behavior to document: What error or limitation does the user see?

After (with inline mode)

Same test with inline: true or --inline-images:

  1. Configure Shellwright with inline mode enabled
  2. Ask: "Start a bash session, run ls, and show me a screenshot"
  3. Expected: Screenshot appears inline in Claude Desktop conversation

Test Matrix

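| Client         | Filesystem Access | Current  | With Inline |
|----------------|-------------------|----------|-------------|
| Claude Code    | ✅                | Works    | ✅ Works    |
| Claude Desktop | ❌                | URL only | ✅ Works    |
| ChatGPT + MCP  | ❌                | URL only | ✅ Works    |
| Cursor         | ✅                | Works    | ✅ Works    |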

Context Window Considerations

Inline images consume significant context (~50-200KB base64 per screenshot). Consider:

  • Scaling images down (like Playwright MCP's 1.15MP / 1568px limit)
  • Warning in docs about context usage
  • Keeping URL mode as default for context-sensitive workflows
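To make the size trade-off concrete, here is a rough sketch of the arithmetic involved. It is not Playwright MCP's implementation; `fitWithinLimits` and `base64Length` are hypothetical helpers, and the limits are assumed to mean 1.15 million pixels in total and 1568 pixels on the longest side:

```typescript
// Assumed limits, taken from the figures quoted above.
const MAX_PIXELS = 1_150_000; // 1.15 megapixels
const MAX_SIDE = 1568; // longest side, in pixels

// Compute target dimensions that satisfy both limits while keeping
// the aspect ratio. Images already within the limits are not upscaled.
function fitWithinLimits(
  width: number,
  height: number
): { width: number; height: number } {
  const byPixels = Math.sqrt(MAX_PIXELS / (width * height));
  const bySide = MAX_SIDE / Math.max(width, height);
  const scale = Math.min(1, byPixels, bySide);
  return {
    width: Math.max(1, Math.floor(width * scale)),
    height: Math.max(1, Math.floor(height * scale)),
  };
}

// Length of the padded base64 encoding of `byteLength` raw bytes:
// every 3 input bytes become 4 output characters.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}
```

Base64 adds roughly a third on top of the raw PNG size, so a 150 KB screenshot costs about 200 KB of response payload before any scaling is applied.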

References

  • Playwright MCP's approach: Returns scaled base64 images inline
  • scaleImageToFitMessage() limits to Claude's vision requirements (1.15 megapixels max)
