Summary
Shellwright currently returns download URLs for screenshots/recordings, requiring agents to have filesystem access (Bash + file read). This limits compatibility to agents like Claude Code.
Adding an optional inline mode would enable Shellwright to work with pure MCP clients like Claude Desktop and ChatGPT.
Current Behavior
{
"filename": "screenshot.png",
"download_url": "http://localhost:7498/files/.../screenshot.png",
"hint": "Use curl -o <filename> <download_url> to save the file"
}
Requires: The agent must run curl and then read the file - only works with filesystem-capable agents.
Proposed Behavior
Add an inline parameter to shell_screenshot:
inline: z.boolean().optional().describe("Return base64 image in response instead of download URL (default: false)")
When inline: true:
{
"content": [
{ "type": "image", "data": "<base64>", "mimeType": "image/png" }
]
}
Implementation Options
- Per-call parameter - inline: true on screenshot calls
- Server config - --inline-images flag
- Both - config sets default, parameter overrides
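As a rough illustration of the "Both" option, here is a minimal sketch of how the server default and the per-call parameter could be combined, and how the two response shapes might be produced. The function and type names (`resolveInline`, `buildScreenshotResult`, `UrlResult`, `InlineResult`) are hypothetical and not taken from the Shellwright codebase. Only the JSON shapes come from this issue. The screenshot is assumed to be already base64-encoded.

```typescript
// Hypothetical types: only the JSON shapes are taken from this issue.
type UrlResult = { filename: string; download_url: string; hint: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type InlineResult = { content: ImageContent[] };

// "Both" option: an explicit per-call value wins; otherwise fall back to the
// server-level default (for example, one set by an --inline-images flag).
function resolveInline(callParam: boolean | undefined, serverDefault: boolean): boolean {
  return callParam ?? serverDefault;
}

// Build either the current URL-style response or the proposed inline one.
// Assumption: the caller has already base64-encoded the PNG bytes.
function buildScreenshotResult(
  base64Png: string,
  filename: string,
  downloadUrl: string,
  inline: boolean,
): UrlResult | InlineResult {
  if (inline) {
    return { content: [{ type: "image", data: base64Png, mimeType: "image/png" }] };
  }
  return {
    filename,
    download_url: downloadUrl,
    hint: "Use curl -o <filename> <download_url> to save the file",
  };
}
```

With this shape, `resolveInline(undefined, true)` follows the server default, while `resolveInline(false, true)` lets a single call opt back into URL mode, which keeps existing filesystem-capable agents unaffected when the flag is off.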
Testing Plan
Before (current - should fail)
Test with Claude Desktop (MCP-only, no filesystem):
- Configure Shellwright as an MCP server in Claude Desktop
- Ask: "Start a bash session, run ls, and show me a screenshot"
- Expected: Claude receives URL but cannot fetch/display it
- Actual behavior to document: What error or limitation does the user see?
After (with inline mode)
Same test with inline: true or --inline-images:
- Configure Shellwright with inline mode enabled
- Ask: "Start a bash session, run ls, and show me a screenshot"
- Expected: Screenshot appears inline in Claude Desktop conversation
Test Matrix
| Client | Filesystem Access | Current | With Inline |
|---|---|---|---|
| Claude Code | ✅ | ✅ Works | ✅ Works |
| Claude Desktop | ❌ | ❌ URL only | ✅ Works |
| ChatGPT + MCP | ❌ | ❌ URL only | ✅ Works |
| Cursor | ✅ | ✅ Works | ✅ Works |
Context Window Considerations
Inline images consume significant context (~50-200KB base64 per screenshot). Consider:
- Scaling images down (like Playwright MCP's 1.15MP / 1568px limit)
- Warning in docs about context usage
- Keeping URL mode as default for context-sensitive workflows
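To make the scaling and size trade-off concrete, the sketch below computes target dimensions under the two limits quoted above and estimates base64 payload size. The names `fitToLimits` and `base64Length` are hypothetical. It assumes "1.15MP" means 1,150,000 pixels, which may differ from the exact constant Playwright MCP uses. It only computes dimensions, and the actual resampling would need an image library.

```typescript
const MAX_PIXELS = 1150000; // assumption: "1.15MP" read as 1,150,000 pixels
const MAX_SIDE = 1568;      // per-edge limit in pixels, as quoted in this issue

interface Size { width: number; height: number }

// Return dimensions that satisfy both limits while preserving aspect ratio.
// Images already within the limits are returned unchanged (never upscaled).
function fitToLimits(size: Size): Size {
  const scale = Math.min(
    1,
    MAX_SIDE / size.width,
    MAX_SIDE / size.height,
    Math.sqrt(MAX_PIXELS / (size.width * size.height)),
  );
  return {
    width: Math.max(1, Math.floor(size.width * scale)),
    height: Math.max(1, Math.floor(size.height * scale)),
  };
}

// Base64 emits 4 characters for every 3 input bytes, padded to a multiple of 4.
function base64Length(byteCount: number): number {
  return Math.ceil(byteCount / 3) * 4;
}
```

The base64 overhead is about one third, so a 150,000-byte PNG becomes a 200,000-character payload. That overhead is one reason to keep URL mode as the default.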
References
- Playwright MCP's approach: Returns scaled base64 images inline
- scaleImageToFitMessage() limits images to Claude's vision requirements (1.15 megapixels max)