Merged
6 changes: 3 additions & 3 deletions .github/workflows/recording-eval.yaml
@@ -80,9 +80,9 @@ jobs:
with:
header: recording-eval
message: |
-## Recording Evaluation
-
-🎬 **[View recording comparisons](${{ steps.preview.outputs.preview-url }})**
+## Evidence
+### Screenshots and Recordings
+**[View Recordings & Review](${{ steps.preview.outputs.preview-url }})**

- name: Upload recordings
if: github.event.action != 'closed'
4 changes: 4 additions & 0 deletions CLAUDE.md
@@ -32,6 +32,10 @@ npm run dev # Development with hot-reload
- **PTY** - Uses `node-pty` for pseudo-terminal management
- **File Server** - HTTP file server runs on port 7498 regardless of MCP transport mode (stdio or HTTP). Tool results include `download_url` which is always valid.

+## MCP Tools
+
+Tool names, descriptions, and parameters are registered in `src/index.ts` with schemas in `src/tools/`. The README.md [MCP Tools](#mcp-tools) section documents each tool with examples. **Any changes to tool schemas, parameters, or descriptions must be reflected in README.md.**

## Commit Format

Use conventional commits: `feat:`, `fix:`, `docs:`, `chore:`, `refactor:`, `test:`
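
The CLAUDE.md rule above (tool changes must be reflected in README.md) can be partly automated. The following is a minimal sketch, not part of the repository: `findUndocumentedTools` is a hypothetical helper, and it assumes tools are registered as `server.tool("name", ...)` in `src/index.ts` and documented under `### **name**` headings in README.md, which is how they appear in this diff. It only detects tools with no README section at all, not stale parameters or descriptions.

```typescript
// Hypothetical consistency check (not part of shellwright): report tools that are
// registered in src/index.ts but have no "### **tool_name**" section in README.md.
function findUndocumentedTools(indexSource: string, readme: string): string[] {
  // Assumes each registration passes the tool name as the first string literal.
  const registration = /server\.tool\(\s*"([A-Za-z0-9_]+)"/g;
  const missing: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = registration.exec(indexSource)) !== null) {
    const toolName = match[1];
    if (readme.indexOf(`### **${toolName}**`) === -1) {
      missing.push(toolName);
    }
  }
  return missing;
}

// Inline samples; a real check would read both files from disk.
const indexSample = `
server.tool(
  "shell_screenshot",
  "Capture terminal screenshot as PNG."
);
server.tool(
  "shell_record_start",
  "Start recording a terminal session."
);`;
const readmeSample = "### **shell_screenshot**\n\nCapture terminal as PNG.";

const undocumented = findUndocumentedTools(indexSample, readmeSample); // ["shell_record_start"]
console.log(undocumented);
```

Run in CI, a check like this would flag a newly registered tool that has no README section; edits to existing parameters or descriptions would still need a manual review.
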
16 changes: 12 additions & 4 deletions README.md
@@ -117,6 +117,12 @@ Open [`k9s`](https://k9scli.io/) and show [Ark](https://github.com/mckinsey/agen

![Screenshot: Examples - K9S Agents](./docs/examples/k9s-agents.gif)

+Take a bordered screenshot:
+
+> Open Claude Code. Take a screenshot with a macOS window border titled "Shellwright".
+
+![Screenshot: Examples - Bordered Screenshot](./docs/examples/claude-code-bordered.png)

Use [`htop`](https://github.com/htop-dev/htop):

> Open htop and show the most resource intensive process.
@@ -280,12 +286,13 @@ drwxr-xr-x 10 user staff 320 Dec 18 09:00 ..

### **shell_screenshot**

-Capture terminal as PNG. Also saves SVG, ANSI, and plain text versions:
+Capture terminal as PNG. Also saves SVG, ANSI, and plain text versions. Pass `name` without extension (`.png` is added automatically). Optionally add a macOS-style window border (off by default):

```json
{
"session_id": "shell-session-a1b2c3",
-"name": "my-screenshot"
+"name": "my-screenshot",
+"border": { "style": "macos", "title": "Terminal" }
}
```
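
The `name` and `border` conventions for `shell_screenshot` documented above can also be enforced on the caller's side. A minimal sketch, assuming a TypeScript client that assembles the tool arguments itself: `BorderOptions`, `ScreenshotParams`, `screenshotParams`, and `pngFileName` are hypothetical names that only mirror the documented JSON shape, and rejecting a `name` that already ends in an extension is a choice of this sketch, not documented tool behavior.

```typescript
// Mirrors the documented JSON shape; these names are illustrative, not shellwright's own types.
type BorderOptions = { style: "macos"; title?: string };

interface ScreenshotParams {
  session_id: string;
  name: string; // without extension
  border?: BorderOptions; // omitted means no border (the default)
}

// Hypothetical helper: builds shell_screenshot arguments and rejects names that
// already carry an extension, since ".png" is appended automatically.
function screenshotParams(sessionId: string, name: string, border?: BorderOptions): ScreenshotParams {
  if (/\.[A-Za-z0-9]+$/.test(name)) {
    throw new Error(`Pass "name" without an extension (got "${name}")`);
  }
  const params: ScreenshotParams = { session_id: sessionId, name };
  if (border) {
    params.border = border; // only send a border when one is requested
  }
  return params;
}

// File name implied by the documented rule that ".png" is added automatically.
function pngFileName(params: ScreenshotParams): string {
  return `${params.name}.png`;
}

const bordered = screenshotParams("shell-session-a1b2c3", "my-screenshot", {
  style: "macos",
  title: "Terminal",
});
console.log(JSON.stringify(bordered)); // same fields as the JSON example above
console.log(pngFileName(bordered)); // my-screenshot.png
```

Omitting the `border` key is the safe way to ask for no border: the `shell_record_start` schema in this diff declares `border` with Zod's `.optional()`, which accepts a missing value but not `null`.
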

@@ -301,12 +308,13 @@ The response contains a `download_url` for curl to save the file locally:

### **shell_record_start**

-Start recording frames for GIF export. Frames are captured at the specified FPS (default 10, max 30, compression occurs by deduplicating identical frames):
+Start recording frames for GIF export. Frames are captured at the specified FPS (default 10, max 30, compression occurs by deduplicating identical frames). The optional `border` parameter (same as [`shell_screenshot`](#shell_screenshot)) applies window chrome to every frame:

```json
{
"session_id": "shell-session-a1b2c3",
-"fps": 10
+"fps": 10,
+"border": { "style": "macos", "title": "Terminal" }
}
```
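
The FPS limits and frame deduplication described for `shell_record_start` are not part of this diff. A minimal sketch of both, under stated assumptions: `resolveFps`, `dedupeFrames`, and `GifFrame` are hypothetical, the lower bound of 1 FPS is invented for the sketch, and deduplication is modeled as collapsing runs of consecutive identical frames into one frame with a longer display time, a common way to shrink GIFs that may differ from shellwright's actual approach.

```typescript
// Sketch only: shellwright's actual FPS handling and frame compression are not shown in this diff.
interface GifFrame {
  pixels: Uint8Array; // rendered frame bytes
  delayMs: number; // how long the frame stays on screen
}

// Documented rule: default 10 FPS, maximum 30. The minimum of 1 is an assumption.
function resolveFps(fps?: number): number {
  const requested = fps ?? 10;
  return Math.min(Math.max(requested, 1), 30);
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Collapses runs of consecutive identical frames into one frame with a longer delay,
// so an idle terminal costs one frame instead of one per capture tick.
function dedupeFrames(frames: Uint8Array[], fps: number): GifFrame[] {
  const frameMs = 1000 / fps;
  const result: GifFrame[] = [];
  for (const pixels of frames) {
    const previous = result[result.length - 1];
    if (previous !== undefined && sameBytes(previous.pixels, pixels)) {
      previous.delayMs += frameMs; // extend the previous frame instead of storing a duplicate
    } else {
      result.push({ pixels, delayMs: frameMs });
    }
  }
  return result;
}

const fps = resolveFps(); // 10
const captured = [new Uint8Array([0, 0]), new Uint8Array([0, 0]), new Uint8Array([255, 0])];
const compressed = dedupeFrames(captured, fps);
console.log(compressed.map((f) => f.delayMs)); // delays of 200 and 100 ms
```

At 10 FPS the two identical leading frames become a single 200 ms frame followed by a 100 ms frame. GIF frame delays are stored in hundredths of a second, so a real encoder would round these values.
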

Binary file added docs/examples/claude-code-bordered.png
71 changes: 67 additions & 4 deletions evaluations/README.md
@@ -1,6 +1,6 @@
# Evaluations

-Automated recording evaluations using Claude API with shellwright.
+Automated recording and screenshot evaluations using Claude API with shellwright.

## Usage

@@ -9,24 +9,87 @@ Automated recording evaluations using Claude API with shellwright.
```bash
# Requires ANTHROPIC_API_KEY
npm run eval
+
+# Run a single scenario
+npm run eval -- screenshot-border
```

### Generate comparison table

```bash
npm run eval:compare
open scenarios/index.html
```

## Adding a new scenario

1. Create a folder in `scenarios/`
2. Add a `prompt.md` with instructions for Claude
-3. Run evaluations to generate the recording
+3. Run evaluations to generate artifacts (GIFs, PNGs)

+## Baselines
+
+Baselines are reference artifacts committed to the repo for visual comparison. Each artifact `<name>.<ext>` can have two baselines:
+
+| File | Source |
+|------|--------|
+| `baseline-local-<name>.<ext>` | Developer machine |
+| `baseline-cicd-<name>.<ext>` | CI environment |
+
+### Updating baselines
+
+**Local baseline:** Run the eval locally and copy the output:
+
+```bash
+npm run eval -- screenshot-border
+cp scenarios/screenshot-border/screenshot.png scenarios/screenshot-border/baseline-local-screenshot.png
+```
+
+**CI/CD baseline:** Download the artifact from the PR preview and commit it:
+
+```bash
+curl -o scenarios/vim-session/baseline-cicd-recording.gif \
+https://dwmkerr.github.io/shellwright/pr-preview/pr-XX/vim-session/recording.gif
+```
+
+The comparison page auto-discovers baselines by scanning for `baseline-{local,cicd}-*` files matching each artifact.
+
+## MCP tools available in scenarios
+
+Scenario prompts instruct Claude to use these shellwright MCP tools:
+
+| Tool | Description |
+|------|-------------|
+| `shell_start` | Start a new PTY session with a command |
+| `shell_send` | Send input to a PTY session (use `\r` for Enter) |
+| `shell_read` | Read the current terminal buffer as plain text |
+| `shell_screenshot` | Capture terminal screenshot as PNG |
+| `shell_record_start` | Start recording a terminal session (captures frames for GIF) |
+| `shell_record_stop` | Stop recording and save GIF |
+| `shell_stop` | Stop a PTY session |
+
+### Key parameters
+
+**`shell_start`** — `command`, `args`, `cols`, `rows`, `theme` (e.g., `one-dark`)
+
+**`shell_send`** — `input` (with escape sequences: `\r`=Enter, `\x1b`=Escape, `\x03`=Ctrl+C)
+
+**`shell_screenshot`** — `name` (without extension), `border: { style: "macos", title: "..." }`
+
+**`shell_record_start`** — `fps` (default: 10, max: 30)
+
+**`shell_record_stop`** — `name` (without extension, `.gif` added automatically)
+
+### Artifact naming
+
+Tools append extensions automatically — pass names **without** extensions:
+- `name: "recording"` → `recording.gif`
+- `name: "screenshot"` → `screenshot.png`

## CI Integration

The `recording-eval.yaml` workflow runs on every PR:
1. Executes all scenarios
2. Generates comparison table
-3. Uploads recordings as artifacts
-4. Posts summary to PR
+3. Deploys to GitHub Pages as PR preview
+4. Uploads GIF and PNG artifacts
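
The evaluations README's baseline convention (`baseline-local-<name>.<ext>` and `baseline-cicd-<name>.<ext>` next to each artifact) can be resolved from a scenario folder's file listing. A minimal sketch, assuming artifacts are the `.gif` and `.png` files that are not themselves baselines; `listArtifacts` and `findBaselines` are hypothetical and may not match how the comparison page actually scans:

```typescript
interface Baselines {
  local?: string;
  cicd?: string;
}

// Artifacts: GIF/PNG outputs in a scenario folder that are not baselines themselves.
function listArtifacts(files: string[]): string[] {
  return files.filter((f) => /\.(gif|png)$/.test(f) && f.indexOf("baseline-") !== 0);
}

// Looks up baseline-local-<artifact> and baseline-cicd-<artifact> in the same listing.
function findBaselines(artifact: string, files: string[]): Baselines {
  const result: Baselines = {};
  const local = `baseline-local-${artifact}`;
  const cicd = `baseline-cicd-${artifact}`;
  if (files.indexOf(local) !== -1) result.local = local;
  if (files.indexOf(cicd) !== -1) result.cicd = cicd;
  return result;
}

// Example listing for a scenario folder (file names are illustrative).
const scenarioFiles = [
  "prompt.md",
  "screenshot.png",
  "baseline-local-screenshot.png",
  "recording.gif",
  "baseline-cicd-recording.gif",
];

for (const artifact of listArtifacts(scenarioFiles)) {
  console.log(artifact, JSON.stringify(findBaselines(artifact, scenarioFiles)));
}
// screenshot.png {"local":"baseline-local-screenshot.png"}
// recording.gif {"cicd":"baseline-cicd-recording.gif"}
```

An artifact with neither baseline simply gets an empty result, which a comparison page could render as "no baseline yet".
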
4 changes: 2 additions & 2 deletions evaluations/template.html
@@ -3,7 +3,7 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
-<title>Recording Evaluation</title>
+<title>Evidence: Screenshots and Recordings</title>
<style>
* { box-sizing: border-box; }
body {
@@ -25,7 +25,7 @@
</style>
</head>
<body>
-<h1>Recording Evaluation</h1>
+<h1>Evidence: Screenshots and Recordings</h1>
<div class="note">
<strong>Baselines:</strong> Local = developer machine, CI/CD = previous CI run.<br>
To update a baseline: download the PR artifact and commit as <code>baseline-local-&lt;name&gt;.ext</code> or <code>baseline-cicd-&lt;name&gt;.ext</code>
4 changes: 2 additions & 2 deletions src/index.ts
@@ -202,7 +202,7 @@ Tips:

server.tool(
"shell_screenshot",
-"Capture terminal screenshot as PNG. Returns a download_url - use curl to save the file locally (e.g., curl -o screenshot.png <url>)",
+"Capture terminal screenshot as PNG. Optionally add a macOS-style window border with border: { style: \"macos\", title: \"...\" } (off by default). Returns a download_url - use curl to save the file locally (e.g., curl -o screenshot.png <url>)",
shellScreenshotSchema,
async (params) => shellScreenshot(params, toolContext)
);
@@ -216,7 +216,7 @@ Tips:

server.tool(
"shell_record_start",
-"Start recording a terminal session (captures frames for GIF/video export)",
+"Start recording a terminal session (captures frames for GIF export). Optionally add a macOS-style window border to every frame with border: { style: \"macos\", title: \"...\" } (off by default).",
shellRecordStartSchema,
async (params) => shellRecordStart(params, toolContext)
);
18 changes: 12 additions & 6 deletions src/tools/shell-record-start.ts
@@ -8,13 +8,17 @@ import { ToolContext } from "./types.js";
export const shellRecordStartSchema = {
session_id: z.string().describe("Session ID"),
fps: z.number().optional().describe("Frames per second (default: 10, max: 30)"),
+border: z.object({
+style: z.enum(["macos"]).describe("Border style"),
+title: z.string().optional().describe("Title text in the title bar"),
+}).optional().describe("Optional window border decoration applied to every frame"),
};

export async function shellRecordStart(
-params: { session_id: string; fps?: number },
+params: { session_id: string; fps?: number; border?: { style: "macos"; title?: string } },
context: ToolContext
) {
-const { session_id, fps } = params;
+const { session_id, fps, border } = params;
const session = context.sessions.get(session_id);
if (!session) {
throw new Error(`Session not found: ${session_id}`);
@@ -34,14 +38,16 @@ export async function shellRecordStart(
framesDir,
frameCount: 0,
fps: recordingFps,
+border,
interval: setInterval(async () => {
if (!session.recording) return;

const frameNum = session.recording.frameCount++;
-const svg = bufferToSvg(session.terminal, session.cols, session.rows, {
-theme: session.theme,
-fontSize: context.config.FONT_SIZE,
-fontFamily: context.config.FONT_FAMILY
+const svg = bufferToSvg(session.terminal, session.cols, session.rows, {
+theme: session.theme,
+fontSize: context.config.FONT_SIZE,
+fontFamily: context.config.FONT_FAMILY,
+border,
});
const png = new Resvg(svg, context.resvgOptions).render().asPng();
const framePath = path.join(framesDir, `frame${String(frameNum).padStart(6, "0")}.png`);
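
In the updated `shellRecordStart`, `border` is read once from the tool parameters, stored on the recording state, and passed to `bufferToSvg` on every interval tick, so every frame of a recording gets the same window chrome. Each frame is written as `frame` plus a six-digit zero-padded index; presumably the padding keeps a lexicographic listing of the frames directory in capture order. The sketch below illustrates that property with `frameFileName`, a hypothetical stand-in that reproduces the same naming:

```typescript
// Hypothetical stand-in for the naming in the diff: "frame" + 6-digit zero-padded index + ".png".
function frameFileName(frameNum: number): string {
  let digits = String(frameNum);
  while (digits.length < 6) {
    digits = "0" + digits;
  }
  return `frame${digits}.png`;
}

// With padding, a plain lexicographic sort restores capture order (0, 1, 2, 10).
const padded = [10, 2, 0, 1].map(frameFileName).sort();
console.log(padded.join(" "));

// Without padding, "frame10.png" sorts before "frame2.png".
const unpadded = [10, 2, 0, 1].map((n) => `frame${n}.png`).sort();
console.log(unpadded.join(" "));
```

Tools that glob a directory (for example an encoder reading `frame*.png`) usually see files in lexicographic order, which is where fixed-width indices matter.
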
1 change: 1 addition & 0 deletions src/tools/types.ts
@@ -8,6 +8,7 @@ export interface RecordingState {
frameCount: number;
interval: ReturnType<typeof setInterval>;
fps: number;
+border?: { style: string; title?: string };
}

export interface Session {