Mac Mini Agent

macOS automation for AI agents. Direct your agents to steer, drive and listen. Watch our agent operate their own device.

Engineers are already using AI agents to write code — but those agents are trapped inside the terminal. Steer is the missing layer between "agent writes code" and "agent ships features." It gives AI agents full control of your Mac: clicking buttons, reading screens via OCR, typing into any app, and orchestrating terminals via tmux.

The Problem

AI agents can write code, plan, and reason. But they can't open an app. They can't click a button. They can't read what's on your screen. There's a massive gap between an agent that writes code and an agent that ships features.

That gap is computer use. And for macOS, the existing tools fall short:

Accessibility trees return nothing for Electron apps (VS Code, Slack, Notion)
Terminal agents can run commands but can't see output, recover from errors, or coordinate with GUI tools
No orchestration layer exists to combine terminal control with GUI automation

The Mac Mini Agent solves this with four purpose-built CLIs.

The Solution

Soon, there will be more agents operating computers than engineers operating computers — but you can be the one who deploys and controls them.

The Mac Mini Agent is a minimal toolkit you can adopt to deploy onto any of your Mac devices, to give AI agents the power to use your computer like you do.

When you increase your agent's autonomy, you increase your own. An agent that can only write code still needs you to test it, deploy it, and verify it. An agent that can see the screen, click buttons, orchestrate terminals, and operate across apps — that agent can ship while you sleep.

Two Skills, Four CLIs. Many Macs. Full agent autonomy.

Who Is This For?

Mid-Senior+ engineers who want to move beyond terminal agent interactions into true computer-use agents that can operate autonomously across an entire environment.

If your agent can write code but can't ship it — Steer and Drive close that gap.

Apps

Steer — GUI Control

macOS automation CLI for AI agents. Eyes and hands on your Mac.

Swift · v0.2.0 · 14 commands

Steer gives agents the ability to see, interact with, and control any macOS application through screenshots, OCR, accessibility trees, and input simulation.

Command	Purpose
`see`	Capture screenshots of apps, windows, or screens
`click`	Click at coordinates or on detected text elements
`type`	Type text into any focused application
`hotkey`	Send keyboard shortcuts (cmd+s, cmd+tab, etc.)
`scroll`	Scroll in any direction within an app
`drag`	Drag from one point to another
`apps`	List running applications
`screens`	List available displays
`window`	Move, resize, and manage windows
`ocr`	Extract text from screen via Vision framework
`focus`	Activate and focus a specific application
`find`	Locate UI elements on screen
`clipboard`	Read and write the system clipboard
`wait`	Wait for conditions (element visible, text appears)

OCR: The Equalizer

Electron apps (VS Code, Slack, Notion) return completely empty accessibility trees. Every AI agent trying to interact with these apps is flying blind — unless you use OCR.

One command — steer ocr --store — and suddenly every piece of text on screen becomes a clickable, addressable element. This was the breakthrough that unlocked everything else.

Drive — Terminal Control

Terminal automation CLI for AI agents. Programmatic tmux control.

Python · v0.1.0 · 6 commands

Drive gives agents full programmatic control over tmux sessions — creating terminals, sending commands, reading output, and orchestrating parallel workloads.

Command	Purpose
`session`	Create, list, and manage tmux sessions
`run`	Execute a command in a tmux pane and wait for completion
`send`	Send keystrokes to a tmux pane
`logs`	Capture pane output (capture-pane)
`poll`	Wait for a sentinel marker indicating command completion
`fanout`	Execute commands across multiple panes in parallel

The Sentinel Pattern

Drive uses a sentinel marker pattern (__DONE_<token>:<exit_code>) to reliably detect when commands complete in tmux sessions. This is what makes async terminal orchestration deterministic instead of guesswork.

Listen — Job Server

HTTP server + job manager for remote agent execution.

Python · v0.1.0 · FastAPI

Listen accepts job requests over HTTP, spawns Claude Code agents as worker processes, and tracks their progress via YAML job files. It's the dispatch layer that turns a prompt into a running agent.

Endpoint	Purpose
`POST /job`	Submit a prompt, get back a job ID
`GET /job/{id}`	Check job status, updates, and summary
`GET /jobs`	List all jobs
`DELETE /job/{id}`	Stop a running job

just listen                    # Start server on port 7600
just send "Open Safari..."     # Submit a job via Direct
just jobs                      # List all jobs
just job <id>                  # Check a specific job
just latest                    # Full details of the most recent job
just latest 3                  # Full details of the last 3 jobs

Direct — CLI Client

CLI client for the Listen server.

Python · v0.1.0 · httpx + Click

Direct is the command-line interface for sending prompts to Listen. It starts jobs, checks their status, and stops them.

Command	Purpose
`start`	Submit a new job prompt
`get`	Get current state of a job
`list`	List all jobs
`latest`	Show full details of the latest N jobs
`stop`	Stop a running job

Key Patterns

Cross-App Pipelines

Combine Steer and Drive to build pipelines that span multiple applications without human input:

Read GitHub issue (Browser via OCR) → Run Claude Code (Terminal via Drive) → Log result (Notion via Steer)

Agent-on-Agent Orchestration

One AI agent can control other AI agents through tmux — sending prompts, monitoring output, and managing sessions:

A primary Claude Code instance spins up tmux sessions, launches secondary agents, prompts them, monitors their work, and closes them when done. Agent inception — visible in real time.

Multi-Window Orchestration

Agents can operate across your entire desktop simultaneously — researching in the browser, coding in the terminal, documenting in Notion, monitoring logs:

Self-Healing Automation

Production-grade agents need resilience. Steer's wait, find, and ocr commands enable self-healing flows that recover from timeouts and errors:

Custom Agent Support

Steer and Drive are agent-agnostic tools. Any Skill + CLI-based AI agent can use them:

Claude Code — Anthropic's CLI agent
Pi Agent — Custom agent framework
Gemini CLI — Google's CLI agent
Codex CLI — OpenAI's CLI agent
OpenCode — Open-source alternative
Cursor CLI - Cursor's CLI agent

The agent just needs to be able to invoke shell commands. Steer and Drive handle the rest.

Quick Start

# GUI automation
steer see --app Safari                    # Screenshot Safari
steer ocr --app "VS Code" --store         # OCR all text in VS Code
steer click --text "Submit"               # Click the "Submit" button
steer type "Hello, world"                 # Type text

# Terminal automation
drive session create --name agent-1       # Create tmux session
drive run --session agent-1 "npm test"    # Run command and wait
drive logs --session agent-1              # Capture output
drive fanout --sessions a1,a2,a3 "build"  # Parallel execution

Project Structure

mac-mini-agent/
├── apps/
│   ├── steer/          # Swift CLI — macOS GUI automation
│   │   └── Sources/
│   │       └── steer/  # 14 command implementations
│   ├── drive/          # Python CLI — tmux terminal control
│   │   └── commands/   # 6 command implementations
│   ├── listen/         # Python — FastAPI job server
│   │   ├── jobs/       # YAML job state files
│   │   └── worker.py   # Agent spawner
│   └── direct/         # Python — CLI client for Listen
└── assets/
    ├── architecture-v3.svg
    ├── architecture-v3-animated.svg
    └── diagrams/       # Animated concept diagrams

Setup

The system runs across two machines: a Mac Mini that agents control autonomously, and your primary Mac where you work and send jobs. Each has a dedicated install command that handles dependencies, configuration, and verification.

Mac Mini Device (Agent Sandbox)

The Mac Mini is the agent's workstation — it runs Listen, executes jobs, and gives agents full GUI + terminal access.

Automated setup: Run the installer directly on the Mac Mini:

/install-agent-sandbox

This installs all dependencies (Swift, tmux, uv, just, yq, Claude Code), builds steer, sets up Python environments, and runs a 10-check verification suite covering screenshots, OCR, tmux sessions, the listen server, and more.

Manual prerequisites (must be done through System Settings — the installer will remind you):

Permission	Why	How
Accessibility	Steer needs to click, type, and read UI elements	System Settings > Privacy & Security > Accessibility > add Terminal
Screen Recording	Steer needs to capture screenshots (`steer see`)	System Settings > Privacy & Security > Screen Recording > add Terminal
Full Disk Access	Required for `systemsetup` and broad file access	System Settings > Privacy & Security > Full Disk Access > add Terminal
Remote Login (SSH)	Lets you manage the sandbox remotely	System Settings > General > Sharing > toggle Remote Login on

Once installed, start the agent server:

just listen    # Starts Listen on port 7600

Primary Device (Engineer Devbox)

Your laptop is where you send prompts and monitor jobs. It only needs the CLI client tools.

Automated setup: Run the installer on your primary Mac:

/install-engineer-devbox <mac-mini-ip-or-hostname>

This installs minimal dependencies (uv, just), configures the sandbox URL, and runs a 6-check verification suite covering network connectivity, the listen server, SSH access, and an end-to-end job test.

Once installed, send jobs:

just send "Open Safari, navigate to news.ycombinator.com, and read the top 3 headlines"
just jobs              # List all jobs
just job <id>          # Check a specific job
just stop <id>         # Kill a running job

SSH access (optional, for debugging):

ssh <user>@<mac-mini-ip>
# or via Bonjour
ssh <user>@<hostname>.local

Master Agentic Coding

Prepare for the future of software engineering

Software engineering has changed... Have you? Catch up and move beyond the AI tech industry with Tactical Agentic Coding

Follow the IndyDevDan YouTube channel to improve your agentic coding advantage.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.claude		.claude
apps		apps
assets		assets
specs		specs
.env.sample		.env.sample
.gitignore		.gitignore
README.md		README.md
justfile		justfile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mac Mini Agent

The Problem

The Solution

Who Is This For?

Apps

Steer — GUI Control

OCR: The Equalizer

Drive — Terminal Control

The Sentinel Pattern

Listen — Job Server

Direct — CLI Client

Key Patterns

Cross-App Pipelines

Agent-on-Agent Orchestration

Multi-Window Orchestration

Self-Healing Automation

Custom Agent Support

Quick Start

Project Structure

Setup

Mac Mini Device (Agent Sandbox)

Primary Device (Engineer Devbox)

Master Agentic Coding

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Mac Mini Agent

The Problem

The Solution

Who Is This For?

Apps

Steer — GUI Control

OCR: The Equalizer

Drive — Terminal Control

The Sentinel Pattern

Listen — Job Server

Direct — CLI Client

Key Patterns

Cross-App Pipelines

Agent-on-Agent Orchestration

Multi-Window Orchestration

Self-Healing Automation

Custom Agent Support

Quick Start

Project Structure

Setup

Mac Mini Device (Agent Sandbox)

Primary Device (Engineer Devbox)

Master Agentic Coding

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages