I have almost 12,000 screenshots. Phone, laptop, years of them.
The pattern is always the same: see something interesting, screenshot it, think "I'll come back to this." Never do.
Recipes I meant to cook. Error messages I needed to debug. Articles I wanted to read properly. Code snippets. Memes. Random thoughts someone posted that resonated.
All sitting in folders, unsearchable, effectively lost.
This is a perfect task to outsource to AI. Each image takes a human 10-30 seconds to process mentally. A vision model does it in under a second, for fractions of a cent.
vex (vision extraction) batch-processes screenshots through Claude. It looks at each image, figures out what it is, and extracts the useful information. A recipe becomes searchable text. Technical content gets categorised. Pure images get described.
The output is a JSON dump of everything - searchable, sortable, finally usable.
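The per-image call itself is simple. Here's a minimal sketch of what it can look like with the Anthropic Python SDK - the prompt and output fields mirror the results format shown later, but this is illustrative, not vex's actual internals:

```python
# Minimal single-image extraction sketch (illustrative, not vex's internals).
# Assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment.
import base64
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

def extract(image_path: Path) -> str:
    data = base64.standard_b64encode(image_path.read_bytes()).decode()
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # vex's default model
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",  # assumes PNG input
                            "data": data}},
                {"type": "text",
                 "text": "Identify what this screenshot shows and extract the "
                         "useful content as JSON with type, category, summary, "
                         "extracted_content, and tags fields."},
            ],
        }],
    )
    return response.content[0].text

print(extract(Path("IMG_4521.png")))
```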
Screenshots are write-only storage. Easy to capture, impossible to retrieve.
Search doesn't work on images. You can't grep a photo of a recipe. You can't find that error message from six months ago unless you remember exactly when you saw it.
This tool turns screenshots into structured data. Now they're searchable.
```bash
git clone https://github.com/teejayen/vex.git
cd vex
uv sync

# Add your API key
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
```

```bash
# Test with a few images first
vex process /path/to/screenshots --limit 10

# Process a directory (realtime, one at a time)
vex process /path/to/screenshots

# Use batch API for large sets (50% cheaper, results in <24hr)
vex process /path/to/screenshots --batch

# Check batch progress
vex status msgbatch_xxx

# Download batch results
vex results msgbatch_xxx

# Sort processed files into category folders
vex organise results.jsonl --target /path/to/sorted/
```

The extraction adapts to what it sees:
| Type | What You Get |
|---|---|
| Recipe | Title, full ingredients, method steps, servings, cook time |
| Code/Technical | Language, explanation, the actual code verbatim |
| Error Message | Exact error text, platform, stack trace, likely cause |
| Article/Text | Key points, quotes, author, source |
| Chat/Social | Who said what, platform, context, links shared |
| Document | Type, key content, dates, reference numbers |
| Meme/Image | Description, all text transcribed, context |
| UI/App | App name, screen shown, settings or data displayed |
| Shopping/Product | Item, price, store, specs, link |
| Map/Location | Place name, full address, directions |
| Receipt/Transaction | Merchant, amount, date, order number, items |
| Booking/Event | Event, date, time, venue, confirmation number |
| Contact Info | Name, phone, email, company, address |
| Music/Media | Song, artist, album, playlist, platform |
| Quote/Inspiration | The quote, attribution, source |
| Health/Fitness | Metrics, values, dates, what's tracked |
| Settings/Config | App, what settings, current values |
Everything gets a category and tags for filtering later.
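Those tags make ad-hoc filtering a few lines of Python over the results file (format shown below). A sketch, with a hypothetical `by_tag` helper:

```python
# Filter vex results by tag. by_tag is a hypothetical helper, not part
# of vex; the JSONL record format is shown below.
import json

def by_tag(results_path: str, tag: str):
    with open(results_path) as f:
        for line in f:
            record = json.loads(line)
            if tag in record.get("tags", []):
                yield record["path"], record["summary"]

for path, summary in by_tag("vex-results.jsonl", "recipe"):
    print(path, "-", summary)
```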
Results land in a JSONL file (one JSON object per line):
```json
{
  "path": "/screenshots/IMG_4521.png",
  "type": "article",
  "category": "tech",
  "summary": "Blog post about building Arc, a thinking partner that remembers context between sessions",
  "extracted_content": "Building Arc: A Thinking Partner That Remembers\n\nArc is a personal AI system built on Claude that maintains context across sessions. Key features: persistent state via markdown files, journal for pattern detection over time, decision capture with reasoning, weekly reviews surfacing insights.\n\nThe difference between a stateless chatbot and a genuine thinking partner is memory.",
  "tags": ["ai", "claude", "productivity", "arc", "thinking-partner"],
  "source": "tim.neilen.com.au",
  "processed_at": "2025-01-07T12:00:00",
  "tokens_used": 1847
}
```

Using Haiku 4.5 for 12,000 screenshots: roughly $10-15 USD.
Batch mode cuts that in half.
| Model | Realtime | Batch Mode |
|---|---|---|
| Haiku 4.5 | ~$12 | ~$6 |
| Sonnet 4.5 | ~$36 | ~$18 |
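Actual spend depends on image sizes and output length. Since every record stores tokens_used, you can sanity-check an estimate against real usage; a minimal sketch (the per-token rate is an assumed placeholder, substitute current published pricing):

```python
# Estimate spend from a vex results file. RATE_PER_MTOK is an assumed
# blended USD rate per million tokens; substitute current pricing.
import json

RATE_PER_MTOK = 1.00  # placeholder, not real pricing

total = 0
with open("vex-results.jsonl") as f:
    for line in f:
        total += json.loads(line).get("tokens_used", 0)

print(f"{total:,} tokens ~= ${total / 1_000_000 * RATE_PER_MTOK:.2f}")
```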
```
vex process <directory>    Extract content from images
vex status <batch_id>      Check batch processing status
vex results <batch_id>     Download completed batch results
vex organise <jsonl>       Sort files into category folders
```

Options for `vex process`:

```
-o, --output FILE    Output file (default: vex-results.jsonl)
-m, --model MODEL    Model to use (default: claude-haiku-4-5-20251001)
-r, --rate SECONDS   Delay between requests (default: 0.5)
-l, --limit N        Process only N images
--batch              Use batch API (50% off, <24hr processing)
--no-resume          Start fresh, ignore previous progress
--dry-run            List files without processing
```

Options for `vex organise`:

```
--target DIR    Target directory for sorted files (required)
--copy          Copy files instead of moving them
```
Processing saves progress as it goes. If it stops, run the same command again - it picks up where it left off.
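The shape of the idea, sketched in a few lines (not necessarily vex's exact mechanism): treat the output file as the ledger and skip anything already in it.

```python
# Sketch of JSONL-based resume: yield only images not yet recorded in the
# results file. Illustrative; PNG-only glob is an assumption.
import json
from pathlib import Path

def pending(images_dir: str, results_path: str):
    done = set()
    results = Path(results_path)
    if results.exists():
        with results.open() as f:
            done = {json.loads(line)["path"] for line in f}
    for image in sorted(Path(images_dir).glob("*.png")):
        if str(image) not in done:
            yield image
```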
For batch mode, use vex status and vex results to check progress and download when ready.
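If you'd rather poll from your own scripts, the underlying Anthropic Batches API exposes the same information through the SDK; a sketch, with a placeholder batch ID:

```python
# Poll a message batch and stream its results via the Anthropic SDK.
# "msgbatch_xxx" is a placeholder ID.
import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.retrieve("msgbatch_xxx")
print(batch.processing_status)  # e.g. "in_progress" or "ended"

if batch.processing_status == "ended":
    for entry in client.messages.batches.results("msgbatch_xxx"):
        if entry.result.type == "succeeded":
            print(entry.custom_id, entry.result.message.content[0].text[:80])
```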
vex is also available as an AgentSkill - a portable format that lets AI agents process screenshots directly using their own vision capabilities, no script required.
The SKILL.md file contains instructions for agentic extraction. Any compatible agent can:
- Find images in a directory
- Analyse each image using vision
- Output structured JSONL
This means you can use vex's extraction logic in Claude Code, GitHub Copilot, or any agent that supports the AgentSkills format.
The extracted data becomes useful beyond search. Feed it to a personal knowledge system, surface patterns in what you capture, auto-route content to the right places.
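For instance, surfacing which categories dominate your captures takes a few lines over the results file (a sketch):

```python
# Count captures per category to surface what you screenshot most.
import json
from collections import Counter

counts = Counter()
with open("vex-results.jsonl") as f:
    for line in f:
        counts[json.loads(line).get("category", "unknown")] += 1

for category, n in counts.most_common(10):
    print(f"{category:20} {n}")
```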
I built Arc - a thinking partner that holds context between sessions. Screenshot data is perfect input: attention signals, things that mattered enough to capture, patterns over time.
Built by Tim Neilen because I was tired of screenshots being a black hole.
Written with AI. I provided the problem and direction; Claude wrote the code. More on how I use AI.
MIT License.