Agent Instructions

TL;DR: Test skill docs against multiple AI models. Run bun run src/cli.ts test <skill-file.md> "<task>" -m qwen/qwen3-coder:free and see what models found confusing.

Prerequisites

Before using Focus Group, you need:

  • Bun runtime — Install from https://bun.sh (curl -fsSL https://bun.sh/install | bash)
  • OpenRouter API key — Free signup at https://openrouter.ai/keys
  • A skill file — Markdown doc describing your tool (see "What's a Skill File?" below)
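The three prerequisites above can be checked in one pass before running a test. The following is a minimal sketch; `check_prereqs` is a hypothetical helper written for illustration and is not part of the Focus Group CLI.

```shell
# check_prereqs is a hypothetical helper, not part of the Focus Group CLI.
# It prints one line per missing prerequisite and returns 1 if any is missing.
check_prereqs() {
  local skill_file="$1"
  local missing=0

  # Bun must be on PATH to run src/cli.ts.
  if ! command -v bun >/dev/null 2>&1; then
    echo "missing: bun (install from https://bun.sh)"
    missing=1
  fi

  # The OpenRouter key is read from the environment.
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "missing: OPENROUTER_API_KEY (get one at https://openrouter.ai/keys)"
    missing=1
  fi

  # The skill file must exist and be readable.
  if [ ! -r "$skill_file" ]; then
    echo "missing: skill file '$skill_file'"
    missing=1
  fi

  return "$missing"
}
```

Calling `check_prereqs ./skill.md` prints nothing and returns 0 when everything is in place.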

What This Is

Focus Group tests AI tool documentation by sending it to multiple AI models along with a task. The models respond as QA testers — reporting what confused them and suggesting improvements. This helps you find documentation gaps before shipping.

What's a Skill File?

A skill file is a markdown document containing instructions for AI agents. It describes a tool, API, or capability that an AI should be able to use. See examples/sample-skill.md for a complete example.

# Podcast Generator

Generate podcasts with multiple voices.

## Usage
podcast-gen --voices <count> --duration <minutes> --output <file>

## Parameters
- `--voices`: Number of distinct voices (1-5)
- `--duration`: Length in minutes
- `--output`: Output file path (.mp3)

## Example
podcast-gen --voices 3 --duration 30 --output episode.mp3

Installation

# Clone and install
git clone https://github.com/EmZod/Agent-Focus-Group
cd Agent-Focus-Group
bun install

Important: All commands must be run from inside the Agent-Focus-Group directory.

Quick Start (Free, Zero Cost)

# 1. Set your API key (get from https://openrouter.ai/keys)
export OPENROUTER_API_KEY="sk-or-v1-..."

# 2. Create a test skill file
cat > test-skill.md << 'EOF'
# Test Tool
Does something useful.

## Usage
test-tool --input <file>

## Example
test-tool --input data.txt
EOF

# 3. Run test with FREE model (zero cost)
bun run src/cli.ts test test-skill.md "Process the data in data.txt" -m qwen/qwen3-coder:free

# 4. Results appear in ~30-60 seconds
# View saved results anytime:
bun run src/cli.ts show latest

Tip: To avoid re-entering your API key each session, add it to your shell config:

echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc  # or ~/.bashrc

Cost Warning

COMMANDS WITHOUT -m FLAG COST MONEY (~$0.005 per test)

[BAD]  bun run src/cli.ts test skill.md "task"                        # COSTS MONEY
[GOOD] bun run src/cli.ts test skill.md "task" -m qwen/qwen3-coder:free  # FREE

When no -m flag is provided, Focus Group falls back to paid models (the cheap preset is the default). Always include -m qwen/qwen3-coder:free while learning.

Model IDs

All model IDs use the format provider/model-name. Free models add a :free suffix.

| Format | Example | Cost |
| --- | --- | --- |
| provider/model | openai/gpt-5-mini | Paid |
| provider/model:free | qwen/qwen3-coder:free | Free |

Without :free, the model uses the paid version. For example:

  • qwen/qwen3-coder = paid
  • qwen/qwen3-coder:free = free
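Because a missing :free suffix silently selects a paid model, it can help to check the value before passing it to -m. This is a sketch only; `all_models_free` is a hypothetical helper, and it assumes the -m value is a single ID or a comma-separated list of IDs.

```shell
# all_models_free is a hypothetical helper, not part of the Focus Group CLI.
# Returns 0 only if every ID in a comma-separated list looks like
# provider/model:free; returns 1 for paid, malformed, or empty input.
all_models_free() {
  local list="$1"
  local id
  local IFS=','                 # split the list on commas inside this function only

  [ -n "$list" ] || return 1    # an empty value would mean paid defaults

  for id in $list; do
    case "$id" in
      */*:free) ;;              # provider/model:free is accepted
      *) return 1 ;;            # anything else is treated as paid
    esac
  done
  return 0
}
```

For example, `all_models_free "qwen/qwen3-coder:free"` succeeds, while `all_models_free "qwen/qwen3-coder"` fails.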

Writing Good Tasks

Tasks should be specific and actionable — something a user would actually ask an AI to do.

| Good Tasks | Bad Tasks |
| --- | --- |
| "Generate a 30-minute podcast with 3 voices" | "Complete this task" |
| "Create a user with email jay@example.com" | "Test the tool" |
| "Convert the PDF at ./report.pdf to markdown" | "Use this" |

Quoting: Wrap the task in double quotes so the shell passes it as one argument. If the task itself contains double quotes, escape them with a backslash:

bun run src/cli.ts test skill.md "Create user named \"admin\""
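As an alternative to backslash escapes, standard shell single quotes keep inner double quotes literal. The sketch below shows that both forms produce the same string; the variable names are illustrative only.

```shell
# Two equivalent ways to write a task that contains double quotes.
escaped="Create user named \"admin\""   # backslash-escaped inside double quotes
single='Create user named "admin"'      # single quotes keep inner quotes literal

# Both variables hold the same string.
[ "$escaped" = "$single" ] && echo "same task string"

# Expand the variable inside double quotes so the task stays one argument:
# bun run src/cli.ts test skill.md "$single" -m qwen/qwen3-coder:free
```

Single quotes are usually easier to read, but they cannot contain a literal single quote, so tasks with apostrophes still need the double-quoted form.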

Commands

# Run tests (from Agent-Focus-Group directory)
bun run src/cli.ts test <skill.md> "<task>" -m qwen/qwen3-coder:free  # FREE
bun run src/cli.ts test <skill.md> "<task>"                    # Paid (~$0.005)
bun run src/cli.ts test <skill.md> "<task>" -m model1,model2   # Specific models
bun run src/cli.ts test <skill.md> "<task>" -p expensive       # Use preset
bun run src/cli.ts test <skill.md> "<task>" --no-save          # Don't save to DB
bun run src/cli.ts test <skill.md> "<task>" --timeout 120      # Longer timeout

# View results
bun run src/cli.ts show latest              # View last run
bun run src/cli.ts show <run-id>            # View specific run
bun run src/cli.ts history                  # List all runs
bun run src/cli.ts diff <run1> <run2>       # Compare runs
bun run src/cli.ts cost                     # Show API costs
bun run src/cli.ts config                   # Show configuration

File paths: Can be relative (./skill.md) or absolute (/home/user/skill.md).

Expected Timing

| Model Type | Response Time |
| --- | --- |
| Free models (:free) | 30-60 seconds |
| Paid models (cheap preset) | 10-20 seconds |
| Frontier models (opus, o3) | 20-40 seconds |

If nothing happens after 2 minutes, check Troubleshooting below.

Example Output

Focus Group Test
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Skill: test-skill.md
Task:  Process the data in data.txt
Models: 1

  qwen/qwen3-coder:free               [OK]  45.2s  $0.000

Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Completed: 1/1 models

Common Confusions:
  • "input file format unclear" (1/1 models)

Suggested Improvements:
  • Specify supported file formats (txt, csv, json)

How to read this:

  • [OK] or checkmark = model responded successfully (not that your docs are perfect!)
  • (1/1 models) = all tested models identified this issue
  • Higher counts = more serious documentation gaps
  • Results are saved to a local database automatically
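To pull just the issue counts out of a report, the "(N/M models)" suffix can be matched with awk. This is a rough sketch that assumes the report lines look exactly like the example output above; `list_issues` is a hypothetical helper, not a Focus Group command.

```shell
# list_issues is a hypothetical helper; it assumes the report format shown
# in the example output. It reads a report on stdin and prints
# "N/M<TAB>issue text" for every line that contains "(N/M models)".
list_issues() {
  awk '
    match($0, /\([0-9]+\/[0-9]+ models\)/) {
      count = substr($0, RSTART + 1, RLENGTH - 9)  # "1/1" from "(1/1 models)"
      text = substr($0, 1, RSTART - 1)             # everything before the count
      sub(/^[^"]*"/, "", text)                     # drop bullet and opening quote
      sub(/"[^"]*$/, "", text)                     # drop closing quote and padding
      print count "\t" text
    }
  '
}
```

If the `show latest` command prints the same layout (an assumption), the helper could be used as `bun run src/cli.ts show latest | list_issues`.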

Free Models

# Single free model
bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free

# Multiple free models (comma-separated, no spaces)
bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free,meta-llama/llama-3.2-3b-instruct:free

Free models require OPENROUTER_API_KEY but have zero cost. They're slower than paid models.
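Since the -m value must be comma-separated with no spaces, building it from separate arguments avoids typos. A minimal sketch follows; `join_models` is a hypothetical helper, not part of the CLI.

```shell
# join_models is a hypothetical helper, not part of the Focus Group CLI.
# It joins its arguments with commas and no spaces, as -m expects.
join_models() {
  local IFS=','          # "$*" joins arguments with the first character of IFS
  printf '%s\n' "$*"
}

models="$(join_models qwen/qwen3-coder:free meta-llama/llama-3.2-3b-instruct:free)"
echo "$models"
```

The result can then be passed as `-m "$models"`.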

Model Presets

| Preset | Models | Cost | Use When |
| --- | --- | --- | --- |
| cheap (default) | gpt-5-mini, claude-haiku-4.5, gemini-2.5-flash | ~$0.005 | Quick iteration |
| expensive | gpt-5, claude-sonnet-4.5, gemini-2.5-pro | ~$0.05 | Before shipping |
| frontier | claude-opus-4.5, gpt-5.2-pro, o3-pro | ~$0.20 | Critical docs |
| comprehensive | 8 models across all families | ~$0.10 | Full coverage |

Usage: bun run src/cli.ts test ./skill.md "task" -p expensive
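For budgeting, the approximate per-test figures in the preset table can be multiplied by the number of planned runs. The sketch below assumes the listed costs are per test and uses them as rough constants; `estimate_cost` is a hypothetical helper, and actual billing may differ.

```shell
# estimate_cost is a hypothetical helper, not part of the Focus Group CLI.
# Prices are the approximate figures from the preset table, assumed per test.
estimate_cost() {
  local preset="$1" runs="$2" per_test
  case "$preset" in
    cheap)         per_test=0.005 ;;
    expensive)     per_test=0.05 ;;
    frontier)      per_test=0.20 ;;
    comprehensive) per_test=0.10 ;;
    *) echo "unknown preset: $preset" >&2; return 1 ;;
  esac
  # The shell has no floating-point arithmetic, so awk does the multiplication.
  awk -v p="$per_test" -v n="$runs" 'BEGIN { printf "%.3f\n", p * n }'
}
```

For example, `estimate_cost cheap 20` prints 0.100, meaning roughly ten cents for twenty runs on the default preset. Use bun run src/cli.ts cost to see what was actually spent.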

Data Storage

Results are automatically saved to a local SQLite database (no setup required):

  • macOS: ~/Library/Application Support/focus-group/
  • Linux: ~/.local/share/focus-group/

Use bun run src/cli.ts config to see exact paths.
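When scripting backups or cleanup, the directory can be derived from the platform name. This sketch simply mirrors the two paths listed above and is an assumption about the layout; `focus_group_data_dir` is a hypothetical helper, and the config command remains the authoritative source.

```shell
# focus_group_data_dir is a hypothetical helper, not part of the CLI.
# It mirrors the documented default paths; pass an OS name to override
# the detected platform (useful for testing).
focus_group_data_dir() {
  local os="${1:-$(uname -s)}"
  case "$os" in
    Darwin) echo "$HOME/Library/Application Support/focus-group" ;;
    Linux)  echo "$HOME/.local/share/focus-group" ;;
    *) echo "unknown platform: $os" >&2; return 1 ;;
  esac
}
```

Quote the result when using it, for example `ls "$(focus_group_data_dir)"`, because the macOS path contains a space.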

Troubleshooting

"command not found: bun"

curl -fsSL https://bun.sh/install | bash
source ~/.zshrc  # or restart terminal

"Cannot find module" or "Cannot find package"

You're likely running from the wrong directory. All commands must be run from inside the Agent-Focus-Group folder:

cd Agent-Focus-Group && bun install

"OPENROUTER_API_KEY not set"

export OPENROUTER_API_KEY="sk-or-v1-..."  # Get from https://openrouter.ai/keys

"Model not found"

"Request timed out"

  • Free models are slower. Add --timeout 120 for a 2-minute timeout.
  • Check your internet connection.

"File not found"

  • Ensure the skill file exists at the specified path
  • Try absolute path: /full/path/to/skill.md

No output after 2+ minutes

  • Free models can be slow during high traffic
  • Try a different free model or use --timeout 180
  • Check if OpenRouter is having issues at https://status.openrouter.ai
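The timeout advice above can be automated by retrying with progressively longer limits. This is a sketch under one unverified assumption: that the CLI exits with a non-zero status when a run times out. `retry_with_timeouts` is a hypothetical helper, not part of the CLI.

```shell
# retry_with_timeouts is a hypothetical helper, not part of the CLI.
# It runs the given command with --timeout appended, trying 120 and then
# 180 seconds, and stops at the first run that exits with status 0.
# Assumption: the wrapped command exits non-zero when it times out.
retry_with_timeouts() {
  local t
  for t in 120 180; do
    echo "trying --timeout $t" >&2
    if "$@" --timeout "$t"; then
      return 0
    fi
  done
  return 1
}
```

A call might look like `retry_with_timeouts bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free`. If both attempts fail, switching to a different free model is probably more productive than waiting longer.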

When to Use

  • Writing new skill documentation
  • Updating existing skill docs
  • Before shipping docs to production
  • Debugging why models misunderstand instructions

Development

bun install              # Install deps
bun test                 # Run tests (45 tests)
bun run typecheck        # Type check
bun run src/cli.ts       # Run CLI directly

Key source files:

  • src/core/runner.ts — Test execution
  • src/core/prompts.ts — System prompts sent to models
  • src/config/defaults.ts — Model presets