A continuous evaluation and iteration platform for AI prompt systems. Built with Next.js, Supabase, Prisma, and Inngest.
🚀 Live Demo: https://stranger-prompts.vercel.app
- Level 0 (Core): Define prompt systems, upload datasets, run evaluations, view results
- Level 1 (Monitor): Schedule automated runs, detect drastic output changes, Slack DM alerts
- Level 2 (Experimentation): Cross-model comparison, side-by-side results
- Level 3 (Optimize): Iterative prompt optimization
- De-duplication Logic: To maintain a clean history, the system only creates a new version if the template or model configuration actually changes. If you save a prompt that matches an existing version, it simply links back to that one.
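  For illustration, a minimal sketch of that check using Prisma is shown below; the model and field names (`promptVersion`, `systemId`, `template`, `modelConfig`) are assumptions, not the actual schema in `prisma/schema.prisma`.

  ```ts
  // Hypothetical sketch of the version de-duplication check. The Prisma model
  // and field names here are assumptions; see prisma/schema.prisma for the real ones.
  import { PrismaClient, Prisma } from "@prisma/client";

  const prisma = new PrismaClient();

  export async function saveVersion(
    systemId: string,
    template: string,
    modelConfig: Prisma.InputJsonValue
  ) {
    // If an identical version already exists, link back to it instead of creating a new row.
    const existing = await prisma.promptVersion.findFirst({
      where: { systemId, template, modelConfig: { equals: modelConfig } },
    });
    if (existing) return existing;

    return prisma.promptVersion.create({
      data: { systemId, template, modelConfig },
    });
  }
  ```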
- Genetic Optimization (Level 3): Iterative loop that improves prompts by analyzing failing examples.
- Optimizer Model: Uses GPT-4o by default as the expert prompt engineer.
- Process: Analyzes worst K failing rows per iteration, proposes incremental template improvements, and prunes candidates based on aggregate scores.
- Safety: Automatically detects and rejects "reward-hacking" templates that overfit to specific test strings.
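The optimization loop itself can be sketched roughly as below. The callback names, the worst-K default, and the pruning rule are illustrative assumptions, not the actual code in `src/lib`.

```ts
// Rough sketch of the Level 3 loop: score, analyze worst failures, propose an
// improvement, reject reward hacking, keep only candidates that raise the score.
// All helpers are supplied by the caller, so the names here are purely illustrative.
type FailingRow = { inputs: Record<string, string>; expected: string; output: string };
type EvalResult = { score: number; failingRows: FailingRow[] };

export async function optimizePrompt(opts: {
  template: string;
  maxIterations: number; // 1-10 in the UI
  targetScore: number;   // 0-1 in the UI
  worstK?: number;
  runEval: (template: string) => Promise<EvalResult>;
  proposeImprovement: (template: string, worst: FailingRow[]) => Promise<string>;
  isRewardHacking: (template: string) => boolean;
}) {
  const worstK = opts.worstK ?? 5;
  let best = { template: opts.template, ...(await opts.runEval(opts.template)) };

  for (let i = 0; i < opts.maxIterations && best.score < opts.targetScore; i++) {
    // Analyze the worst K failing rows and ask the optimizer model for an incremental change.
    const candidate = await opts.proposeImprovement(best.template, best.failingRows.slice(0, worstK));

    // Safety check: drop templates that overfit to specific test strings.
    if (opts.isRewardHacking(candidate)) continue;

    // Prune: keep the candidate only if its aggregate score improves.
    const result = await opts.runEval(candidate);
    if (result.score > best.score) best = { template: candidate, ...result };
  }
  return best;
}
```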
- Frontend: Next.js 14 (App Router), React 18, TypeScript, Tailwind CSS, shadcn/ui
- Auth: Supabase Auth (Google SSO)
- Database: Supabase Postgres + Prisma ORM
- Background Jobs: Inngest
- LLM Providers: OpenAI, Anthropic, Google Gemini (BYOK)

Prerequisites:

- Node.js 18+
- npm or yarn
- Supabase account (free tier works)
- Inngest account (free tier works)
```bash
cd /path/to/PromptOps
npm install
```

- Create a new Supabase project at https://supabase.com
- Go to Authentication → Providers → Enable Google
- Configure Google OAuth credentials in your Google Cloud Console
- Get your Supabase credentials from Settings → API

```bash
cp .env.example .env
```

Edit .env with your values:
```env
# Supabase
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key

# Database (from Supabase Settings → Database → Connection string)
DATABASE_URL=postgresql://postgres:password@db.your-project.supabase.co:5432/postgres

# Encryption key for API keys (generate with: openssl rand -hex 32)
ENCRYPTION_KEY=your-64-char-hex-key

# Optional: Slack Bot Token for DM alerts
SLACK_BOT_TOKEN=xoxb-your-slack-bot-token
```

```bash
# Generate Prisma client
npm run db:generate

# Push schema to database
npm run db:push

# Seed with sample data
npm run seed
```

In one terminal:

```bash
npm run dev
```

In another terminal (for background jobs):

```bash
npx inngest-cli@latest dev
```

- App: http://localhost:3000
- Inngest Dashboard: http://localhost:8288
This section covers deploying Stranger Prompts to production with Vercel, Supabase, and Inngest.
- Go to Google Cloud Console
- Create a new project (or select existing)
- Navigate to APIs & Services → OAuth consent screen
- Choose "External" user type
- Fill in app name, support email, and developer contact
- Add scopes: `email`, `profile`, `openid`
- Go to APIs & Services → Credentials → Create Credentials → OAuth 2.0 Client ID
- Application type: Web application
- Name: `Stranger Prompts`
- Authorized redirect URIs: `https://YOUR-PROJECT.supabase.co/auth/v1/callback`
- Copy the Client ID and Client Secret for Supabase setup
- Create a new project at supabase.com
- Configure Google Auth:
- Go to Authentication → Providers → Google
- Enable Google provider
- Paste your Google Client ID and Client Secret
- Save
- Get API Keys from Settings → API:

  ```env
  NEXT_PUBLIC_SUPABASE_URL=https://YOUR-PROJECT.supabase.co
  NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
  SUPABASE_SERVICE_ROLE_KEY=eyJ...
  ```

- Get Database Connection Strings from Settings → Database → Connection string:
  - Transaction Pooler (for serverless/Vercel):

    ```env
    DATABASE_URL=postgresql://postgres.YOUR-PROJECT:[PASSWORD]@aws-0-REGION.pooler.supabase.com:6543/postgres?pgbouncer=true
    ```

  - Direct Connection (for migrations):

    ```env
    DIRECT_URL=postgresql://postgres.YOUR-PROJECT:[PASSWORD]@aws-0-REGION.pooler.supabase.com:5432/postgres
    ```
- Create an account at inngest.com
- Create a new app in the Inngest Dashboard
- Go to Manage → Signing Key to get your keys:

  ```env
  INNGEST_EVENT_KEY=your-event-key
  INNGEST_SIGNING_KEY=signkey-prod-...
  ```

- Note: You'll configure the app URL after Vercel deployment
- **Deploy to Vercel:**

  ```bash
  npm i -g vercel
  vercel
  ```

  Or connect your GitHub repo to Vercel for automatic deployments.

- **Add Environment Variables** in Vercel Dashboard → Settings → Environment Variables:
  - All variables from `.env.example`
  - Make sure `DATABASE_URL` uses the pooler connection with `?pgbouncer=true`

- **Run Database Migrations:**

  ```bash
  # Locally, with DIRECT_URL set
  npx prisma migrate deploy
  ```

- **Configure Inngest App URL:**
  - In Inngest Dashboard → Your App → App URL
  - Set to: `https://your-app.vercel.app/api/inngest`
If you have Vercel deployment protection enabled (preview deployments, password protection, etc.), Inngest won't be able to reach your /api/inngest endpoint. You must configure a bypass:
- **In Vercel Dashboard:**
  - Go to Settings → Deployment Protection
  - Scroll to Protection Bypass for Automation
  - Click Generate Secret
  - Copy the generated secret

- **In Inngest Dashboard:**
  - Go to your App → Settings
  - Find Vercel Protection Bypass
  - Paste the bypass secret

This allows Inngest to invoke your background functions even when deployment protection is enabled.
Slack integration enables DM alerts for output change detection and quality regressions.
- **Create a Slack App:**
  - Go to api.slack.com/apps
  - Click Create New App → From scratch
  - Name: `Stranger Prompts Alerts`
  - Select your workspace

- **Configure Bot Token Scopes:**
  - Go to OAuth & Permissions → Scopes → Bot Token Scopes
  - Add these scopes (they map to the DM flow sketched after this list):
    - `users:read.email` (lookup users by email)
    - `chat:write` (send messages)
    - `im:write` (open DM channels)

- **Install to Workspace:**
  - Go to OAuth & Permissions → Install to Workspace
  - Authorize the app
  - Copy the Bot User OAuth Token (`xoxb-...`) and set:

    ```env
    SLACK_BOT_TOKEN=xoxb-your-token
    ```

- **Create Workspace Invite Link (Optional):**
  - In Slack, go to your workspace settings
  - Create a shared invite link and set:

    ```env
    NEXT_PUBLIC_SLACK_INVITE_URL=https://join.slack.com/t/your-workspace/shared_invite/...
    ```

- **User Setup:**
  - Users must join your Slack workspace
  - In the app Settings page, users enter their Slack email address
  - They can test the integration with "Send Test DM"
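The three bot scopes correspond to a simple lookup → open DM → post message flow. A minimal sketch using `@slack/web-api` is below; the function name and error handling are illustrative, and the real logic lives in `src/lib/slack.ts`.

```ts
// Sketch of the DM flow enabled by the three scopes above, using @slack/web-api.
import { WebClient } from "@slack/web-api";

const slack = new WebClient(process.env.SLACK_BOT_TOKEN);

export async function sendAlertDm(email: string, text: string) {
  // users:read.email — find the Slack user by the email entered in Settings.
  const lookup = await slack.users.lookupByEmail({ email });
  const userId = lookup.user?.id;
  if (!userId) throw new Error(`No Slack user found for ${email}`);

  // im:write — open (or reuse) a DM channel with that user.
  const opened = await slack.conversations.open({ users: userId });
  const channelId = opened.channel?.id;
  if (!channelId) throw new Error("Could not open DM channel");

  // chat:write — send the alert message.
  await slack.chat.postMessage({ channel: channelId, text });
}
```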
| Variable | Required | Description |
|---|---|---|
| `NEXT_PUBLIC_SUPABASE_URL` | ✅ | Supabase project URL |
| `NEXT_PUBLIC_SUPABASE_ANON_KEY` | ✅ | Supabase anonymous/public key |
| `SUPABASE_SERVICE_ROLE_KEY` | ✅ | Supabase service role key (server-side only) |
| `DATABASE_URL` | ✅ | Postgres connection string (use pooler with `?pgbouncer=true` for Vercel) |
| `DIRECT_URL` | ✅ | Postgres direct connection (for migrations) |
| `ENCRYPTION_KEY` | ✅ | 64-character hex key for BYOK encryption (`openssl rand -hex 32`) |
| `INNGEST_EVENT_KEY` | ✅ | Inngest event key for sending events |
| `INNGEST_SIGNING_KEY` | ✅ | Inngest signing key for webhook verification |
| `SLACK_BOT_TOKEN` | ❌ | Slack bot token for DM alerts (`xoxb-...`) |
| `NEXT_PUBLIC_SLACK_INVITE_URL` | ❌ | Slack workspace invite link (shown in Settings UI) |
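As a point of reference, here is one way a 64-character hex `ENCRYPTION_KEY` can be used to encrypt stored provider keys with AES-256-GCM. This is a sketch assuming that scheme; the actual format is whatever `src/lib/crypto.ts` implements.

```ts
// Sketch of BYOK key encryption with a 32-byte (64 hex char) ENCRYPTION_KEY.
// The iv:tag:ciphertext storage format here is an assumption.
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

const key = Buffer.from(process.env.ENCRYPTION_KEY!, "hex"); // 32 bytes

export function encryptApiKey(plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store iv, auth tag, and ciphertext together so decryption is self-describing.
  return [iv, cipher.getAuthTag(), ciphertext].map((b) => b.toString("hex")).join(":");
}

export function decryptApiKey(stored: string): string {
  const [iv, tag, ciphertext] = stored.split(":").map((h) => Buffer.from(h, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```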
- Sign in with Google at http://localhost:3000
- Go to Settings → Add your OpenAI API key → Click "Save Key"
- Click "Test" on the saved key row to verify it works
- (Optional) Add your Slack Member ID for output change alerts
If you ran `npm run seed`, claim the demo data for your account:

```
# In browser console or via curl:
fetch('/api/dev/claim-demo-data', {method: 'POST'}).then(r => r.json()).then(console.log)
```

Refresh the dashboard to see "Movie Sentiment Classifier".
| Step | UI Action |
|---|---|
| 1 | Dashboard → Click "Movie Sentiment Classifier" (or create new system) |
| 2 | Datasets tab → Upload a CSV or JSONL file (download samples from the links) |
| 3 | Evaluations tab → Create an eval config (e.g., CONTAINS type) |
| 4 | Run Test tab → Select dataset + eval → Click "Run Now" |
| 5 | Watch "Recent Runs" panel update → Click a run to see row-by-row results |
| Step | UI Action |
|---|---|
| 1 | System page → Run Test tab → Select dataset & eval |
| 2 | Set interval (e.g., 60 seconds for testing, or use presets: Hourly/Daily/Weekly) |
| 3 | Click "Save Schedule Configuration" to save your settings |
| 4 | Toggle the schedule switch ON to enable automatic runs |
| 5 | Watch Inngest dashboard (http://localhost:8288) → See system-scheduler function start |
| 6 | After 2+ scheduled runs, output-change-alert compares outputs |
| 7 | Check Notifications page for alerts (Dashboard → Notifications) |
| 8 | Toggle the schedule switch OFF when done |
Note: The scheduler is event-driven—it only runs when scheduling is enabled for a system, not polling every minute.
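Conceptually, a per-system scheduler of this kind can be written as an Inngest function that sleeps for the interval, triggers a run, and re-emits its own start event until a stop event cancels it. The sketch below assumes that shape and hypothetical payload fields (`systemId`, `intervalSeconds`); the real function is `src/lib/inngest/functions/system-scheduler.ts` and may be structured differently.

```ts
// Hedged sketch of an event-driven, per-system scheduler loop with Inngest.
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "stranger-prompts" });

export const systemScheduler = inngest.createFunction(
  {
    id: "system-scheduler",
    // Cancel the loop for this system when scheduling is toggled off.
    cancelOn: [{ event: "system/schedule.stopped", match: "data.systemId" }],
  },
  { event: "system/schedule.started" },
  async ({ event, step }) => {
    // Wait for the configured interval...
    await step.sleep("wait-interval", `${event.data.intervalSeconds}s`);

    // ...trigger a run, then re-emit the start event so the loop continues.
    await step.sendEvent("trigger-run", {
      name: "run/execute.requested",
      data: { systemId: event.data.systemId },
    });
    await step.sendEvent("reschedule", { name: "system/schedule.started", data: event.data });
  }
);
```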
| Step | UI Action |
|---|---|
| 1 | System page → Compare Models tab |
| 2 | Select dataset and evaluation config |
| 3 | Click "+ Add Model" → Select provider/model (e.g., gpt-4o) |
| 4 | Add more models to compare (e.g., gpt-4o-mini, claude-3-haiku) |
| 5 | Click "Run Comparison" |
| 6 | Results appear below → Click each run to see detailed scores |
| Step | UI Action |
|---|---|
| 1 | System page → Optimize tab |
| 2 | Select dataset and evaluation config |
| 3 | Set max iterations (1-10) and target score (0-1) |
| 4 | Click "Start Optimization" |
| 5 | Check Inngest dashboard for optimize function progress |
| 6 | New prompt versions created after each iteration |
Download from UI or find in /public/sample/:
- reviews.csv: Movie reviews → sentiment (positive/negative/neutral)
- toxicity.jsonl: Text → toxicity classification (toxic/not_toxic)
Custom dataset requirements:
- CSV: columns matching `{{variables}}` in prompt + `expected` column
- JSONL: each line `{"inputs": {...}, "expected": "..."}`
- Max 10,000 rows

CSV example:

```csv
review,expected
"Great movie!",positive
"Terrible film.",negative
```

JSONL example:

```jsonl
{"review": "Great movie!", "expected": "positive"}
{"review": "Terrible film.", "expected": "negative"}
```

Required: a column/field named `expected`.
| Type | Description |
|---|---|
| `EXACT_MATCH` | Output must exactly match expected (case-insensitive) |
| `CONTAINS` | Output must contain expected string |
| `REGEX` | Output must match regex pattern |
| `JSON_SCHEMA` | Output JSON must validate against schema |
| `LLM_JUDGE` | LLM evaluates output quality (strict JSON response) |
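For the string-based types, the scoring logic is roughly what you would expect. The sketch below is illustrative rather than the actual code in `src/lib/eval`; details such as case sensitivity for `CONTAINS` are assumptions.

```ts
// Illustrative per-row scoring for the string-based eval types above.
type EvalType = "EXACT_MATCH" | "CONTAINS" | "REGEX" | "JSON_SCHEMA" | "LLM_JUDGE";

export function scoreRow(type: EvalType, output: string, expected: string): number {
  switch (type) {
    case "EXACT_MATCH":
      return output.trim().toLowerCase() === expected.trim().toLowerCase() ? 1 : 0;
    case "CONTAINS":
      return output.includes(expected) ? 1 : 0;
    case "REGEX":
      return new RegExp(expected).test(output) ? 1 : 0;
    default:
      // JSON_SCHEMA needs a schema validator and LLM_JUDGE needs an LLM call; omitted here.
      throw new Error(`Not covered in this sketch: ${type}`);
  }
}
```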
| Method | Endpoint | Description |
|---|---|---|
| GET/POST | `/api/systems` | List/create prompt systems |
| GET/PATCH | `/api/systems/:id` | Get/update system |
| POST | `/api/systems/:id/versions` | Create new version |
| POST | `/api/systems/:id/schedule` | Configure scheduling |
| GET/POST | `/api/datasets` | List/upload datasets |
| GET/POST | `/api/runs` | List/create test runs |
| GET | `/api/runs/:id/results` | Get paginated results |
| GET/POST | `/api/keys` | List/save API keys |
| POST | `/api/keys/test` | Test API key validity |
| POST | `/api/compare` | Start cross-model comparison |
| POST | `/api/optimize` | Start optimization loop |
| GET/PATCH | `/api/notifications` | List/mark notifications read |
| GET/PATCH | `/api/user` | Get/update user settings |
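As an example of driving these endpoints from code (requests must carry a valid Supabase session, e.g. when run from the browser console while signed in), the body fields and query parameter below are illustrative guesses, not a documented contract.

```ts
// Illustrative use of the runs endpoints; systemId/datasetId/evalConfigId and
// the `page` query parameter are assumptions about the request/response shape.
const base = "http://localhost:3000";

async function startRun() {
  const res = await fetch(`${base}/api/runs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ systemId: "sys_123", datasetId: "ds_456", evalConfigId: "eval_789" }),
  });
  const run = await res.json();

  // Fetch paginated row-level results for the new run.
  const results = await fetch(`${base}/api/runs/${run.id}/results?page=1`).then((r) => r.json());
  console.log(run, results);
}

startRun();
```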
```text
src/
├── app/                       # Next.js App Router pages
│   ├── api/                   # API routes
│   ├── dashboard/             # Dashboard page
│   ├── login/                 # Login page
│   └── settings/              # Settings page
├── components/ui/             # shadcn/ui components
└── lib/
    ├── inngest/               # Inngest functions
    │   ├── client.ts          # Inngest client + event definitions
    │   ├── index.ts           # Function exports
    │   └── functions/         # Background job handlers
    │       ├── dataset-ingest.ts      # Dataset ingestion
    │       ├── run-execute.ts         # Test run execution
    │       ├── system-scheduler.ts    # Event-driven scheduler (per-system)
    │       ├── output-change-alert.ts # Output change detection
    │       └── optimize.ts            # Prompt optimization loop
    ├── llm/                   # LLM provider adapters
    ├── eval/                  # Evaluation logic
    ├── supabase/              # Supabase client utilities
    ├── crypto.ts              # Encryption for BYOK
    ├── prisma.ts              # Prisma client
    ├── slack.ts               # Slack DM integration
    └── utils.ts               # Utility functions
```
| Event | Description |
|---|---|
| `dataset/ingest.requested` | Triggered when a dataset is uploaded |
| `run/execute.requested` | Triggered to start a test run |
| `run/completed` | Emitted when a run finishes (triggers output change detection) |
| `system/schedule.started` | Starts the event-driven scheduler for a system |
| `system/schedule.stopped` | Cancels the scheduler for a system |
| `optimize/start.requested` | Starts the optimization loop |
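Server code emits these events through the Inngest client defined in `src/lib/inngest/client.ts`. The sketch below re-creates a client inline for self-containment, and the payload fields are illustrative.

```ts
// Sending one of the events above from server-side code. Payload fields are assumptions.
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "stranger-prompts" });

export async function requestRun(runId: string, systemId: string) {
  await inngest.send({
    name: "run/execute.requested",
    data: { runId, systemId },
  });
}
```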
- Row-level fault tolerance: Individual row failures don't crash the run
- Run failure threshold: Run marked FAILED only if >50% rows fail
- Retry with backoff: Transient errors retry with exponential backoff + jitter
- Idempotent results: Upsert on `(runId, rowIndex)` prevents duplicates
- Pileup prevention: Scheduler skips if a run is already QUEUED/RUNNING
- Concurrency limits: Per-user run limits, per-dataset ingestion limits
- Event-driven scheduling: Schedulers only run when enabled, not polling every minute
- Scheduler lifecycle management: Interval changes automatically restart the scheduler with new settings
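The retry-with-backoff item in the list above follows a standard pattern; a generic sketch (with illustrative defaults, not the values used by the run executor) looks like this:

```ts
// Generic retry helper: exponential backoff (base * 2^attempt) with full jitter.
// Attempt count and base delay are illustrative defaults.
export async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 4, baseMs = 500): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      const delay = Math.random() * baseMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```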
This project is licensed under the Business Source License 1.1 (BSL 1.1).
- Permitted: Non-production use, internal use, modifications, contributions
- Not Permitted: Offering as a competing Prompt Evaluation Service
- Change Date: January 1, 2029 (converts to Apache 2.0)
See LICENSE for full terms. For commercial licensing inquiries, please contact the maintainer.
