Stranger Prompts

A continuous evaluation and iteration platform for AI prompt systems. Built with Next.js, Supabase, Prisma, and Inngest.

🚀 Live Demo: https://stranger-prompts.vercel.app

Features

  • Level 0 (Core): Define prompt systems, upload datasets, run evaluations, view results
  • Level 1 (Monitor): Schedule automated runs, detect drastic output changes, Slack DM alerts
  • Level 2 (Experimentation): Cross-model comparison, side-by-side results
  • Level 3 (Optimize): Iterative prompt optimization

Key Logic & Features

  • De-duplication Logic: To maintain a clean history, the system only creates a new version if the template or model configuration actually changes. If you save a prompt that matches an existing version, it simply links back to that one.
  • Genetic Optimization (Level 3): Iterative loop that improves prompts by analyzing failing examples.
    • Optimizer Model: Uses GPT-4o by default as the expert prompt engineer.
    • Process: Analyzes worst K failing rows per iteration, proposes incremental template improvements, and prunes candidates based on aggregate scores.
    • Safety: Automatically detects and rejects "reward-hacking" templates that overfit to specific test strings.
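The de-duplication check described above can be sketched as a content fingerprint over the fields that define a version's identity. This is an illustrative TypeScript sketch, not the repo's actual implementation; the `VersionIdentity` fields and function names are assumptions:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the fields that define a prompt version's identity.
interface VersionIdentity {
  template: string;
  model: string;
  temperature: number;
}

// Stable fingerprint of the template + model configuration. Saving a prompt
// whose fingerprint matches the latest version can link back to that version
// instead of creating a new row.
function versionFingerprint(v: VersionIdentity): string {
  // Stringify with an explicit key order so the hash is deterministic.
  const canonical = JSON.stringify({
    template: v.template,
    model: v.model,
    temperature: v.temperature,
  });
  return createHash("sha256").update(canonical).digest("hex");
}

function isDuplicate(a: VersionIdentity, b: VersionIdentity): boolean {
  return versionFingerprint(a) === versionFingerprint(b);
}
```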

Tech Stack

  • Frontend: Next.js 14 (App Router), React 18, TypeScript, Tailwind CSS, shadcn/ui
  • Auth: Supabase Auth (Google SSO)
  • Database: Supabase Postgres + Prisma ORM
  • Background Jobs: Inngest
  • LLM Providers: OpenAI, Anthropic, Google Gemini (BYOK)

Prerequisites

  • Node.js 18+
  • npm or yarn
  • Supabase account (free tier works)
  • Inngest account (free tier works)

Quick Start (Local Setup)

1. Clone and Install

cd /path/to/PromptOps
npm install

2. Set Up Supabase

  1. Create a new Supabase project at https://supabase.com
  2. Go to Authentication → Providers → Enable Google
  3. Configure Google OAuth credentials in your Google Cloud Console
  4. Get your Supabase credentials from Settings → API

3. Configure Environment

cp .env.example .env

Edit .env with your values:

# Supabase
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key

# Database (from Supabase Settings → Database → Connection string)
DATABASE_URL=postgresql://postgres:password@db.your-project.supabase.co:5432/postgres

# Encryption key for API keys (generate with: openssl rand -hex 32)
ENCRYPTION_KEY=your-64-char-hex-key

# Optional: Slack Bot Token for DM alerts
SLACK_BOT_TOKEN=xoxb-your-slack-bot-token

4. Set Up Database

# Generate Prisma client
npm run db:generate

# Push schema to database
npm run db:push

# Seed with sample data
npm run seed

5. Start Development Server

In one terminal:

npm run dev

In another terminal (for background jobs):

npx inngest-cli@latest dev

6. Access the App

Open http://localhost:3000 in your browser and sign in with Google.

Production Deployment

This section covers deploying Stranger Prompts to production with Vercel, Supabase, and Inngest.

1. Google OAuth Setup

  1. Go to Google Cloud Console
  2. Create a new project (or select existing)
  3. Navigate to APIs & Services → OAuth consent screen
    • Choose "External" user type
    • Fill in app name, support email, and developer contact
    • Add scopes: email, profile, openid
  4. Go to APIs & Services → Credentials → Create Credentials → OAuth 2.0 Client ID
    • Application type: Web application
    • Name: Stranger Prompts
    • Authorized redirect URIs: https://YOUR-PROJECT.supabase.co/auth/v1/callback
  5. Copy the Client ID and Client Secret for Supabase setup

2. Supabase Setup

  1. Create a new project at supabase.com
  2. Configure Google Auth:
    • Go to Authentication → Providers → Google
    • Enable Google provider
    • Paste your Google Client ID and Client Secret
    • Save
  3. Get API Keys from Settings → API:
    NEXT_PUBLIC_SUPABASE_URL=https://YOUR-PROJECT.supabase.co
    NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
    SUPABASE_SERVICE_ROLE_KEY=eyJ...
    
  4. Get Database Connection Strings from Settings → Database → Connection string:
    • Transaction Pooler (for serverless/Vercel):
      DATABASE_URL=postgresql://postgres.YOUR-PROJECT:[PASSWORD]@aws-0-REGION.pooler.supabase.com:6543/postgres?pgbouncer=true
      
    • Direct Connection (for migrations):
      DIRECT_URL=postgresql://postgres.YOUR-PROJECT:[PASSWORD]@aws-0-REGION.pooler.supabase.com:5432/postgres
      

3. Inngest Setup

  1. Create an account at inngest.com
  2. Create a new app in the Inngest Dashboard
  3. Go to Manage → Signing Key to get your keys:
    INNGEST_EVENT_KEY=your-event-key
    INNGEST_SIGNING_KEY=signkey-prod-...
    
  4. Note: You'll configure the app URL after Vercel deployment

4. Vercel Deployment

  1. Deploy to Vercel:

    npm i -g vercel
    vercel

    Or connect your GitHub repo to Vercel for automatic deployments.

  2. Add Environment Variables in Vercel Dashboard → Settings → Environment Variables:

    • All variables from .env.example
    • Make sure DATABASE_URL uses the pooler connection with ?pgbouncer=true
  3. Run Database Migrations:

    # Locally with DIRECT_URL set
    npx prisma migrate deploy
  4. Configure Inngest App URL:

    • In Inngest Dashboard → Your App → App URL
    • Set to: https://your-app.vercel.app/api/inngest

5. Vercel Deployment Protection Bypass (Critical for Inngest)

If you have Vercel deployment protection enabled (preview deployments, password protection, etc.), Inngest won't be able to reach your /api/inngest endpoint. You must configure a bypass:

  1. In Vercel Dashboard:

    • Go to Settings → Deployment Protection
    • Scroll to Protection Bypass for Automation
    • Click Generate Secret
    • Copy the generated secret
  2. In Inngest Dashboard:

    • Go to your App → Settings
    • Find Vercel Protection Bypass
    • Paste the bypass secret
  3. This allows Inngest to invoke your background functions even when deployment protection is enabled.

6. Slack Integration Setup

Slack integration enables DM alerts for output change detection and quality regressions.

  1. Create a Slack App:

    • Go to api.slack.com/apps
    • Click Create New App → From scratch
    • Name: Stranger Prompts Alerts
    • Select your workspace
  2. Configure Bot Token Scopes:

    • Go to OAuth & Permissions → Scopes → Bot Token Scopes
    • Add these scopes:
      • users:read.email (lookup users by email)
      • chat:write (send messages)
      • im:write (open DM channels)
  3. Install to Workspace:

    • Go to OAuth & Permissions → Install to Workspace
    • Authorize the app
    • Copy the Bot User OAuth Token (xoxb-...)
    SLACK_BOT_TOKEN=xoxb-your-token
    
  4. Create Workspace Invite Link (Optional):

    • In Slack, go to your workspace settings
    • Create a shared invite link
    NEXT_PUBLIC_SLACK_INVITE_URL=https://join.slack.com/t/your-workspace/shared_invite/...
    
  5. User Setup:

    • Users must join your Slack workspace
    • In the app Settings page, users enter their Slack email address
    • They can test the integration with "Send Test DM"

Environment Variables Reference

Variable                       Required  Description
NEXT_PUBLIC_SUPABASE_URL       Yes       Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEY  Yes       Supabase anonymous/public key
SUPABASE_SERVICE_ROLE_KEY      Yes       Supabase service role key (server-side only)
DATABASE_URL                   Yes       Postgres connection string (use pooler with ?pgbouncer=true for Vercel)
DIRECT_URL                     Yes       Postgres direct connection (for migrations)
ENCRYPTION_KEY                 Yes       64-character hex key for BYOK encryption (openssl rand -hex 32)
INNGEST_EVENT_KEY              Yes       Inngest event key for sending events
INNGEST_SIGNING_KEY            Yes       Inngest signing key for webhook verification
SLACK_BOT_TOKEN                Optional  Slack bot token for DM alerts (xoxb-...)
NEXT_PUBLIC_SLACK_INVITE_URL   Optional  Slack workspace invite link (shown in Settings UI)
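ENCRYPTION_KEY is 64 hex characters, i.e. 32 bytes, which is the AES-256 key size. The repo's src/lib/crypto.ts is not reproduced here, but a minimal sketch of how such a key can encrypt BYOK provider keys with AES-256-GCM might look like the following (function names and payload format are hypothetical):

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// 64 hex chars = 32 bytes = AES-256 key size. The fallback here is a
// placeholder for illustration only; never use it in production.
const key = Buffer.from(process.env.ENCRYPTION_KEY ?? "00".repeat(32), "hex");

// Encrypts a provider API key; returns "iv:authTag:ciphertext" in hex.
function encryptApiKey(plaintext: string): string {
  const iv = randomBytes(12); // standard 96-bit GCM nonce
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const enc = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return [iv, cipher.getAuthTag(), enc].map((b) => b.toString("hex")).join(":");
}

function decryptApiKey(payload: string): string {
  const [iv, tag, enc] = payload.split(":").map((h) => Buffer.from(h, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // GCM authenticates as well as encrypts
  return Buffer.concat([decipher.update(enc), decipher.final()]).toString("utf8");
}
```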

UI Testing Walkthrough (Levels 0–2)

First-Time Setup

  1. Sign in with Google at http://localhost:3000
  2. Go to Settings → Add your OpenAI API key → Click "Save Key"
  3. Click "Test" on the saved key row to verify it works
  4. (Optional) Add your Slack Member ID for output change alerts

Claim Seeded Data

If you ran npm run seed, claim the demo data for your account:

# In the browser console (while signed in):
fetch('/api/dev/claim-demo-data', {method: 'POST'}).then(r => r.json()).then(console.log)

Refresh the dashboard to see "Movie Sentiment Classifier".


Level 0: Core Testing (via UI)

Step UI Action
1 Dashboard → Click "Movie Sentiment Classifier" (or create new system)
2 Datasets tab → Upload a CSV or JSONL file (download samples from the links)
3 Evaluations tab → Create an eval config (e.g., CONTAINS type)
4 Run Test tab → Select dataset + eval → Click "Run Now"
5 Watch "Recent Runs" panel update → Click a run to see row-by-row results

Level 1: Scheduling & Monitoring (via UI)

Step UI Action
1 System page → Run Test tab → Select dataset & eval
2 Set interval (e.g., 60 seconds for testing, or use presets: Hourly/Daily/Weekly)
3 Click "Save Schedule Configuration" to save your settings
4 Toggle the schedule switch ON to enable automatic runs
5 Watch Inngest dashboard (http://localhost:8288) → See system-scheduler function start
6 After 2+ scheduled runs, output-change-alert compares outputs
7 Check Notifications page for alerts (Dashboard → Notifications)
8 Toggle the schedule switch OFF when done

Note: The scheduler is event-driven—it only runs when scheduling is enabled for a system, not polling every minute.

Level 2: Cross-Model Comparison (via UI)

Step UI Action
1 System page → Compare Models tab
2 Select dataset and evaluation config
3 Click "+ Add Model" → Select provider/model (e.g., gpt-4o)
4 Add more models to compare (e.g., gpt-4o-mini, claude-3-haiku)
5 Click "Run Comparison"
6 Results appear below → Click each run to see detailed scores

Level 3: Optimization (Bonus, via UI)

Step UI Action
1 System page → Optimize tab
2 Select dataset and evaluation config
3 Set max iterations (1-10) and target score (0-1)
4 Click "Start Optimization"
5 Check Inngest dashboard for optimize function progress
6 New prompt versions created after each iteration

Sample Datasets

Download from UI or find in /public/sample/:

  • reviews.csv: Movie reviews → sentiment (positive/negative/neutral)
  • toxicity.jsonl: Text → toxicity classification (toxic/not_toxic)

Custom dataset requirements:

  • CSV: columns matching {{variables}} in prompt + expected column
  • JSONL: each line {"inputs": {...}, "expected": "..."}
  • Max 10,000 rows

Dataset Format

CSV

review,expected
"Great movie!",positive
"Terrible film.",negative

JSONL

{"review": "Great movie!", "expected": "positive"}
{"review": "Terrible film.", "expected": "negative"}

Required: Column/field named expected
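A hypothetical validator for the row formats above, accepting both the flat sample lines and the nested {"inputs": ...} shape mentioned earlier, and enforcing the required expected field (this is an illustrative sketch, not the repo's ingestion code):

```typescript
interface DatasetRow {
  inputs: Record<string, unknown>;
  expected: string;
}

// Parses one JSONL dataset line. Every row must be valid JSON and carry a
// string "expected" field; the remaining fields become prompt inputs.
function parseJsonlRow(line: string): DatasetRow {
  const obj = JSON.parse(line) as Record<string, unknown>;
  const { expected, inputs, ...rest } = obj;
  if (typeof expected !== "string") {
    throw new Error('Each row needs a string "expected" field');
  }
  // Accept both shapes: a nested {"inputs": {...}} object or flat fields.
  const resolved =
    inputs && typeof inputs === "object"
      ? (inputs as Record<string, unknown>)
      : rest;
  return { inputs: resolved, expected };
}
```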

Evaluation Types

Type Description
EXACT_MATCH Output must exactly match expected (case-insensitive)
CONTAINS Output must contain expected string
REGEX Output must match regex pattern
JSON_SCHEMA Output JSON must validate against schema
LLM_JUDGE LLM evaluates output quality (strict JSON response)
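The three string-based types above can be sketched as pure functions. This is an illustrative sketch, not the repo's eval/ module (JSON_SCHEMA and LLM_JUDGE are omitted because they need a schema validator and an LLM call, respectively):

```typescript
type EvalType = "EXACT_MATCH" | "CONTAINS" | "REGEX";

// Minimal versions of the string-based evaluation types.
function evaluate(type: EvalType, output: string, expected: string): boolean {
  switch (type) {
    case "EXACT_MATCH":
      // Per the table above, exact match is case-insensitive.
      return output.trim().toLowerCase() === expected.trim().toLowerCase();
    case "CONTAINS":
      return output.includes(expected);
    case "REGEX":
      return new RegExp(expected).test(output);
  }
}
```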

API Endpoints

Method Endpoint Description
GET/POST /api/systems List/create prompt systems
GET/PATCH /api/systems/:id Get/update system
POST /api/systems/:id/versions Create new version
POST /api/systems/:id/schedule Configure scheduling
GET/POST /api/datasets List/upload datasets
GET/POST /api/runs List/create test runs
GET /api/runs/:id/results Get paginated results
GET/POST /api/keys List/save API keys
POST /api/keys/test Test API key validity
POST /api/compare Start cross-model comparison
POST /api/optimize Start optimization loop
GET/PATCH /api/notifications List/mark notifications read
GET/PATCH /api/user Get/update user settings

Architecture

src/
├── app/                    # Next.js App Router pages
│   ├── api/               # API routes
│   ├── dashboard/         # Dashboard page
│   ├── login/             # Login page
│   └── settings/          # Settings page
├── components/ui/         # shadcn/ui components
└── lib/
    ├── inngest/           # Inngest functions
    │   ├── client.ts      # Inngest client + event definitions
    │   ├── index.ts       # Function exports
    │   └── functions/     # Background job handlers
    │       ├── dataset-ingest.ts    # Dataset ingestion
    │       ├── run-execute.ts       # Test run execution
    │       ├── system-scheduler.ts  # Event-driven scheduler (per-system)
    │       ├── output-change-alert.ts # Output change detection
    │       └── optimize.ts          # Prompt optimization loop
    ├── llm/               # LLM provider adapters
    ├── eval/              # Evaluation logic
    ├── supabase/          # Supabase client utilities
    ├── crypto.ts          # Encryption for BYOK
    ├── prisma.ts          # Prisma client
    ├── slack.ts           # Slack DM integration
    └── utils.ts           # Utility functions

Inngest Events

Event Description
dataset/ingest.requested Triggered when a dataset is uploaded
run/execute.requested Triggered to start a test run
run/completed Emitted when a run finishes (triggers output change detection)
system/schedule.started Starts the event-driven scheduler for a system
system/schedule.stopped Cancels the scheduler for a system
optimize/start.requested Starts the optimization loop

Robustness Features

  • Row-level fault tolerance: Individual row failures don't crash the run
  • Run failure threshold: Run marked FAILED only if >50% rows fail
  • Retry with backoff: Transient errors retry with exponential backoff + jitter
  • Idempotent results: Upsert on (runId, rowIndex) prevents duplicates
  • Pileup prevention: Scheduler skips if a run is already QUEUED/RUNNING
  • Concurrency limits: Per-user run limits, per-dataset ingestion limits
  • Event-driven scheduling: Schedulers only run when enabled, not polling every minute
  • Scheduler lifecycle management: Interval changes automatically restart the scheduler with new settings
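The retry-with-backoff behavior listed above can be sketched as a small helper. Attempt counts, delays, and names here are illustrative assumptions, not the repo's values:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Retries fn on failure with exponential backoff and "full jitter":
// a random delay in [0, baseMs * 2^attempt) before each retry.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseMs = 200
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt < maxAttempts - 1) {
        await sleep(Math.random() * baseMs * 2 ** attempt);
      }
    }
  }
  throw lastErr;
}
```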

License

This project is licensed under the Business Source License 1.1 (BSL 1.1).

  • Permitted: Non-production use, internal use, modifications, contributions
  • Not Permitted: Offering as a competing Prompt Evaluation Service
  • Change Date: January 1, 2029 (converts to Apache 2.0)

See LICENSE for full terms. For commercial licensing inquiries, please contact the maintainer.
