Relia Prompt

Test and benchmark prompts across LLM providers and models

This tool is aimed at agentic use cases in large production applications that require fast and reliable LLM calls: for example, extracting sentiment from social media posts, or converting a sentence into structured JSON.

Features

  • Multi-Provider Testing – OpenAI, Bedrock, DeepSeek, Gemini, Groq, OpenRouter
  • Parallel Execution – Run tests concurrently across all configured LLMs
  • Repeatability – Each test runs N times per model to measure consistency
  • Version Control – Full prompt history with easy rollback
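To illustrate the parallel-execution and repeatability features above, here is a minimal sketch (not the tool's actual implementation; the `Model` type, function names, and scoring rule are hypothetical) of running each test N times per model concurrently and scoring consistency as the fraction of runs that match the expected output:

```typescript
// Hypothetical sketch of ReliaPrompt-style repeat testing.
// A "model" is anything that maps an input string to an output string.
type Model = { name: string; call: (input: string) => Promise<string> };

// Run one model N times in parallel and return its consistency score:
// the fraction of runs whose output exactly matches the expectation.
async function runRepeats(
  model: Model,
  input: string,
  expected: string,
  n: number,
): Promise<number> {
  const outputs = await Promise.all(
    Array.from({ length: n }, () => model.call(input)),
  );
  return outputs.filter((o) => o === expected).length / n;
}

// Run all configured models concurrently and collect per-model scores.
async function runAll(
  models: Model[],
  input: string,
  expected: string,
  n = 5,
): Promise<Record<string, number>> {
  const scores = await Promise.all(
    models.map(
      async (m) => [m.name, await runRepeats(m, input, expected, n)] as const,
    ),
  );
  return Object.fromEntries(scores);
}
```

A deterministic model scores 1.0; a model that only sometimes produces the expected output scores proportionally lower, which is what makes repeated runs useful for measuring reliability.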

Quick Start

# Install dependencies
bun install

# Start development server
bun dev

# Open http://localhost:3000

Configure API keys in the app's Configuration page. At least one provider is required.

Usage

  1. Prompts – Create and version your system prompts
  2. Test Cases – Add input/expected output pairs (JSON) for each prompt
  3. Test Runs – Execute tests and view per-model scores
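As a concrete illustration of step 2, a test case pairs an input with its expected JSON output. The exact schema is not documented here, so the field names below are hypothetical:

```json
{
  "input": "I absolutely love this product!",
  "expected": { "sentiment": "positive" }
}
```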

Development

bun dev              # Backend with hot reload
bun dev:frontend     # Frontend dev server
bun run build        # Build frontend + backend
bun run lint         # Lint backend
bun run test         # Unit tests
bun run test:e2e     # E2E tests (Playwright)
bun run format       # Format code
bun run db:studio    # Drizzle Studio

Project Structure

├── src/                    # Backend (Express + Bun)
│   ├── server.ts           # API routes
│   ├── db/                 # Drizzle schema & init
│   ├── llm-clients/        # Provider clients
│   └── services/           # Test runner
├── frontend/               # SvelteKit app
│   └── src/
│       ├── lib/            # Components & stores
│       └── routes/         # Pages
├── drizzle/                # Database migrations
└── data/                   # SQLite database

License

MIT
