An open-source platform for collaborative prompt engineering, batch generation and hybrid evaluation of LLM outputs
Live Instance · Demo Video · Paper (IJCAI 2026)
LLARS bridges the gap between domain experts and developers for building LLM-based systems. It integrates three tightly connected modules into an end-to-end pipeline:
- Collaborative Prompt Engineering — Real-time co-authoring with version control and instant LLM testing
- Batch Generation — Configurable output production across user-selected prompts × models × data combinations, with cost control
- Hybrid Evaluation — Human and LLM evaluators jointly assess outputs through diverse assessment methods, with live agreement metrics and provenance analysis
New prompts and models are automatically available for batch generation, and completed batches can be turned into evaluation scenarios with a single click.
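Conceptually, a batch is the Cartesian product of prompts, models and data items, walked under a spending cap. A minimal Python sketch of that idea (the `plan_batch` helper and the per-1K-token prices are illustrative assumptions, not LLARS code or real pricing):

```python
from itertools import product

# Hypothetical per-1K-token prices; real values depend on the provider.
PRICE_PER_1K = {"gpt-4o-mini": 0.15, "llama-3-8b": 0.05}

def plan_batch(prompts, models, data_items, budget_usd, est_tokens=1.0):
    """Enumerate the prompts x models x data grid, stopping once the
    estimated cost would exceed the budget (naive cost control)."""
    jobs, cost = [], 0.0
    for prompt, model, item in product(prompts, models, data_items):
        job_cost = PRICE_PER_1K[model] * est_tokens
        if cost + job_cost > budget_usd:
            break
        jobs.append({"prompt": prompt, "model": model, "data": item})
        cost += job_cost
    return jobs, cost
```

With two prompts, one model and two data items at 0.15 USD per estimated job, a 0.50 USD budget admits three of the four grid cells.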
Paper: LLARS: An Open-Source Platform for Collaborative Prompt Engineering, Batch Generation and Hybrid Evaluation — IJCAI-ECAI 2026 (Demo Track). The demo video link can be found in the footnote of the "Demo and Conclusion" section at the bottom of the paper.
| Resource | URL |
|---|---|
| Live Instance | https://llars.e-beratungsinstitut.de |
| Demo Video (YouTube) | https://youtu.be/FdG1nJ7OqE0 |
| Paper | Included in this repository (Paper/ijcai26.pdf) |
| Category | Features |
|---|---|
| Prompt Engineering | Real-time collaborative editing (YJS CRDT), version control, instant LLM testing |
| Batch Generation | Multi-model × multi-prompt × multi-data generation with cost control |
| Evaluation | Rating, Ranking, Pairwise Comparison, Labeling, Authenticity Detection |
| LLM Evaluator | Automated evaluation using LLMs as judges with configurable metrics |
| Agreement Metrics | Krippendorff's Alpha, agreement heatmaps, provenance analysis |
| RAG Pipeline | Document-based retrieval with ChromaDB + hybrid search |
| Chatbot Builder | Chatbots with RAG integration and web crawler |
| Scenario Wizard | AI-assisted evaluation scenario setup from uploaded data (CSV, JSON, JSONL) |
| Auth & RBAC | Authentik OAuth2/OIDC + role-based access control |
| Design System | 35+ custom L-components with LLARS signature styling |
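For intuition on the agreement metrics row: nominal Krippendorff's Alpha can be computed from a coincidence matrix over all rater pairs per unit. The sketch below is a textbook-style illustration, not the platform's implementation; it skips units rated by fewer than two evaluators (missing data):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """units: one list of labels per rated item (missing ratings omitted).
    Returns alpha in [-1, 1], or None if too few paired ratings exist."""
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a single rating carries no agreement information
        for c, k in permutations(values, 2):  # ordered pairs of rater values
            coincidences[(c, k)] += 1 / (m - 1)
    n_c = Counter()
    for (c, _k), w in coincidences.items():
        n_c[c] += w
    n = sum(n_c.values())
    if n <= 1:
        return None
    # Observed vs. expected disagreement (nominal distance: 0 if equal, else 1).
    d_o = sum(w for (c, k), w in coincidences.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    if d_e == 0:
        return 1.0  # degenerate: only one category observed
    return 1 - d_o / d_e
```

For example, two raters labeling four items as `[("a","a"), ("a","b"), ("b","b"), ("b","b")]` yields alpha = 8/15 ≈ 0.53.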
- Docker & Docker Compose (Install)
- Git
```shell
# 1. Clone the repository
git clone https://github.com/th-nuernberg/llars.git
cd llars

# 2. Configure environment variables
cp .env.template.development .env

# 3. Start LLARS
./start_llars.sh
```

The script starts all Docker containers and configures Authentik automatically.
| Service | URL |
|---|---|
| Frontend | http://localhost:55080 |
| Backend API | http://localhost:55080/api |
| Authentik Admin | http://localhost:55095 |
| User | Password | Role |
|---|---|---|
| admin | admin123 | Administrator |
| researcher | admin123 | Researcher (can create scenarios) |
| evaluator | admin123 | Evaluator (participates in evaluations) |
| chatbot_manager | admin123 | Chatbot Manager |
```
nginx (:80) -> Reverse Proxy
|-- /           -> Vue Frontend (:5173)
|-- /api/       -> Flask Backend (:8081)
|-- /auth/      -> Flask Auth -> Authentik
|-- /authentik/ -> Authentik UI (:9000)
|-- /collab/    -> YJS WebSocket (:8082)

Databases:
|-- MariaDB     -> Application data
|-- PostgreSQL  -> Authentik
|-- ChromaDB    -> RAG vectors
```
Tech Stack:
- Backend: Flask 3.0 + MariaDB 11.2 + ChromaDB + Gunicorn/gevent (production)
- Frontend: Vue 3.4 + Vuetify 3.5 + Vite 5.1
- Realtime: Socket.IO + YJS CRDT
- Auth: Authentik (OAuth2/OIDC, RS256 JWT)
```
llars/
|-- app/                  # Flask Backend
|   |-- auth/             # Authentication
|   |-- routes/           # API Endpoints
|   |-- services/         # Business Logic
|   |-- db/               # Database Models
|   |-- schemas/          # Pydantic Schemas
|-- llars-frontend/       # Vue 3 Frontend
|   |-- src/components/   # Vue Components (35+ L-components)
|   |-- src/composables/  # Vue Composables
|   |-- src/views/        # Page Views
|-- yjs-server/           # YJS WebSocket Server
|-- docker/               # Docker Configuration
|-- Paper/                # IJCAI 2026 Demo Paper
|-- scripts/              # Utility Scripts
|-- tests/                # Backend Tests
|-- start_llars.sh        # Startup Script
|-- docker-compose.yml    # Docker Compose
```
```shell
# Start (development)
./start_llars.sh

# Full restart (DELETES ALL DATA!)
REMOVE_LLARS_VOLUMES=True ./start_llars.sh

# Logs
docker compose logs -f backend-flask-service
docker compose logs -f frontend-vue-service

# Tests
pytest tests/                             # Backend
cd llars-frontend && npm run test:run     # Frontend
cd llars-frontend && npx playwright test  # E2E
```

Key environment variables in `.env`:
```shell
PROJECT_STATE=development          # or production
PROJECT_URL=http://localhost:55080
NGINX_EXTERNAL_PORT=55080
OPENAI_API_KEY=sk-...              # For LLM features
LITELLM_API_KEY=...                # Optional, for open-source models via LiteLLM
```

This project is licensed under the MIT License.
Technische Hochschule Nürnberg Georg Simon Ohm
Faculty of Computer Science, Center for Artificial Intelligence (KIZ)
Faculty of Social Sciences, Institute for E-Counselling
